
04 Simple Neural Network Derivation and Implementation (C++/C)

For the code, see A simple neural network - stock price prediction as an example (C++/C).

1. Main Modules

  • Neuron.h: declares the neuron class and its constructor.
  • Layer.h: declares the network-layer class and its constructor.
  • NNet.h: declares the neural-network class and its construction and initialization functions.
  • Dataset.h: declares the dataset class and its constructor.
  • Trainer.h: declares the class implementing the network's forward and backward propagation, plus related functions (a rough sketch of these headers follows below).
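As a rough sketch of how the first few headers might be laid out (the member and parameter names below are illustrative assumptions, not the repository's actual interface):

```cpp
#include <cstddef>
#include <vector>

// Neuron.h: one neuron stores its activation, its backprop error term,
// and the weights on its outgoing connections.
struct Neuron {
    double value = 0.0;              // activation a
    double delta = 0.0;              // error term used during backpropagation
    std::vector<double> weights;     // weights to the next layer
    explicit Neuron(std::size_t fanOut) : weights(fanOut, 0.0) {}
};

// Layer.h: a layer is a collection of neurons plus one bias per neuron.
struct Layer {
    std::vector<Neuron> neurons;
    std::vector<double> biases;
    Layer(std::size_t size, std::size_t fanOut)
        : neurons(size, Neuron(fanOut)), biases(size, 0.0) {}
};

// NNet.h: the network owns its layers; construction wires up layer sizes.
struct NNet {
    std::vector<Layer> layers;
    explicit NNet(const std::vector<std::size_t>& sizes) {
        for (std::size_t i = 0; i < sizes.size(); ++i) {
            std::size_t fanOut = (i + 1 < sizes.size()) ? sizes[i + 1] : 0;
            layers.emplace_back(sizes[i], fanOut);
        }
    }
};
```

Dataset.h and Trainer.h follow the same pattern, holding the training samples and the propagation routines respectively.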

2. Understanding the Network Structure and Parameters

(Figure: neural network structure diagram)
The figure above shows a simple 3-layer neural network with a 2-dimensional (2-D) input and a 2-dimensional (2-D) output. The weights w and the biases are the network's learnable parameters. The computation proceeds as follows:

Forward Propagation

(Input Layer - Hidden Layer)

Input

$$A_1 = X = [x_1, x_2]$$

Weights

$$W_1 = \begin{bmatrix} w_{11}^{(1)} & w_{12}^{(1)} \\ w_{21}^{(1)} & w_{22}^{(1)} \\ w_{31}^{(1)} & w_{32}^{(1)} \end{bmatrix}$$

Biases

$$B_2 = \begin{bmatrix} b_1^{(2)} \\ b_2^{(2)} \\ b_3^{(2)} \end{bmatrix}$$

Weighted Sum

$$Z_2 = W_1 \cdot A_1^T + B_2 = \begin{bmatrix} w_{11}^{(1)}a_1^{(1)} + w_{12}^{(1)}a_2^{(1)} + b_1^{(2)} \\ w_{21}^{(1)}a_1^{(1)} + w_{22}^{(1)}a_2^{(1)} + b_2^{(2)} \\ w_{31}^{(1)}a_1^{(1)} + w_{32}^{(1)}a_2^{(1)} + b_3^{(2)} \end{bmatrix}$$

Activation (ReLU or Sigmoid)

$$A_2 = \sigma(Z_2) = \begin{bmatrix} \sigma(z_1) \\ \sigma(z_2) \\ \sigma(z_3) \end{bmatrix} = \begin{bmatrix} a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \end{bmatrix}$$

(Hidden Layer - Output Layer)

Weights

$$W_2 = \begin{bmatrix} w_{11}^{(2)} & w_{12}^{(2)} & w_{13}^{(2)} \\ w_{21}^{(2)} & w_{22}^{(2)} & w_{23}^{(2)} \end{bmatrix}$$

Biases

$$B_3 = \begin{bmatrix} b_1^{(3)} \\ b_2^{(3)} \end{bmatrix}$$

Weighted Sum

$$Z_3 = W_2 \cdot A_2 + B_3 = \begin{bmatrix} w_{11}^{(2)}a_1^{(2)} + w_{12}^{(2)}a_2^{(2)} + w_{13}^{(2)}a_3^{(2)} + b_1^{(3)} \\ w_{21}^{(2)}a_1^{(2)} + w_{22}^{(2)}a_2^{(2)} + w_{23}^{(2)}a_3^{(2)} + b_2^{(3)} \end{bmatrix}$$

Activation (Sigmoid)

$$\hat{Y} = A_3 = \sigma(Z_3) = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \end{bmatrix}$$

Note: for regression problems, $A_3 = Z_3$ (no activation is applied at the output layer).
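A minimal forward pass for this 2-3-2 network might look as follows (a sketch with plain arrays and a sigmoid activation; the function and parameter names are assumptions, not the article's actual Trainer code):

```cpp
#include <cmath>

// Sigmoid activation: maps any real input into (0, 1).
double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Forward pass: Z2 = W1 * A1^T + B2, A2 = sigma(Z2),
// then Z3 = W2 * A2 + B3, A3 = sigma(Z3).
void forward(const double A1[2],
             const double W1[3][2], const double B2[3],
             const double W2[2][3], const double B3[2],
             double A2[3], double A3[2]) {
    for (int i = 0; i < 3; ++i) {              // input -> hidden
        double z = B2[i];
        for (int j = 0; j < 2; ++j) z += W1[i][j] * A1[j];
        A2[i] = sigmoid(z);
    }
    for (int i = 0; i < 2; ++i) {              // hidden -> output
        double z = B3[i];
        for (int j = 0; j < 3; ++j) z += W2[i][j] * A2[j];
        A3[i] = sigmoid(z);                    // for regression, use A3[i] = z
    }
}
```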

Computing the Loss (assuming a mean-squared-error (MSE) or cross-entropy (CE) loss function)

$$\text{MSE\_Loss} = \frac{1}{2m} \sum_{i=1}^m (\hat{y}_i - y_i)^2$$

Binary classification:
$$\text{CE\_Loss} = -\frac{1}{m} \sum_{i=1}^m \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
Multi-class classification:
$$\text{CE\_Loss} = -\frac{1}{m} \sum_{i=1}^m y_i \log(\hat{y}_i)$$
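In code, the two losses can be computed directly from the formulas above (a sketch assuming predictions and targets are plain arrays of length m; names are illustrative):

```cpp
#include <cmath>
#include <cstddef>

// MSE with the 1/(2m) convention used above.
double mseLoss(const double yhat[], const double y[], std::size_t m) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i) {
        double d = yhat[i] - y[i];
        sum += d * d;
    }
    return sum / (2.0 * m);
}

// Binary cross-entropy; each yhat[i] must lie strictly in (0, 1).
double ceLoss(const double yhat[], const double y[], std::size_t m) {
    double sum = 0.0;
    for (std::size_t i = 0; i < m; ++i)
        sum += y[i] * std::log(yhat[i]) + (1.0 - y[i]) * std::log(1.0 - yhat[i]);
    return -sum / m;
}
```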
Backward Propagation

(Output Layer - Hidden Layer)

Compute the gradients and update the weights and biases, where $\alpha$ is the learning rate.

$$\delta^{(3)} = \frac{\partial \text{Loss}}{\partial Z_3} = \hat{Y} - Y = A_3 - Y = \begin{bmatrix} \delta_1^{(3)} \\ \delta_2^{(3)} \end{bmatrix}$$

$$W_2 := W_2 - \alpha \cdot \frac{\partial \text{Loss}}{\partial W_2} = W_2 - \alpha \cdot \frac{\partial \text{Loss}}{\partial Z_3} \cdot \frac{\partial Z_3}{\partial W_2} = W_2 - \alpha \cdot \delta^{(3)} \cdot A_2^T = \begin{bmatrix} w_{11}^{(2)} - \alpha \delta_1^{(3)}a_1^{(2)} & w_{12}^{(2)} - \alpha \delta_1^{(3)}a_2^{(2)} & w_{13}^{(2)} - \alpha \delta_1^{(3)}a_3^{(2)} \\ w_{21}^{(2)} - \alpha \delta_2^{(3)}a_1^{(2)} & w_{22}^{(2)} - \alpha \delta_2^{(3)}a_2^{(2)} & w_{23}^{(2)} - \alpha \delta_2^{(3)}a_3^{(2)} \end{bmatrix}$$

$$B_3 := B_3 - \alpha \cdot \frac{\partial \text{Loss}}{\partial B_3} = B_3 - \alpha \cdot \frac{\partial \text{Loss}}{\partial Z_3} \cdot \frac{\partial Z_3}{\partial B_3} = B_3 - \alpha \cdot \delta^{(3)} = \begin{bmatrix} b_1^{(3)} - \alpha \delta_1^{(3)} \\ b_2^{(3)} - \alpha \delta_2^{(3)} \end{bmatrix}$$
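Translated into code, the two updates above reduce to a pair of loops (a sketch under the same 2-3-2 dimensions; the function and parameter names are assumptions):

```cpp
// Output-layer step: delta3 = A3 - Y, then
// W2 := W2 - alpha * delta3 * A2^T and B3 := B3 - alpha * delta3.
void updateOutputLayer(double W2[2][3], double B3[2], double delta3[2],
                       const double A2[3], const double A3[2],
                       const double Y[2], double alpha) {
    for (int i = 0; i < 2; ++i) {
        delta3[i] = A3[i] - Y[i];               // error term of output neuron i
        for (int j = 0; j < 3; ++j)
            W2[i][j] -= alpha * delta3[i] * A2[j];
        B3[i] -= alpha * delta3[i];
    }
}
```

delta3 is returned through a parameter because the hidden-layer step further below reuses it.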

Activation-Function Gradients
Gradient of the sigmoid activation:

$$\sigma(x) = \frac{1}{1 + e^{-x}} = (1 + e^{-x})^{-1}$$

$$\sigma'(x) = -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)(1 - \sigma(x))$$
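The identity $\sigma'(x) = \sigma(x)(1 - \sigma(x))$ is convenient in code: when the activation $a = \sigma(z)$ is already stored from the forward pass, the derivative costs a single multiplication (a small sketch):

```cpp
#include <cmath>

double sigmoid(double x) { return 1.0 / (1.0 + std::exp(-x)); }

// Given a = sigmoid(z) from the forward pass, sigma'(z) = a * (1 - a);
// no extra call to exp() is needed.
double sigmoidPrime(double a) { return a * (1.0 - a); }
```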

Gradient of the softmax activation:
$$\sigma(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{x_i}}{S}$$
When i = j:
$$\frac{\partial \sigma(x_i)}{\partial x_j} = \frac{S \cdot e^{x_i} - e^{x_j} \cdot e^{x_i}}{S^2} = \sigma(x_i)(1 - \sigma(x_j))$$
When i ≠ j:
$$\frac{\partial \sigma(x_i)}{\partial x_j} = \frac{0 \cdot S - e^{x_j} \cdot e^{x_i}}{S^2} = -\sigma(x_i)\sigma(x_j)$$
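A practical softmax implementation subtracts the maximum input before exponentiating; this avoids overflow and leaves the result (and the gradients above) unchanged. A sketch:

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: exp(x_i - max) / sum_j exp(x_j - max).
std::vector<double> softmax(const std::vector<double>& x) {
    double mx = *std::max_element(x.begin(), x.end());
    std::vector<double> out(x.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = std::exp(x[i] - mx);
        sum += out[i];
    }
    for (double& v : out) v /= sum;             // normalize to probabilities
    return out;
}
```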

Loss-Function Gradients
Gradient of the MSE loss (for regression the output layer typically uses no activation, so $\hat{y}_i = z_i$):

$$\frac{\partial L}{\partial z_i} = \frac{\partial L}{\partial \hat{y}_i} = \hat{y}_i - y_i$$

Gradient of the cross-entropy loss when the sigmoid activation is used ($\log x$ denotes $\ln x$):

$$\frac{\partial L}{\partial \hat{y}_i} = -\left( \frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i} \right)$$

$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z} = -\left( \frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i} \right) \cdot \hat{y}_i (1 - \hat{y}_i) = \hat{y}_i - y_i$$

Gradient of the cross-entropy loss when the softmax activation is used ($\log x$ denotes $\ln x$):
$$\frac{\partial L}{\partial z_j} = \sum_i \frac{\partial L}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z_j} = \sum_i -\frac{y_i}{\hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z_j}$$
$$= -\frac{y_j}{\hat{y}_j} \cdot \hat{y}_j (1 - \hat{y}_j) + \sum_{i \neq j} -\frac{y_i}{\hat{y}_i} \cdot \left(-\hat{y}_i \hat{y}_j\right)$$
$$= -y_j + y_j \hat{y}_j + \sum_{i \neq j} y_i \hat{y}_j = -y_j + \hat{y}_j \sum_i y_i = \hat{y}_j - y_j$$
The last step uses $\sum_i y_i = 1$, since the labels are one-hot.

(Hidden Layer - Input Layer)

Assuming the sigmoid activation function, compute the gradients and update the weights and biases.

$$\delta^{(2)} = \frac{\partial \text{Loss}}{\partial Z_2} = \frac{\partial \text{Loss}}{\partial Z_3} \cdot \frac{\partial Z_3}{\partial A_2} \cdot \frac{\partial A_2}{\partial Z_2} = (W_2^T \cdot \delta^{(3)}) \odot \sigma'(Z_2) = \begin{bmatrix} (w_{11}^{(2)}\delta_1^{(3)} + w_{21}^{(2)}\delta_2^{(3)})\,a_1^{(2)}(1 - a_1^{(2)}) \\ (w_{12}^{(2)}\delta_1^{(3)} + w_{22}^{(2)}\delta_2^{(3)})\,a_2^{(2)}(1 - a_2^{(2)}) \\ (w_{13}^{(2)}\delta_1^{(3)} + w_{23}^{(2)}\delta_2^{(3)})\,a_3^{(2)}(1 - a_3^{(2)}) \end{bmatrix} = \begin{bmatrix} \delta_1^{(2)} \\ \delta_2^{(2)} \\ \delta_3^{(2)} \end{bmatrix}$$

$$W_1 := W_1 - \alpha \cdot \frac{\partial \text{Loss}}{\partial W_1} = W_1 - \alpha \cdot \frac{\partial \text{Loss}}{\partial Z_2} \cdot \frac{\partial Z_2}{\partial W_1} = W_1 - \alpha \cdot \delta^{(2)} \cdot A_1 = \begin{bmatrix} w_{11}^{(1)} - \alpha\delta_1^{(2)}a_1^{(1)} & w_{12}^{(1)} - \alpha\delta_1^{(2)}a_2^{(1)} \\ w_{21}^{(1)} - \alpha\delta_2^{(2)}a_1^{(1)} & w_{22}^{(1)} - \alpha\delta_2^{(2)}a_2^{(1)} \\ w_{31}^{(1)} - \alpha\delta_3^{(2)}a_1^{(1)} & w_{32}^{(1)} - \alpha\delta_3^{(2)}a_2^{(1)} \end{bmatrix}$$

$$B_2 := B_2 - \alpha \cdot \frac{\partial \text{Loss}}{\partial B_2} = B_2 - \alpha \cdot \delta^{(2)}$$
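The hidden-layer step in code mirrors the three formulas above (a sketch continuing the earlier conventions; delta3 is the output-layer error term computed previously, and the names are assumptions):

```cpp
// Hidden-layer step: delta2 = (W2^T * delta3) elementwise-times a(1 - a), then
// W1 := W1 - alpha * delta2 * A1 and B2 := B2 - alpha * delta2.
void updateHiddenLayer(double W1[3][2], double B2[3],
                       const double A1[2], const double A2[3],
                       const double W2[2][3], const double delta3[2],
                       double alpha) {
    for (int i = 0; i < 3; ++i) {
        double back = 0.0;                      // (W2^T * delta3)_i
        for (int k = 0; k < 2; ++k) back += W2[k][i] * delta3[k];
        double delta2 = back * A2[i] * (1.0 - A2[i]);   // sigma'(z) = a(1 - a)
        for (int j = 0; j < 2; ++j)
            W1[i][j] -= alpha * delta2 * A1[j];
        B2[i] -= alpha * delta2;
    }
}
```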


Original article: https://blog.csdn.net/long11350/article/details/143714319
