04 A Simple Neural Network: Derivation and Implementation (C++/C)
The code is available at A simple neural network - stock price prediction as an example (C++/C).
1. Main Modules
- `Neuron.h`: declares the neuron class and its constructor.
- `Layer.h`: declares the layer class and its constructor.
- `NNet.h`: declares the neural-network class together with its construction and initialization functions.
- `Dataset.h`: declares the dataset class and its constructor.
- `Trainer.h`: declares the class handling the network's forward and backward propagation and the related functions.
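The repository's code is not reproduced here, but a minimal sketch of what these five declarations might look like follows (all member names and signatures are assumptions for illustration, not the actual interface):

```cpp
#include <cstddef>
#include <vector>

// Neuron.h: one neuron holds its activation and its error term (delta).
struct Neuron {
    double a = 0.0;      // activation value
    double delta = 0.0;  // error term used during backpropagation
};

// Layer.h: a layer owns its neurons plus the weights/biases feeding into it.
struct Layer {
    std::vector<Neuron> neurons;
    std::vector<std::vector<double>> W;  // W[i][j]: weight from neuron j of the previous layer
    std::vector<double> B;               // one bias per neuron
    Layer(std::size_t n, std::size_t nPrev);
};

// NNet.h: the network is an ordered list of layers.
struct NNet {
    std::vector<Layer> layers;
    explicit NNet(const std::vector<std::size_t>& sizes);  // e.g. {2, 3, 2}
};

// Dataset.h: paired input/target samples.
struct Dataset {
    std::vector<std::vector<double>> X;  // inputs
    std::vector<std::vector<double>> Y;  // targets
};

// Trainer.h: forward/backward passes and the parameter-update step.
struct Trainer {
    void forward(NNet& net, const std::vector<double>& x);
    void backward(NNet& net, const std::vector<double>& y, double lr);
};
```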
2. Understanding the Network Structure and Parameters
The figure above shows a simple 3-layer neural network with a 2-dimensional (2-D) input and a 2-dimensional (2-D) output. The weights w and the biases are the network's learnable parameters. The computation proceeds as follows:
Forward Propagation
(Input Layer → Hidden Layer)
Input
$$A_1 = X = [x_1, x_2]$$
Weights
$$W_1 = \begin{bmatrix} w_{11}^{(1)} & w_{12}^{(1)} \\ w_{21}^{(1)} & w_{22}^{(1)} \\ w_{31}^{(1)} & w_{32}^{(1)} \end{bmatrix}$$
Biases
$$B_2 = \begin{bmatrix} b_1^{(2)} \\ b_2^{(2)} \\ b_3^{(2)} \end{bmatrix}$$
Weighted Sum
$$Z_2 = W_1 \cdot A_1^T + B_2 = \begin{bmatrix} w_{11}^{(1)} a_1^{(1)} + w_{12}^{(1)} a_2^{(1)} + b_1^{(2)} \\ w_{21}^{(1)} a_1^{(1)} + w_{22}^{(1)} a_2^{(1)} + b_2^{(2)} \\ w_{31}^{(1)} a_1^{(1)} + w_{32}^{(1)} a_2^{(1)} + b_3^{(2)} \end{bmatrix}$$
Activate (ReLU or Sigmoid)
$$A_2 = \sigma(Z_2) = \begin{bmatrix} \sigma(z_1) \\ \sigma(z_2) \\ \sigma(z_3) \end{bmatrix} = \begin{bmatrix} a_1^{(2)} \\ a_2^{(2)} \\ a_3^{(2)} \end{bmatrix}$$
(Hidden Layer → Output Layer)
Weights
$$W_2 = \begin{bmatrix} w_{11}^{(2)} & w_{12}^{(2)} & w_{13}^{(2)} \\ w_{21}^{(2)} & w_{22}^{(2)} & w_{23}^{(2)} \end{bmatrix}$$
Biases
$$B_3 = \begin{bmatrix} b_1^{(3)} \\ b_2^{(3)} \end{bmatrix}$$
Weighted Sum
$$Z_3 = W_2 \cdot A_2 + B_3 = \begin{bmatrix} w_{11}^{(2)} a_1^{(2)} + w_{12}^{(2)} a_2^{(2)} + w_{13}^{(2)} a_3^{(2)} + b_1^{(3)} \\ w_{21}^{(2)} a_1^{(2)} + w_{22}^{(2)} a_2^{(2)} + w_{23}^{(2)} a_3^{(2)} + b_2^{(3)} \end{bmatrix}$$
Activate (Sigmoid)
$$\hat{Y} = A_3 = \sigma(Z_3) = \begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \end{bmatrix}$$
Note: for regression problems, $A_3 = Z_3$ (the output layer uses no activation).
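The two layer computations above are just matrix-vector products followed by an elementwise activation, so they map directly onto nested loops. A minimal C++ sketch of one layer's forward step (function and variable names are illustrative assumptions, not the repository's actual interface):

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

double sigmoid(double z) { return 1.0 / (1.0 + std::exp(-z)); }

// One layer of the forward pass: z = W * aPrev + b, then a = sigma(z).
// For a regression output layer, pass activate = false so that A3 = Z3.
std::vector<double> forwardLayer(const std::vector<std::vector<double>>& W,
                                 const std::vector<double>& b,
                                 const std::vector<double>& aPrev,
                                 bool activate = true) {
    std::vector<double> a(W.size());
    for (std::size_t i = 0; i < W.size(); ++i) {
        double z = b[i];                      // start from the bias
        for (std::size_t j = 0; j < aPrev.size(); ++j)
            z += W[i][j] * aPrev[j];          // weighted sum
        a[i] = activate ? sigmoid(z) : z;     // sigma(z), or identity for regression
    }
    return a;
}
```

Calling `forwardLayer(W1, B2, x)` and then `forwardLayer(W2, B3, a2)` reproduces $A_2$ and $A_3$; passing `activate = false` on the second call gives the regression case $A_3 = Z_3$.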
Computing the Loss (assuming the mean squared error (MSE) or cross-entropy (CE) loss function)
$$\text{MSE\_Loss} = \frac{1}{2m} \sum_{i=1}^{m} (\hat{y}_i - y_i)^2$$
Binary classification:
$$\text{CE\_Loss} = -\frac{1}{m} \sum_{i=1}^{m} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
Multi-class classification:
$$\text{CE\_Loss} = -\frac{1}{m} \sum_{i=1}^{m} y_i \log(\hat{y}_i)$$
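For reference, the three losses in plain C++ (a sketch; the epsilon clamping that keeps `std::log` finite is an implementation detail added here, not part of the formulas, and `ceLoss` is written per sample, so averaging over a batch supplies the $\frac{1}{m}$ factor):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Mean squared error with the 1/(2m) convention used above.
double mseLoss(const std::vector<double>& yHat, const std::vector<double>& y) {
    double s = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i)
        s += (yHat[i] - y[i]) * (yHat[i] - y[i]);
    return s / (2.0 * y.size());
}

// Binary cross-entropy; predictions are clamped away from 0 and 1
// so that std::log stays finite.
double bceLoss(const std::vector<double>& yHat, const std::vector<double>& y) {
    const double eps = 1e-12;
    double s = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) {
        double p = std::clamp(yHat[i], eps, 1.0 - eps);
        s += y[i] * std::log(p) + (1.0 - y[i]) * std::log(1.0 - p);
    }
    return -s / y.size();
}

// Multi-class cross-entropy over one one-hot target vector.
double ceLoss(const std::vector<double>& yHat, const std::vector<double>& y) {
    const double eps = 1e-12;
    double s = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i)
        s += y[i] * std::log(std::max(yHat[i], eps));
    return -s;
}
```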
Backward Propagation
(Output Layer → Hidden Layer)
Compute the gradients and update the weights and biases; here $\alpha$ is the learning rate.
$$\delta^{(3)} = \frac{\partial \text{Loss}}{\partial Z_3} = \hat{Y} - Y = A_3 - Y = \begin{bmatrix} \delta_1^{(3)} \\ \delta_2^{(3)} \end{bmatrix}$$
$$W_2 := W_2 - \alpha \cdot \frac{\partial \text{Loss}}{\partial W_2} = W_2 - \alpha \cdot \frac{\partial \text{Loss}}{\partial Z_3} \cdot \frac{\partial Z_3}{\partial W_2} = W_2 - \alpha \cdot \delta^{(3)} \cdot A_2^T = \begin{bmatrix} w_{11}^{(2)} - \alpha \delta_1^{(3)} a_1^{(2)} & w_{12}^{(2)} - \alpha \delta_1^{(3)} a_2^{(2)} & w_{13}^{(2)} - \alpha \delta_1^{(3)} a_3^{(2)} \\ w_{21}^{(2)} - \alpha \delta_2^{(3)} a_1^{(2)} & w_{22}^{(2)} - \alpha \delta_2^{(3)} a_2^{(2)} & w_{23}^{(2)} - \alpha \delta_2^{(3)} a_3^{(2)} \end{bmatrix}$$
$$B_3 := B_3 - \alpha \cdot \frac{\partial \text{Loss}}{\partial B_3} = B_3 - \alpha \cdot \frac{\partial \text{Loss}}{\partial Z_3} \cdot \frac{\partial Z_3}{\partial B_3} = B_3 - \alpha \cdot \delta^{(3)} = \begin{bmatrix} b_1^{(3)} - \alpha \delta_1^{(3)} \\ b_2^{(3)} - \alpha \delta_2^{(3)} \end{bmatrix}$$
Activation Function Gradients
Sigmoid gradient:
$$\sigma(x) = \frac{1}{1 + e^{-x}} = (1 + e^{-x})^{-1}$$
$$\sigma'(x) = -(1 + e^{-x})^{-2} \cdot (-e^{-x}) = \frac{e^{-x}}{(1 + e^{-x})^2} = \sigma(x)(1 - \sigma(x))$$
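In code this identity means the backward pass can reuse the cached forward activation instead of recomputing the exponential; a one-line sketch (using the `sigmoid` helper from the forward-pass sketch above):

```cpp
// sigma'(x) expressed through the forward value a = sigma(x).
// Passing the cached activation avoids recomputing exp(-x).
double sigmoidPrime(double a) { return a * (1.0 - a); }
```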
Softmax gradient:
$$\sigma(x_i) = \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{x_i}}{sum}$$
When $i = j$:
$$\frac{\partial \sigma(x_i)}{\partial x_j} = \frac{sum \cdot e^{x_i} - e^{x_j} \cdot e^{x_i}}{sum^2} = \sigma(x_i)\left(1 - \sigma(x_j)\right)$$
When $i \neq j$:
$$\frac{\partial \sigma(x_i)}{\partial x_j} = \frac{sum \cdot 0 - e^{x_j} \cdot e^{x_i}}{sum^2} = -\sigma(x_i)\sigma(x_j)$$
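A sketch of softmax and its Jacobian entries in C++ (subtracting the maximum before exponentiating is a standard numerical-stability trick added here; it does not change the result):

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Stable softmax: exp(x_i - max) / sum_j exp(x_j - max).
std::vector<double> softmax(const std::vector<double>& x) {
    double m = *std::max_element(x.begin(), x.end());
    std::vector<double> out(x.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        out[j] = std::exp(x[j] - m);
        sum += out[j];
    }
    for (double& v : out) v /= sum;
    return out;
}

// Jacobian entry d sigma(x_i) / d x_j, folding the two cases above.
double softmaxJacobian(const std::vector<double>& s, std::size_t i, std::size_t j) {
    return (i == j) ? s[i] * (1.0 - s[j]) : -s[i] * s[j];
}
```

In practice the full Jacobian is rarely materialized: combined with the cross-entropy loss, the gradient collapses to $\hat{y} - y$, as derived below.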
Loss Function Gradients
Gradient of the MSE loss (typically used with no output activation, so $z = \hat{y}_i$):
$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}_i} = \hat{y}_i - y_i$$
With the sigmoid activation, the cross-entropy loss gradient (taking $\log x = \ln x$):
$$\frac{\partial L}{\partial \hat{y}_i} = -\left( \frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i} \right)$$
$$\frac{\partial L}{\partial z} = \frac{\partial L}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z} = -\left( \frac{y_i}{\hat{y}_i} - \frac{1 - y_i}{1 - \hat{y}_i} \right) \cdot \hat{y}_i (1 - \hat{y}_i) = \hat{y}_i - y_i$$
With the softmax activation, the cross-entropy loss gradient (taking $\log x = \ln x$); the last step uses $\sum_i y_i = 1$ for a one-hot target:
$$\begin{aligned} \frac{\partial L}{\partial z_j} &= \sum_i \frac{\partial L}{\partial \hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z_j} = \sum_i -\frac{y_i}{\hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z_j} \\ &= \left(-\frac{y_j}{\hat{y}_j} \cdot \frac{\partial \hat{y}_j}{\partial z_j}\right) + \sum_{i \neq j} -\frac{y_i}{\hat{y}_i} \cdot \frac{\partial \hat{y}_i}{\partial z_j} \\ &= -\frac{y_j}{\hat{y}_j} \cdot \hat{y}_j (1 - \hat{y}_j) + \sum_{i \neq j} -\frac{y_i}{\hat{y}_i} \cdot \left(-\hat{y}_i \hat{y}_j\right) \\ &= -y_j + y_j \hat{y}_j + \sum_{i \neq j} y_i \hat{y}_j \\ &= -y_j + \hat{y}_j \sum_i y_i \\ &= \hat{y}_j - y_j \end{aligned}$$
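The clean result $\frac{\partial L}{\partial z_j} = \hat{y}_j - y_j$ is easy to verify with a finite-difference check; a self-contained sketch (the values of `z` and `y` are arbitrary test data, and the helpers repeat the sketches above so the file compiles alone):

```cpp
#include <algorithm>
#include <cmath>
#include <cstdio>
#include <cstddef>
#include <vector>

// Same softmax/CE as the sketches above, repeated so this check stands alone.
static std::vector<double> softmax(const std::vector<double>& x) {
    double m = *std::max_element(x.begin(), x.end());
    std::vector<double> s(x.size());
    double sum = 0.0;
    for (std::size_t j = 0; j < x.size(); ++j) { s[j] = std::exp(x[j] - m); sum += s[j]; }
    for (double& v : s) v /= sum;
    return s;
}
static double ceLoss(const std::vector<double>& p, const std::vector<double>& y) {
    double l = 0.0;
    for (std::size_t i = 0; i < y.size(); ++i) l += y[i] * std::log(std::max(p[i], 1e-12));
    return -l;
}

int main() {
    std::vector<double> z = {0.3, -1.2, 2.0};
    std::vector<double> y = {0.0, 0.0, 1.0};   // one-hot target
    std::vector<double> s = softmax(z);
    const double h = 1e-6;
    for (std::size_t j = 0; j < z.size(); ++j) {
        std::vector<double> zp = z, zm = z;
        zp[j] += h; zm[j] -= h;
        double numeric = (ceLoss(softmax(zp), y) - ceLoss(softmax(zm), y)) / (2.0 * h);
        // The derivation above predicts exactly y_hat_j - y_j:
        std::printf("j=%zu  numeric=%+.6f  analytic=%+.6f\n", j, numeric, s[j] - y[j]);
    }
    return 0;
}
```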
(Hidden Layer → Input Layer)
Assuming the sigmoid activation, compute the gradients and update the weights and biases.
$$\delta^{(2)} = \frac{\partial \text{Loss}}{\partial Z_2} = \frac{\partial \text{Loss}}{\partial Z_3} \cdot \frac{\partial Z_3}{\partial A_2} \cdot \frac{\partial A_2}{\partial Z_2} = (W_2^T \cdot \delta^{(3)}) \odot \sigma'(Z_2) = \begin{bmatrix} (w_{11}^{(2)}\delta_1^{(3)} + w_{21}^{(2)}\delta_2^{(3)})\, a_1^{(2)}(1 - a_1^{(2)}) \\ (w_{12}^{(2)}\delta_1^{(3)} + w_{22}^{(2)}\delta_2^{(3)})\, a_2^{(2)}(1 - a_2^{(2)}) \\ (w_{13}^{(2)}\delta_1^{(3)} + w_{23}^{(2)}\delta_2^{(3)})\, a_3^{(2)}(1 - a_3^{(2)}) \end{bmatrix} = \begin{bmatrix} \delta_1^{(2)} \\ \delta_2^{(2)} \\ \delta_3^{(2)} \end{bmatrix}$$
$$W_1 := W_1 - \alpha \cdot \frac{\partial \text{Loss}}{\partial W_1} = W_1 - \alpha \cdot \frac{\partial \text{Loss}}{\partial Z_2} \cdot \frac{\partial Z_2}{\partial W_1} = W_1 - \alpha \cdot \delta^{(2)} \cdot A_1 = \begin{bmatrix} w_{11}^{(1)} - \alpha \delta_1^{(2)} a_1^{(1)} & w_{12}^{(1)} - \alpha \delta_1^{(2)} a_2^{(1)} \\ w_{21}^{(1)} - \alpha \delta_2^{(2)} a_1^{(1)} & w_{22}^{(1)} - \alpha \delta_2^{(2)} a_2^{(1)} \\ w_{31}^{(1)} - \alpha \delta_3^{(2)} a_1^{(1)} & w_{32}^{(1)} - \alpha \delta_3^{(2)} a_2^{(1)} \end{bmatrix}$$
$$B_2 := B_2 - \alpha \cdot \frac{\partial \text{Loss}}{\partial B_2} = B_2 - \alpha \cdot \delta^{(2)}$$
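Putting both layers' updates into code: a minimal sketch of one backward/update sweep for the 2-3-2 network above, assuming a sigmoid hidden layer and an output delta `delta3 = A3 - Y` computed as in the output-layer section (all names are illustrative):

```cpp
#include <cstddef>
#include <vector>

// One gradient step for the 2-3-2 network above.
// delta3 = A3 - Y is assumed to be computed already (output-layer case).
void backwardStep(std::vector<std::vector<double>>& W1, std::vector<double>& B2,
                  std::vector<std::vector<double>>& W2, std::vector<double>& B3,
                  const std::vector<double>& a1,     // input A1
                  const std::vector<double>& a2,     // hidden activations A2
                  const std::vector<double>& delta3, // output error A3 - Y
                  double alpha) {
    // delta2_j = (sum_k W2[k][j] * delta3[k]) * a2_j * (1 - a2_j).
    // Computed before W2 is updated, since it needs the pre-update weights.
    std::vector<double> delta2(a2.size());
    for (std::size_t j = 0; j < a2.size(); ++j) {
        double back = 0.0;
        for (std::size_t k = 0; k < delta3.size(); ++k)
            back += W2[k][j] * delta3[k];            // (W2^T * delta3)_j
        delta2[j] = back * a2[j] * (1.0 - a2[j]);    // times sigmoid'(z2_j)
    }
    // Output layer: W2 -= alpha * delta3 * A2^T, B3 -= alpha * delta3.
    for (std::size_t k = 0; k < delta3.size(); ++k) {
        for (std::size_t j = 0; j < a2.size(); ++j)
            W2[k][j] -= alpha * delta3[k] * a2[j];
        B3[k] -= alpha * delta3[k];
    }
    // Hidden layer: W1 -= alpha * delta2 * A1, B2 -= alpha * delta2.
    for (std::size_t j = 0; j < delta2.size(); ++j) {
        for (std::size_t i = 0; i < a1.size(); ++i)
            W1[j][i] -= alpha * delta2[j] * a1[i];
        B2[j] -= alpha * delta2[j];
    }
}
```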
Original article: https://blog.csdn.net/long11350/article/details/143714319