Training a Neural Network from Scratch

Published 2024-01-16 12:55. Tags: neural networks, python, artificial intelligence, machine learning, deep learning

Training (Stochastic Gradient Descent)

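I have defined the forward and backward passes, but how do I start using them? I have to write a training loop and use stochastic gradient descent (SGD) as the optimizer to update the parameters of the neural network. The training function contains two main loops. The outer loop runs over the epochs, i.e., the number of times I go through the entire dataset, and the inner loop goes through the observations one by one.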

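For each observation, I do a forward pass with x, which is one image in an array of length 784, as explained earlier. The output of the forward pass is used together with y, the one-hot encoded label (the ground truth), in the backward pass. This gives me a dictionary of updates for the weights in the neural network.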
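As a minimal sketch of what the one-hot label y looks like, assuming the 10 MNIST classes and using plain NumPy instead of the to_categorical helper used in the full code below, a digit label can be turned into a target vector by picking rows of an identity matrix. The function name one_hot is hypothetical:

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with a single 1 per row."""
    labels = np.asarray(labels, dtype=int)
    # row i of the identity matrix has its only 1 at index i
    return np.eye(num_classes, dtype='float32')[labels]

y = one_hot([3, 0, 9])
print(y[0])  # label 3 has its single 1 at index 3
```

Each row sums to 1, and np.argmax on a row recovers the original digit, which is also how the accuracy check compares predictions with labels.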

def train(self, x_train, y_train, x_val, y_val):
    start_time = time.time()
    for iteration in range(self.epochs):
        for x,y in zip(x_train, y_train):
            output = self.forward_pass(x)
            changes_to_w = self.backward_pass(y, output)
            self.update_network_parameters(changes_to_w)

        accuracy = self.compute_accuracy(x_val, y_val)
        print('Epoch: {0}, Time Spent: {1:.2f}s, Accuracy: {2}'.format(
            iteration+1, time.time() - start_time, accuracy
        ))
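The same two-loop structure (epochs outside, one observation at a time inside) works for any differentiable model. As a hedged, self-contained sketch that uses a made-up linear-regression problem instead of the MNIST network, the loop below updates the parameters after every single sample:

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical toy data: y = 2*x + 1 plus a little noise
x_data = rng.uniform(-1, 1, size=200)
y_data = 2 * x_data + 1 + rng.normal(0, 0.05, size=200)

w, b = 0.0, 0.0
l_rate = 0.05
epochs = 20

for epoch in range(epochs):                  # outer loop: passes over the dataset
    for x, y in zip(x_data, y_data):         # inner loop: one observation at a time
        error = (w * x + b) - y              # forward pass and residual
        grad_w, grad_b = error * x, error    # gradient of 0.5 * error**2
        w -= l_rate * grad_w                 # SGD update after every sample
        b -= l_rate * grad_b
    mse = np.mean((w * x_data + b - y_data) ** 2)
    print(f'Epoch: {epoch + 1}, MSE: {mse:.4f}')
```

After a few epochs w and b settle close to 2 and 1; the MNIST version differs only in what the forward and backward passes compute.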


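The update_network_parameters() function contains the code for the SGD update rule, which only needs the gradients of the weights as input. To be clear, SGD as a whole involves computing the gradients with backpropagation in the backward pass, not just updating the parameters. The two steps look separate and should be thought of separately, because they are different algorithms.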

def update_network_parameters(self, changes_to_w):
    '''
        Update network parameters according to update rule from
        Stochastic Gradient Descent.

        θ = θ - η * ∇J(x, y),
            theta θ:            a network parameter (e.g. a weight w)
            eta η:              the learning rate
            gradient ∇J(x, y):  the gradient of the objective function,
                                i.e. the change for a specific theta θ
    '''

    for key, value in changes_to_w.items():
        # update the whole weight matrix in place; value has the same shape
        self.params[key] -= self.l_rate * value
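To check that the update really changes the arrays stored in the dictionary, here is a tiny example with made-up shapes and gradient values (not the MNIST network's). Subtracting with -= on params[key] modifies the stored NumPy array in place, and each gradient must have the same shape as its weight matrix:

```python
import numpy as np

l_rate = 0.1
params = {'W1': np.ones((2, 3)), 'W2': np.ones((1, 2))}
# hypothetical gradients, one per weight matrix, with matching shapes
changes_to_w = {'W1': np.full((2, 3), 10.0), 'W2': np.full((1, 2), 5.0)}

for key, value in changes_to_w.items():
    assert params[key].shape == value.shape
    params[key] -= l_rate * value   # θ = θ - η * ∇J(x, y), applied in place

print(params['W1'])  # every entry is 1 - 0.1 * 10 = 0
print(params['W2'])  # every entry is 1 - 0.1 * 5 = 0.5
```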


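After updating the network's parameters, I can measure the accuracy on the validation set I prepared earlier, to check how well the network performs after each full pass over the dataset.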

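The following code reuses some parts of the training function. First it does a forward pass, then it takes the network's prediction (the index of the largest output) and checks whether it matches the label. Finally, I take the mean of these comparisons, which is the fraction of validation images the network classifies correctly.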

def compute_accuracy(self, x_val, y_val):
    '''
        This function does a forward pass of x, then checks if the indices
        of the maximum value in the output equals the indices in the label
        y. Then it sums over each prediction and calculates the accuracy.
    '''
    predictions = []

    for x, y in zip(x_val, y_val):
        output = self.forward_pass(x)
        pred = np.argmax(output)
        # y is one-hot encoded, so compare with the index of its 1
        predictions.append(pred == np.argmax(y))

    # mean of the True/False values = fraction of correct predictions
    return np.mean(predictions)
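The per-sample loop can also be written in a vectorized way. As a sketch, assuming the network outputs for the whole validation set have been stacked into one array of shape (n_samples, n_classes) (the class above produces them one at a time), accuracy is the mean of the element-wise comparison between predicted and true class indices. The function name accuracy_from_outputs is hypothetical:

```python
import numpy as np

def accuracy_from_outputs(outputs, y_onehot):
    """Fraction of rows where the largest output matches the one-hot label."""
    pred = np.argmax(outputs, axis=1)
    true = np.argmax(y_onehot, axis=1)
    return np.mean(pred == true)

# made-up outputs for 4 samples and 3 classes
outputs = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.8, 0.1],
                    [0.3, 0.3, 0.4],
                    [0.5, 0.4, 0.1]])
y_onehot = np.array([[1, 0, 0],
                     [0, 1, 0],
                     [0, 1, 0],
                     [1, 0, 0]])
print(accuracy_from_outputs(outputs, y_onehot))  # 3 of 4 correct -> 0.75
```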


Finally, now that it is clear what will happen, I can call the training function. I pass the training and validation data to it as inputs and wait.

dnn.train(x_train, y_train, x_val, y_val)


Note that the results can vary a lot depending on how the weights are initialized. My results ranged from 0% to 95% accuracy.
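Because the outcome depends on the random initialization, it can help to fix the random seed while experimenting so that runs are comparable. A minimal sketch, assuming NumPy's Generator API (np.random.default_rng) in place of the global np.random.randn used in the class, and keeping the same shapes and scaling as the initialization() method. This standalone function with a seed parameter is hypothetical:

```python
import numpy as np

def initialization(sizes, seed=None):
    """Same shapes and scaling as DeepNeuralNetwork.initialization, but seedable."""
    rng = np.random.default_rng(seed)
    input_layer, hidden_1, hidden_2, output_layer = sizes
    return {
        'W1': rng.standard_normal((hidden_1, input_layer)) * np.sqrt(1. / hidden_1),
        'W2': rng.standard_normal((hidden_2, hidden_1)) * np.sqrt(1. / hidden_2),
        'W3': rng.standard_normal((output_layer, hidden_2)) * np.sqrt(1. / output_layer),
    }

a = initialization([784, 128, 64, 10], seed=42)
b = initialization([784, 128, 64, 10], seed=42)
c = initialization([784, 128, 64, 10], seed=7)
print(all(np.array_equal(a[k], b[k]) for k in a))  # same seed, same weights
print(np.array_equal(a['W1'], c['W1']))            # different seed, different weights
```

Fixing the seed only makes the starting point repeatable; comparing a few seeds is still the way to see how sensitive training is to initialization.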

Below is the complete code, which puts together everything described above.

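from sklearn.datasets import fetch_openml
from keras.utils import to_categorical
import numpy as np
from sklearn.model_selection import train_test_split
import time

# as_frame=False returns NumPy arrays instead of a pandas DataFrame
x, y = fetch_openml('mnist_784', version=1, return_X_y=True, as_frame=False)
x = (x/255).astype('float32')
# the labels come back as strings ('0'..'9'), so cast them before one-hot encoding
y = to_categorical(y.astype(int))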

x_train, x_val, y_train, y_val = train_test_split(x, y, test_size=0.15, random_state=42)

class DeepNeuralNetwork():
    def __init__(self, sizes, epochs=10, l_rate=0.001):
        self.sizes = sizes
        self.epochs = epochs
        self.l_rate = l_rate

        # we save all parameters in the neural network in this dictionary
        self.params = self.initialization()

    def sigmoid(self, x, derivative=False):
        if derivative:
            return (np.exp(-x))/((np.exp(-x)+1)**2)
        return 1/(1 + np.exp(-x))

    def softmax(self, x):
        # Numerically stable with large exponentials
        exps = np.exp(x - x.max())
        return exps / np.sum(exps, axis=0)

    def initialization(self):
        # number of nodes in each layer
        input_layer=self.sizes[0]
        hidden_1=self.sizes[1]
        hidden_2=self.sizes[2]
        output_layer=self.sizes[3]

        params = {
            'W1':np.random.randn(hidden_1, input_layer) * np.sqrt(1. / hidden_1),
            'W2':np.random.randn(hidden_2, hidden_1) * np.sqrt(1. / hidden_2),
            'W3':np.random.randn(output_layer, hidden_2) * np.sqrt(1. / output_layer)
        }

        return params

    def forward_pass(self, x_train):
        params = self.params

        # input layer activations becomes sample
        params['A0'] = x_train

        # input layer to hidden layer 1
        params['Z1'] = np.dot(params["W1"], params['A0'])
        params['A1'] = self.sigmoid(params['Z1'])

        # hidden layer 1 to hidden layer 2
        params['Z2'] = np.dot(params["W2"], params['A1'])
        params['A2'] = self.sigmoid(params['Z2'])

        # hidden layer 2 to output layer
        params['Z3'] = np.dot(params["W3"], params['A2'])
        params['A3'] = self.softmax(params['Z3'])

        return params['A3']

    def backward_pass(self, y_train, output):
        '''
            This is the backpropagation algorithm, for calculating the updates
            of the neural network's parameters.

            Note: There is a stability issue that causes warnings. This is
                  caused  by the dot and multiply operations on the huge arrays.

                  RuntimeWarning: invalid value encountered in true_divide
                  RuntimeWarning: overflow encountered in exp
                  RuntimeWarning: overflow encountered in square
        '''
        params = self.params
        change_w = {}

        # Calculate W3 update
        # (softmax output with cross-entropy loss: dL/dZ3 = output - y)
        error = output - y_train
        change_w['W3'] = np.outer(error, params['A2'])

        # Calculate W2 update
        error = np.multiply( np.dot(params['W3'].T, error), self.sigmoid(params['Z2'], derivative=True) )
        change_w['W2'] = np.outer(error, params['A1'])

        # Calculate W1 update
        error = np.multiply( np.dot(params['W2'].T, error), self.sigmoid(params['Z1'], derivative=True) )
        change_w['W1'] = np.outer(error, params['A0'])

        return change_w

    def update_network_parameters(self, changes_to_w):
        '''
            Update network parameters according to update rule from
            Stochastic Gradient Descent.

            θ = θ - η * ∇J(x, y),
                theta θ:            a network parameter (e.g. a weight w)
                eta η:              the learning rate
                gradient ∇J(x, y):  the gradient of the objective function,
                                    i.e. the change for a specific theta θ
        '''

        for key, value in changes_to_w.items():
            # update the whole weight matrix in place; value has the same shape
            self.params[key] -= self.l_rate * value

    def compute_accuracy(self, x_val, y_val):
        '''
            This function does a forward pass of x, then checks if the indices
            of the maximum value in the output equals the indices in the label
            y. Then it sums over each prediction and calculates the accuracy.
        '''
        predictions = []

        for x, y in zip(x_val, y_val):
            output = self.forward_pass(x)
            pred = np.argmax(output)
            # y is one-hot encoded, so compare with the index of its 1
            predictions.append(pred == np.argmax(y))

        # mean of the True/False values = fraction of correct predictions
        return np.mean(predictions)

    def train(self, x_train, y_train, x_val, y_val):
        start_time = time.time()
        for iteration in range(self.epochs):
            for x,y in zip(x_train, y_train):
                output = self.forward_pass(x)
                changes_to_w = self.backward_pass(y, output)
                self.update_network_parameters(changes_to_w)

            accuracy = self.compute_accuracy(x_val, y_val)
            print('Epoch: {0}, Time Spent: {1:.2f}s, Accuracy: {2}'.format(
                iteration+1, time.time() - start_time, accuracy
            ))

dnn = DeepNeuralNetwork(sizes=[784, 128, 64, 10])
dnn.train(x_train, y_train, x_val, y_val)
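A numerical gradient check is a common way to verify a hand-written backward pass. The sketch below rebuilds the same three-layer sigmoid/softmax network as standalone functions with tiny, made-up layer sizes. It assumes a cross-entropy loss (the loss for which output - y is the gradient with respect to Z3) and compares the analytic gradients with central finite differences. All function names here are hypothetical:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def softmax(x):
    exps = np.exp(x - x.max())
    return exps / np.sum(exps)

def forward(params, x):
    a1 = sigmoid(params['W1'] @ x)
    a2 = sigmoid(params['W2'] @ a1)
    return a1, a2, softmax(params['W3'] @ a2)

def loss(params, x, y):
    # cross-entropy between the softmax output and a one-hot label
    return -np.sum(y * np.log(forward(params, x)[2]))

def backward(params, x, y):
    a1, a2, out = forward(params, x)
    error = out - y                                   # dL/dZ3
    grads = {'W3': np.outer(error, a2)}
    error = (params['W3'].T @ error) * a2 * (1 - a2)  # sigmoid'(Z2) = A2 * (1 - A2)
    grads['W2'] = np.outer(error, a1)
    error = (params['W2'].T @ error) * a1 * (1 - a1)  # sigmoid'(Z1) = A1 * (1 - A1)
    grads['W1'] = np.outer(error, x)
    return grads

def numerical_grad(params, x, y, key, eps=1e-6):
    grad = np.zeros_like(params[key])
    for idx in np.ndindex(grad.shape):
        original = params[key][idx]
        params[key][idx] = original + eps
        plus = loss(params, x, y)
        params[key][idx] = original - eps
        minus = loss(params, x, y)
        params[key][idx] = original               # restore the weight
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
sizes = [5, 4, 3, 2]   # hypothetical small sizes instead of [784, 128, 64, 10]
params = {f'W{i}': rng.standard_normal((sizes[i], sizes[i - 1])) for i in (1, 2, 3)}
x = rng.random(sizes[0])
y = np.eye(sizes[3])[1]

analytic = backward(params, x, y)
for key in ('W1', 'W2', 'W3'):
    diff = np.max(np.abs(analytic[key] - numerical_grad(params, x, y, key)))
    print(key, f'max abs difference: {diff:.2e}')
```

Differences far below 1e-6 indicate that the backward pass matches the loss; a wrong activation or a transposed matrix usually shows up as a difference of the same order as the gradients themselves.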


Good exercises in NumPy

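You might have noticed that the code is quite readable, but it takes up a lot of space and could be optimized to run in loops. Here is a chance to optimize and improve it. If you are new to the topic, the following exercises range from easy to hard, with the last one being the hardest.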

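Easy: Implement the ReLU activation function, or any other activation function. Look at how the sigmoid function is implemented for reference, and remember to implement the derivative as well. Use the ReLU activation function in place of the sigmoid function.
Easy: Initialize biases, add them to Z before the activation function in the forward pass, and update them in the backward pass. Be careful with the dimensions of the arrays when you add the biases.
Medium: Optimize the forward and backward pass so that each of them runs in a for loop. This makes the code easier to modify and possibly easier to maintain.
Medium: Optimize the initialization function that creates the weights for the neural network, so that you can modify the sizes=[] argument without the neural network failing.
Medium: Implement mini-batch gradient descent in place of stochastic gradient descent. Instead of updating the parameters after each sample, update them with the average of the gradients accumulated over the samples in a mini-batch. The mini-batch size is usually 64 or smaller.
Hard: Implement the Adam optimizer. This should be implemented in the training function.
1. Implement momentum by adding the extra term
2. Implement an adaptive learning rate based on the AdaGrad optimizer
3. Combine steps 1 and 2 to implement Adam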
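To make the mini-batch idea concrete without solving the exercise for the network above, here is a sketch on a hypothetical linear model with a squared-error loss. Averaging the per-sample gradients over a mini-batch gives exactly the gradient of the mean loss over that batch, which is what a single mini-batch update applies:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.standard_normal(3)                 # hypothetical linear model weights
x_batch = rng.standard_normal((32, 3))     # one mini-batch of 32 samples
y_batch = rng.standard_normal(32)

def sample_grad(w, x, y):
    # gradient of 0.5 * (w·x - y)**2 for a single sample
    return (w @ x - y) * x

# accumulate the per-sample gradients, then average them
accumulated = np.zeros_like(w)
for x, y in zip(x_batch, y_batch):
    accumulated += sample_grad(w, x, y)
averaged = accumulated / len(x_batch)

# the same quantity computed for the whole batch at once
batch_grad = x_batch.T @ (x_batch @ w - y_batch) / len(x_batch)
print(np.allclose(averaged, batch_grad))   # True

l_rate = 0.01
w -= l_rate * averaged                     # one update per mini-batch, not per sample
```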

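I believe that if you complete these exercises, you will have a good foundation. The next step would be implementing convolutions, filters, and so on, but that is left for a future article.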

As a disclaimer, there are no solutions to these exercises.

PyTorch

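Now that I have shown how to implement these calculations for a feedforward neural network with backpropagation, let's see how much time PyTorch saves us compared to NumPy.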

Loading the MNIST dataset

One of the things that seems more complicated or harder to understand than it should be is loading a dataset with PyTorch.

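You start by defining the transformation of the data, specifying that it should be a tensor and that it should be normalized. Then you load the dataset by using DataLoader in combination with the datasets import. That is all you need. You will see later how to unpack the values from these loaders.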

import torch
from torchvision import datasets, transforms

transform = transforms.Compose([
                transforms.ToTensor(),
                transforms.Normalize((0.1307,), (0.3081,))
            ])

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True, transform=transform))

test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, transform=transform))
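transforms.ToTensor() turns the 0 to 255 pixel values into floats in [0, 1], and transforms.Normalize(mean, std) then computes (x - mean) / std per channel. The values 0.1307 and 0.3081 are the mean and standard deviation commonly used for the MNIST training pixels. A NumPy-only sketch of the same arithmetic on a few made-up pixel values (this does not call torchvision):

```python
import numpy as np

mean, std = 0.1307, 0.3081
pixels = np.array([0, 128, 255], dtype=np.uint8)   # made-up raw pixel values

scaled = pixels.astype(np.float32) / 255.0          # the range change ToTensor applies
normalized = (scaled - mean) / std                  # what Normalize computes per channel
print(normalized.round(4))
```

Black pixels end up slightly negative and white pixels around 2.8, so the inputs are roughly centred around zero with unit spread.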


Training

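I defined a class called Net that is similar to the DeepNeuralNetwork class written earlier in NumPy. It has some of the same methods, but you can clearly see that in PyTorch I don't need to think about initializing the network parameters or about the backward pass: those functions, along with the function for computing accuracy, are gone.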

import time
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self, epochs=10):
        super(Net, self).__init__()
        self.linear1 = nn.Linear(784, 128)
        self.linear2 = nn.Linear(128, 64)
        self.linear3 = nn.Linear(64, 10)

        self.epochs = epochs

    def forward_pass(self, x):
        x = self.linear1(x)
        x = torch.sigmoid(x)
        x = self.linear2(x)
        x = torch.sigmoid(x)
        x = self.linear3(x)
        x = torch.softmax(x, dim=0)
        return x

    def one_hot_encode(self, y):
        # float32 to match the dtype of the network output expected by the loss
        encoded = torch.zeros([10], dtype=torch.float32)
        encoded[y[0]] = 1.
        return encoded

    def train(self, train_loader, optimizer, criterion):
        start_time = time.time()
        loss = None

        for iteration in range(self.epochs):
            for x,y in train_loader:
                y = self.one_hot_encode(y)
                optimizer.zero_grad()
                output = self.forward_pass(torch.flatten(x))
                loss = criterion(output, y)
                loss.backward()
                optimizer.step()

            print('Epoch: {0}, Time Spent: {1:.2f}s, Loss: {2}'.format(
                iteration+1, time.time() - start_time, loss.item()
            ))


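As you read this class, note that PyTorch has already implemented all the relevant activation functions for us, along with different types of layers. You don't even have to think about them. You can simply define some layers, such as fully connected layers with nn.Linear().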
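nn.Linear(in_features, out_features) stores its weight with shape (out_features, in_features) and computes x @ W.T + b. So nn.Linear(784, 128) holds a matrix with the same shape as W1 in the NumPy class, plus a bias vector that the NumPy version does not have. A NumPy sketch of that computation, with random stand-in weights rather than PyTorch's own initialization (the function name linear is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
in_features, out_features = 784, 128
W = rng.standard_normal((out_features, in_features)) * 0.01  # stand-in for linear1.weight
b = np.zeros(out_features)                                    # stand-in for linear1.bias

def linear(x, W, b):
    # the formula nn.Linear applies: x @ W.T + b
    return x @ W.T + b

x_single = rng.random(in_features)           # one flattened image
x_batch = rng.random((32, in_features))      # a batch of 32 flattened images
print(linear(x_single, W, b).shape)          # (128,)
print(linear(x_batch, W, b).shape)           # (32, 128)

# for a single sample with zero bias this matches np.dot(W1, A0) in the NumPy class
print(np.allclose(linear(x_single, W, b), np.dot(W, x_single)))
```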

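I imported the optimizers earlier, and now I specify which optimizer to use, along with the criterion for the loss. I pass both the optimizer and the criterion into the training function, and PyTorch starts running through the examples, just as in NumPy. I could also include a metric for measuring accuracy, but I left it out in favor of measuring the loss.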

model = Net()

optimizer = optim.SGD(model.parameters(), lr=0.001)
# forward_pass already ends with softmax, so the loss receives probabilities, not logits
criterion = nn.BCELoss()

model.train(train_loader, optimizer, criterion)
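nn.BCEWithLogitsLoss expects raw scores (logits) and applies a sigmoid internally, while nn.BCELoss expects values that are already probabilities. The NumPy sketch below (not PyTorch's implementation) writes the logits version in its usual stable closed form and shows that it equals plain binary cross-entropy after a sigmoid. It also shows that feeding softmax outputs into the logits version squashes them a second time and gives a different loss. The helper names are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def bce(p, y):
    # mean binary cross-entropy on probabilities (BCELoss formula, mean reduction)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

def bce_with_logits(z, y):
    # same loss written directly in terms of raw scores z (stable form):
    # max(z, 0) - z*y + log(1 + exp(-|z|))
    return np.mean(np.maximum(z, 0) - z * y + np.log1p(np.exp(-np.abs(z))))

y = np.array([0., 0., 1., 0.])                 # made-up one-hot target
logits = np.array([-2.0, 0.5, 3.0, -1.0])      # made-up raw scores
probs = np.exp(logits) / np.exp(logits).sum()  # softmax, as at the end of forward_pass

# on raw scores, the logits version equals BCE after a sigmoid
print(np.isclose(bce_with_logits(logits, y), bce(sigmoid(logits), y)))
# on softmax outputs, the logits version applies a second squashing
print(bce(probs, y), bce_with_logits(probs, y))
```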


TensorFlow 2.0 with Keras

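For the TensorFlow/Keras version of the neural network, I chose a simple approach that minimizes the number of lines of code. I don't define any classes; instead I use the high-level Keras API to build a neural network in just a few lines. If you are just starting to learn about neural networks, Keras has the lowest barrier to entry, which is why I recommend it.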

I start by importing all the functions I will need later.

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.layers import Flatten, Dense
from tensorflow.keras.losses import BinaryCrossentropy


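These few lines are all I need to load the dataset and preprocess it. Note that I only preprocess the training data, because I don't plan to use the validation data in this approach. I explain later how to use the validation data.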

(x_train, y_train), (x_val, y_val) = mnist.load_data()

x_train = x_train.astype('float32') / 255
y_train = to_categorical(y_train)


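The next step is defining the model. In Keras this is very simple once you know which layers you want to apply to the data. Here I use fully connected layers, as in the NumPy example. In Keras, these are created with the Dense() function.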

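After defining the model's layers, I compile the model and specify the optimizer, the loss function and the metric. Finally, I tell Keras to fit the training data for 10 epochs, just as in the other examples.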

model = tf.keras.Sequential([
    Flatten(input_shape=(28, 28)),
    Dense(128, activation='sigmoid'),
    Dense(64, activation='sigmoid'),
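    Dense(10, activation='softmax')   # probabilities over the 10 digits
])

# categorical cross-entropy matches one-hot labels over 10 mutually exclusive classes
model.compile(optimizer='SGD',
              loss=tf.keras.losses.CategoricalCrossentropy(),
              metrics=['accuracy'])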

model.fit(x_train, y_train, epochs=10)
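As a quick sanity check on the architecture, a Dense layer with n inputs and m units has n * m weights plus m biases, and Flatten has no parameters. The short calculation below (plain Python, no Keras needed; count_params is a hypothetical helper) gives the total that model.summary() should report for this model, and compares it with the NumPy network above, which has no biases:

```python
layer_sizes = [784, 128, 64, 10]

def count_params(sizes, use_bias=True):
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        # weight matrix plus (optionally) one bias per output unit
        total += n_in * n_out + (n_out if use_bias else 0)
    return total

print(count_params(layer_sizes))                  # 109386: Keras Dense layers with biases
print(count_params(layer_sizes, use_bias=False))  # 109184: the NumPy version's W1, W2, W3
```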


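If you want to use the validation data, you can pass it in through the validation_data parameter of the fit function. The validation data has to be preprocessed the same way as the training data first:

model.fit(x_train, y_train, epochs=10,
          validation_data=(x_val.astype('float32') / 255, to_categorical(y_val)))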


Conclusion

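This article covered the basics of building a neural network without the help of the frameworks that can make this easier. I built a basic deep neural network with 4 layers and explained how to build it by implementing the forward and backward passes (backpropagation).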

Original article: https://blog.csdn.net/2401_82469710/article/details/135593240

