Machine Learning Weekly Report (9.9-9.15): PyTorch Study (Part 3)

Published 2024-09-20 22:27 | Tags: machine learning, pytorch, study

Abstract

This week's study covers the commonly used loss functions in PyTorch, with hands-on practice, and works through the principle of the cross-entropy loss and the derivation of its formula. It also covers how an optimizer uses the gradients of the loss function to update a model, how to use and modify existing, already trained network models, and how to save and load network models.

1 Loss Functions and Backpropagation

1.1 L1Loss

import torch
from torch.nn import L1Loss

inputs = torch.tensor([1, 2, 3], dtype=torch.float32)
targets = torch.tensor([1, 2, 5], dtype=torch.float32)

l1 = L1Loss()
result = l1(inputs, targets)
print(result)   # tensor(0.6667)

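l1 = L1Loss(reduction='mean')

With the default reduction='mean', the loss is the mean of the absolute differences between corresponding elements.
With reduction='sum', the absolute differences are summed instead, and the output for the example above is tensor(2.)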
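
As a quick check of the two reduction modes, the same numbers can be worked out by hand in plain Python. This is a minimal sketch that does not use PyTorch; the variable names are illustrative only.

```python
# Hand computation of the L1 loss for the example above (no PyTorch needed).
inputs = [1.0, 2.0, 3.0]
targets = [1.0, 2.0, 5.0]

# Element-wise absolute differences: |1-1|, |2-2|, |3-5|
abs_errors = [abs(x - y) for x, y in zip(inputs, targets)]

l1_sum = sum(abs_errors)                # corresponds to reduction='sum'
l1_mean = l1_sum / len(abs_errors)      # corresponds to reduction='mean'

print(abs_errors)           # [0.0, 0.0, 2.0]
print(l1_sum)               # 2.0
print(round(l1_mean, 4))    # 0.6667
```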

1.2 MSELoss

torch.nn.MSELoss computes the squared differences between corresponding elements, then averages or sums them.

import torch
from torch.nn import MSELoss

inputs = torch.tensor([1, 2, 3], dtype=torch.float32)
targets = torch.tensor([1, 2, 5], dtype=torch.float32)

# MSELoss with reduction='mean' (the default)
m1 = MSELoss(reduction='mean')
result = m1(inputs, targets)
print(result)     # tensor(1.3333)

# MSELoss with reduction='sum'
m2 = MSELoss(reduction='sum')
result = m2(inputs, targets)
print(result)     # tensor(4.)
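
For reference, the two reductions can be written out as formulas. This is a sketch using $x_i$ for the inputs, $y_i$ for the targets, and $N$ for the number of elements; these symbols are chosen here for illustration and are not taken from the original figure.

```latex
\ell_{\text{mean}} = \frac{1}{N}\sum_{i=1}^{N}(x_i - y_i)^2
  = \frac{(1-1)^2 + (2-2)^2 + (3-5)^2}{3} = \frac{4}{3} \approx 1.3333,
\qquad
\ell_{\text{sum}} = \sum_{i=1}^{N}(x_i - y_i)^2 = 0 + 0 + 4 = 4
```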

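1.3 Cross-Entropy Loss (CrossEntropyLoss)

The softmax function, also called the normalized exponential function, generalizes the sigmoid function used for binary classification to multi-class tasks. In multi-class networks, Softmax is commonly used as the last layer for classification.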

import torch
import torch.nn as nn

input1 = torch.tensor([-0.5, -0.3, 0, 0.3, 0.5])
input2 = torch.tensor([-3, -1, 0, 1, 3], dtype=torch.float32)

softmax = nn.Softmax(dim=0)
output1 = softmax(input1)
output2 = softmax(input2)
print(output1) # tensor([0.1135, 0.1386, 0.1871, 0.2525, 0.3084])
print(output2) # tensor([0.0021, 0.0152, 0.0413, 0.1122, 0.8292])

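Softmax maps the inputs to probabilities that sum to 1: the largest inputs are pushed toward 1 and the smallest toward 0, and the wider the spread of the input values, the stronger this polarization (compare output1 with output2 above).
Larger input values therefore receive larger probabilities.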
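
To make the behaviour concrete, here is a minimal pure-Python sketch of softmax that reproduces the two outputs above. The helper name `softmax` and the max-subtraction trick are choices made for this illustration, not something taken from the original post.

```python
import math

def softmax(values):
    """Return exp(v) / sum(exp(v)) for each v, as a list of floats."""
    # Subtracting the maximum does not change the result but avoids overflow.
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

probs1 = softmax([-0.5, -0.3, 0.0, 0.3, 0.5])
probs2 = softmax([-3.0, -1.0, 0.0, 1.0, 3.0])

print([round(p, 4) for p in probs1])  # [0.1135, 0.1386, 0.1871, 0.2525, 0.3084]
print([round(p, 4) for p in probs2])  # [0.0021, 0.0152, 0.0413, 0.1122, 0.8292]

# Adding the same constant to every input leaves the probabilities unchanged:
# only the differences between inputs matter, not their signs.
shifted_probs = softmax([v + 10.0 for v in [-0.5, -0.3, 0.0, 0.3, 0.5]])
```

The last line shows that an all-positive input vector gives the same distribution as the mixed-sign one, as long as the gaps between the values are the same.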

In PyTorch, nn.CrossEntropyLoss combines nn.LogSoftmax() and nn.NLLLoss(), which makes it very useful for training classifiers.

import torch
import torch.nn as nn

crossEntropyLoss = nn.CrossEntropyLoss()
input2 = torch.tensor([0.1, 0.2, 0.3])
target2 = torch.tensor([1])
input2 = torch.reshape(input2, (1, 3))   # shape (batch_size, num_classes)
l = crossEntropyLoss(input2, target2)
print(l) # tensor(1.1019)
# Formula (the target class is 1, so x[1] = 0.2):
# -0.2 + ln(exp(0.1)+exp(0.2)+exp(0.3))
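
The relationship between nn.CrossEntropyLoss, nn.LogSoftmax, and nn.NLLLoss can be checked directly. The following is a small sketch on the same single-sample input; the variable names are illustrative.

```python
import torch
import torch.nn as nn

logits = torch.tensor([[0.1, 0.2, 0.3]])   # shape (batch_size=1, num_classes=3)
target = torch.tensor([1])                 # index of the correct class

# One-step version
ce = nn.CrossEntropyLoss()(logits, target)

# Two-step version: log-softmax over the class dimension, then negative log-likelihood
log_probs = nn.LogSoftmax(dim=1)(logits)
nll = nn.NLLLoss()(log_probs, target)

print(ce)                        # tensor(1.1019)
print(nll)                       # tensor(1.1019)
print(torch.allclose(ce, nll))   # True
```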

import torch
import torch.nn as nn

crossEntropyLoss = nn.CrossEntropyLoss()
input = torch.tensor([[-0.1342, -2.5835, -0.9810],
                     [0.1867, -1.4513, -0.3225],
                     [0.6272, -0.1120, 0.3048]])
target = torch.tensor([0, 2, 1])
loss = crossEntropyLoss(input, target)
print(loss)   # tensor(1.0128)
# Calculation (mean over the 3 samples):
#   [-(-0.1342) + ln(exp(-0.1342)+exp(-2.5835)+exp(-0.9810))
#    -(-0.3225) + ln(exp(0.1867)+exp(-1.4513)+exp(-0.3225))
#    -(-0.1120) + ln(exp(0.6272)+exp(-0.1120)+exp(0.3048))] / 3
#   = 3.03842655071 / 3 = 1.01280885024
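
The batch calculation can also be reproduced without PyTorch. This is a minimal sketch in plain Python; the helper `cross_entropy_row` is a name introduced here for illustration.

```python
import math

logits = [[-0.1342, -2.5835, -0.9810],
          [0.1867, -1.4513, -0.3225],
          [0.6272, -0.1120, 0.3048]]
targets = [0, 2, 1]

def cross_entropy_row(row, target_index):
    """-x[target] + ln(sum_j exp(x[j])) for a single sample."""
    log_sum_exp = math.log(sum(math.exp(v) for v in row))
    return -row[target_index] + log_sum_exp

per_sample = [cross_entropy_row(row, t) for row, t in zip(logits, targets)]
mean_loss = sum(per_sample) / len(per_sample)   # matches reduction='mean'

print([round(v, 4) for v in per_sample])  # [0.4155, 1.0944, 1.5285]
print(round(mean_loss, 4))                # 1.0128
```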

1.4 Backpropagation

import torch.nn as nn
import torchvision
from torch.nn import Conv2d, MaxPool2d, Sequential, Linear, Flatten, CrossEntropyLoss
from torch.utils.data import DataLoader

# Dataset
dataset = torchvision.datasets.CIFAR10("dataset2", train=False, transform= torchvision.transforms.ToTensor())

data_loader = DataLoader(dataset, batch_size=1)

class seq(nn.Module):
    def __init__(self):
        super(seq, self).__init__()
        self.model = Sequential(
            Conv2d(3, 32, 5, padding=2),
            MaxPool2d(2),
            Conv2d(32, 32, 5, padding=2),
            MaxPool2d(2),
            Conv2d(32, 64, 5, padding=2),
            MaxPool2d(2),
            Flatten(),
            Linear(1024, 64),
            Linear(64, 10)
        )
    def forward(self,x):
        x = self.model(x)
        return x

s = seq()
# Cross-entropy loss
loss = CrossEntropyLoss()

for data in data_loader:
    imgs, target = data
    output = s(imgs)
    # print(output)
    # print(target)
    result_loss = loss(output, target)
    # print(result_loss)
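    # Backpropagation
    # The loss tensor computed above has a backward() method;
    # calling it computes the gradient of the loss with respect to every parameter.
    # The grad attribute of each parameter is None at first
    # and is filled in only after backward() runs; an optimizer later uses these gradients to update the parameters.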
    result_loss.backward()
    print('ok')

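Before backpropagation has been executed, the grad attribute of the parameters has no value.

After backpropagation has been executed, grad holds the computed gradients. Note that backward() only computes gradients; it does not perform gradient descent, and the parameter values themselves are updated only later by an optimizer.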
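
The effect of backward() on grad can be observed without a debugger. The following is a minimal sketch that uses a small nn.Linear layer and random inputs as a stand-in for the CIFAR10 network; both are assumptions made to keep the example self-contained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)                 # fixed seed so the run is repeatable
layer = nn.Linear(4, 3)              # small stand-in for the CIFAR10 network
inputs = torch.randn(2, 4)           # batch of 2 random samples
targets = torch.tensor([0, 2])       # class indices for the 2 samples

grad_before = layer.weight.grad      # None: no backward pass has run yet
weight_before = layer.weight.detach().clone()

loss = nn.CrossEntropyLoss()(layer(inputs), targets)
loss.backward()                      # fills .grad for every parameter

print(grad_before)                                        # None
print(layer.weight.grad.shape)                            # torch.Size([3, 4])
print(torch.equal(layer.weight.detach(), weight_before))  # True: weights are unchanged
```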

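2 Optimizer

torch.optim

import torch

# Dataset
dataset = torchvision.datasets.CIFAR10("dataset2", train=False, transform= torchvision.transforms.ToTensor())
data_loader = DataLoader(dataset, batch_size=1)

# Define the model (elided in the original; any nn.Module instance, e.g. seq() from section 1.4)
model=...
# Assumption: the original does not show the next two definitions;
# SGD with lr=0.01 is an illustrative choice
loss = CrossEntropyLoss()
optim = torch.optim.SGD(model.parameters(), lr=0.01)

# Train the model
for data in data_loader:
    imgs, target = data
    output = model(imgs)
    result_loss = loss(output, target)
    # First reset the gradient of every parameter in the network
    optim.zero_grad()
    # Backpropagate from the loss to compute the gradient of every parameter
    result_loss.backward()
    # Update the parameters
    optim.step()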

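Debug:
Set breakpoints on the three lines optim.zero_grad(), result_loss.backward(), and optim.step() (lines 42, 44, and 46 of the author's script), then step through them while watching how grad and data change.

Before and after line 42 (zero_grad) is executed, grad has no value.

After line 44 (the backward pass) is executed, grad changes from None to the computed gradient values.

Before and after line 46 (step) is executed, the values in data change.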
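
The same three steps can be checked in code instead of a debugger. This is a minimal sketch under stated assumptions: a small nn.Linear stand-in model, random inputs, and plain SGD with an illustrative learning rate of 0.1, none of which come from the original script.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 3)                       # hypothetical stand-in model
lr = 0.1                                      # illustrative learning rate
optim = torch.optim.SGD(model.parameters(), lr=lr)
loss_fn = nn.CrossEntropyLoss()

inputs = torch.randn(2, 4)
targets = torch.tensor([0, 2])

optim.zero_grad()                             # clear gradients from any previous step
loss = loss_fn(model(inputs), targets)
loss.backward()                               # compute gradients; data is untouched

old_weight = model.weight.detach().clone()
grad = model.weight.grad.detach().clone()

optim.step()                                  # update data using the gradients

# For plain SGD (no momentum or weight decay): new = old - lr * grad
expected = old_weight - lr * grad
print(torch.allclose(model.weight.detach(), expected))   # True
```

Other optimizers such as Adam apply a different update rule, so the last check holds only for plain SGD.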

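Training for 20 epochs

Train for 20 epochs and print the total loss of each epoch.

# Train the model
for epoch in range(20):
    sum_loss = 0.0
    for data in data_loader:
        imgs, target = data
        output = model(imgs)
        result_loss = loss(output, target)
        # First reset the gradient of every parameter in the network
        optim.zero_grad()
        # Backpropagate from the loss to compute the gradient of every parameter
        result_loss.backward()
        # Update the parameters
        optim.step()
        # print(result_loss)
        # Accumulate a Python float rather than the tensor, so each batch's autograd graph can be freed
        sum_loss = sum_loss + result_loss.item()

    print(sum_loss)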
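
For a version of the epoch loop that runs in a few seconds without downloading CIFAR10, the following sketch trains a single linear layer on synthetic data. The toy task (y = 2x + 1), the batch size, and the learning rate are all assumptions made for illustration.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
# Synthetic data for y = 2x + 1 (a hypothetical toy task, not CIFAR10)
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y = 2 * x + 1
loader = DataLoader(TensorDataset(x, y), batch_size=8, shuffle=True)

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()
optim = torch.optim.SGD(model.parameters(), lr=0.1)

epoch_losses = []
for epoch in range(20):
    sum_loss = 0.0
    for batch_x, batch_y in loader:
        optim.zero_grad()
        batch_loss = loss_fn(model(batch_x), batch_y)
        batch_loss.backward()
        optim.step()
        sum_loss += batch_loss.item()   # .item() returns a plain Python float
    epoch_losses.append(sum_loss)

# The total loss of the last epoch should be far smaller than that of the first
print(epoch_losses[0], epoch_losses[-1])
```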

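3 Using and Modifying Existing Network Models

Take the VGG16 model as an example. It was trained on the ImageNet dataset, but ImageNet is not freely available for direct download and is very large, so the dataset is not downloaded here; the model is used only to learn how to add to and modify an existing network.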

import torchvision

vgg16_true = torchvision.models.vgg16()

print(vgg16_true)

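As the printed structure shows, the last layer of this network is a linear layer, Linear(4096, 1000). Suppose we want the network to produce 10 outputs instead.

Method 1: append a linear layer after VGG16's classifier

from torch import nn

vgg16_true.classifier.add_module('add_linear', nn.Linear(1000, 10))

Note that calling add_module on the model itself,

vgg16_true.add_module('add_linear', nn.Linear(1000, 10))

only registers the layer as a new top-level submodule. torchvision's VGG.forward does not call it, so the output would still have 1000 values unless forward is changed as well.

Method 2: directly replace VGG16's last linear layer (an alternative to Method 1, applied to an unmodified model)

vgg16_true.classifier[6] = nn.Linear(4096, 10)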
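
The difference between these modifications is easiest to see from the output shapes. The sketch below uses `TinyVGG`, a hypothetical miniature model invented for this example whose forward, like torchvision's VGG, only runs the `classifier` block; it avoids constructing the full VGG16.

```python
import torch
from torch import nn

class TinyVGG(nn.Module):
    """Hypothetical miniature model: forward only calls `classifier`."""
    def __init__(self):
        super().__init__()
        self.classifier = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1000))

    def forward(self, x):
        return self.classifier(x)

x = torch.randn(1, 8)

# Top-level add_module: the layer is registered but forward never calls it
top_level = TinyVGG()
top_level.add_module('add_linear', nn.Linear(1000, 10))
print(top_level(x).shape)      # torch.Size([1, 1000])

# Appending inside the classifier, which forward does run
appended = TinyVGG()
appended.classifier.add_module('add_linear', nn.Linear(1000, 10))
print(appended(x).shape)       # torch.Size([1, 10])

# Replacing the last layer of the classifier
replaced = TinyVGG()
replaced.classifier[2] = nn.Linear(16, 10)
print(replaced(x).shape)       # torch.Size([1, 10])
```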

4 Saving and Loading Network Models

4.1 Saving a model

import torch
import torchvision

vgg16 = torchvision.models.vgg16()

# Save method 1: model structure + model parameters
torch.save(vgg16, "vgg16_method1.pth")

# Save method 2: model parameters only (officially recommended)
torch.save(vgg16.state_dict(), "vgg16_method2.pth")

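4.2 Loading a model

import torch
import torchvision

# Load method 1 (for save method 1): load the whole model
# Since PyTorch 2.6, torch.load defaults to weights_only=True, so loading a
# fully pickled model needs weights_only=False (only for files you trust)
model = torch.load("vgg16_method1.pth", weights_only=False)
print(model)


# Load method 2 (for save method 2): rebuild the model, then load its parameters
vgg16 = torchvision.models.vgg16()
vgg16.load_state_dict(torch.load("vgg16_method2.pth"))
print(vgg16)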
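
The state_dict round trip can be tried on a much smaller model than VGG16. This is a minimal sketch; the nn.Linear model and the file name `linear_state.pth` are placeholders chosen for the example.

```python
import os
import tempfile

import torch
from torch import nn

torch.manual_seed(0)
original = nn.Linear(4, 2)                     # hypothetical small model

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "linear_state.pth")   # illustrative file name
    torch.save(original.state_dict(), path)            # save parameters only

    restored = nn.Linear(4, 2)                         # rebuild the same architecture
    restored.load_state_dict(torch.load(path))         # then load the parameters

same = all(torch.equal(a, b) for a, b in
           zip(original.state_dict().values(), restored.state_dict().values()))
print(list(original.state_dict().keys()))   # ['weight', 'bias']
print(same)                                 # True
```

Because only tensors are stored, the loading side needs the architecture to be rebuilt in code first, which is why `restored` is constructed before load_state_dict is called.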

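Pitfall with save method 1

# save.py
# Save method 1 has a pitfall
import torch
from torch import nn

class modelcc(nn.Module):
    def __init__(self):
        super(modelcc, self).__init__()
        self.conv1 = nn.Conv2d(3, 32, 5, 1)

    def forward(self, x):
        return self.conv1(x)

cc = modelcc()
torch.save(cc, "cctest.pth")

# load.py
import torch

# Fails unless the class modelcc is defined or imported in this file
# (typically "AttributeError: Can't get attribute 'modelcc'")
cc = torch.load("cctest.pth", weights_only=False)  # weights_only=False is needed on PyTorch 2.6+
print(cc)

Save method 1 pickles only a reference to the model class, not its code, so the class definition must be available wherever the file is loaded. In practice, the model is usually written in its own Python file and imported where it is needed with a line like this:

from model_save import *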

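Summary

This week I studied the formulas and usage of several simple loss functions in PyTorch, looked up additional material to understand the cross-entropy loss more deeply, and learned how to use, modify, save, and load network models. Next week I will work on tasks with the MNIST dataset to deepen my understanding of how CNNs work.

Original article: https://blog.csdn.net/weixin_51923997/article/details/141960140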
