PyTorch Basics: Network Layers

Published 2024-10-06 15:25 · Tags: pytorch, artificial intelligence, deep learning


1. Convolution Layers

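Convolution operation: the kernel slides over the input signal (image), multiplying and accumulating the values at corresponding positions.
Convolution kernel: also called a filter; it can be thought of as a particular pattern or feature.
Convolution process: similar to sliding a template over an image to find regions that resemble it. The more a region resembles the kernel's pattern, the higher the activation, which is how features are extracted.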
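
The sliding multiply-accumulate described above can be sketched in plain Python. This is a minimal illustration, assuming stride 1, no padding, and a single channel; the function name `conv2d_valid` and the toy image are made up for this example. Like deep learning frameworks, it does not flip the kernel (strictly speaking it computes a cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists) with stride 1 and no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Multiply-accumulate over the window whose top-left corner is (i, j)
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


# Toy 4x4 image with a vertical edge between columns 1 and 2
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Toy 2x2 kernel that responds to a dark-to-bright step from left to right
kernel = [
    [-1, 1],
    [-1, 1],
]

response = conv2d_valid(image, kernel)
for row in response:
    print(row)  # each row is [0.0, 2.0, 0.0]
```

The activation is highest (2.0) exactly where the window straddles the edge, the region that matches the kernel's pattern, and zero over the flat regions.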

1.1 1D/2D/3D Convolution

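Convolution dimension: in general, the number of dimensions along which the kernel slides determines the dimensionality of the convolution.
(1) 1D convolution: the kernel slides along a single dimension, such as the length of a sequence.

(2) 2D convolution: the kernel slides along two dimensions, such as the height and width of an image.
(3) 3D convolution: the kernel slides along three dimensions, such as the depth, height, and width of a volume or video clip.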
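
As a rough illustration of the three cases, the sketch below builds nn.Conv1d, nn.Conv2d, and nn.Conv3d layers and prints their weight and output shapes. The batch size, channel counts, and spatial sizes are arbitrary values chosen for this example.

```python
import torch
import torch.nn as nn

# Batch of 2 samples with 3 input channels; all sizes are illustrative
x1 = torch.randn(2, 3, 32)            # (N, C, L): kernel slides along L
x2 = torch.randn(2, 3, 32, 32)        # (N, C, H, W): kernel slides along H and W
x3 = torch.randn(2, 3, 16, 32, 32)    # (N, C, D, H, W): kernel slides along D, H, W

conv1 = nn.Conv1d(3, 8, kernel_size=3)
conv2 = nn.Conv2d(3, 8, kernel_size=3)
conv3 = nn.Conv3d(3, 8, kernel_size=3)

y1, y2, y3 = conv1(x1), conv2(x2), conv3(x3)

# The weight gains one kernel axis per sliding dimension
print(conv1.weight.shape, y1.shape)   # (8, 3, 3)       -> (2, 8, 30)
print(conv2.weight.shape, y2.shape)   # (8, 3, 3, 3)    -> (2, 8, 30, 30)
print(conv3.weight.shape, y3.shape)   # (8, 3, 3, 3, 3) -> (2, 8, 14, 30, 30)
```

Note that the channel axis is not a sliding dimension: a Conv2d kernel spans all input channels at once, which is why convolving an RGB image is still a 2D convolution.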

1.2 Convolution: nn.Conv2d

Function: applies a 2D convolution to multiple 2D signals (input channels).
Parameters:

Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int, tuple or str, optional): Padding added to all four sides of
            the input. Default: 0
        padding_mode (str, optional): ``'zeros'``, ``'reflect'``,
            ``'replicate'`` or ``'circular'``. Default: ``'zeros'``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
        groups (int, optional): Number of blocked connections from input
            channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the
            output. Default: ``True``
Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

          .. math::
              H_{out} = \left\lfloor\frac{H_{in}  + 2 \times \text{padding}[0] - \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) - 1}{\text{stride}[0]} + 1\right\rfloor

          .. math::
              W_{out} = \left\lfloor\frac{W_{in}  + 2 \times \text{padding}[1] - \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) - 1}{\text{stride}[1]} + 1\right\rfloor

Attributes:
        weight (Tensor): the learnable weights of the module of shape
            :math:`(\text{out\_channels}, \frac{\text{in\_channels}}{\text{groups}},`
            :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
            The values of these weights are sampled from
            :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
        bias (Tensor):   the learnable bias of the module of shape
            (out_channels). If :attr:`bias` is ``True``,
            then the values of these weights are
            sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
            :math:`k = \frac{groups}{C_\text{in} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

Examples:

        >>> # With square kernels and equal stride
        >>> m = nn.Conv2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> # non-square kernels and unequal stride and with padding and dilation
        >>> m = nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)

Output size calculation formula: the figure is not reproduced here; see the H_out and W_out formulas in the Shape section above.
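
The H_out / W_out formula from the Shape section can be evaluated by hand. The sketch below is plain Python; the helper name `conv2d_output_size` is made up for this example, and it handles one spatial dimension at a time. It is checked against the dilated layer from the Examples above, applied to a 50×100 input.

```python
def conv2d_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension, following the Shape formula."""
    # floor((in + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


# nn.Conv2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2), dilation=(3, 1))
# applied to an input of shape (20, 16, 50, 100)
h_out = conv2d_output_size(50, kernel_size=3, stride=2, padding=4, dilation=3)
w_out = conv2d_output_size(100, kernel_size=5, stride=1, padding=2, dilation=1)
print(h_out, w_out)  # 26 100, so the output shape is (20, 33, 26, 100)

# With the defaults (stride 1, no padding, no dilation) this reduces to in - kernel + 1
print(conv2d_output_size(5, kernel_size=3))  # 3
```
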
How Conv2d operates:

The main code segments are as follows:

(1) Load the image and convert it to a tensor:

# ================================= load img ==================================
import os

from PIL import Image
from torchvision import transforms

path_img = os.path.join(os.path.dirname(os.path.abspath(__file__)), "pig.jpeg")
print(path_img)
img = Image.open(path_img).convert('RGB')  # pixel values in 0~255

# Convert to a tensor; Compose wraps the image preprocessing steps
img_transform = transforms.Compose([transforms.ToTensor()])
img_tensor = img_transform(img)
# Add a batch dimension
img_tensor.unsqueeze_(dim=0)    # C*H*W to B*C*H*W

(2) Apply the convolution:

# =============== create convolution layer ==================
import torch.nn as nn

# ================ 2d
flag = 1
# flag = 0
if flag:
    # Define a convolution layer
    conv_layer = nn.Conv2d(3, 1, 3)   # args: (in_channels, out_channels, kernel_size); weight shape: (out, in, h, w)
    # Initialize the layer's weights
    nn.init.xavier_normal_(conv_layer.weight.data)
    # nn.init.xavier_uniform_(conv_layer.weight.data)

    # Apply the convolution
    img_conv = conv_layer(img_tensor)

The result is visualized as follows:

# ================================= visualization ==================================
from matplotlib import pyplot as plt

print("Size before convolution: {}\nSize after convolution: {}".format(img_tensor.shape, img_conv.shape))
# transform_invert is a helper not shown in this article; it converts a tensor back into a displayable image
img_conv = transform_invert(img_conv[0, 0:1, ...], img_transform)
img_raw = transform_invert(img_tensor.squeeze(), img_transform)
plt.subplot(122).imshow(img_conv, cmap='gray')
plt.subplot(121).imshow(img_raw)
plt.show()

1.3 Transposed Convolution (for Upsampling)

nn.ConvTranspose2d

    Args:
        in_channels (int): Number of channels in the input image
        out_channels (int): Number of channels produced by the convolution
        kernel_size (int or tuple): Size of the convolving kernel
        stride (int or tuple, optional): Stride of the convolution. Default: 1
        padding (int or tuple, optional): ``dilation * (kernel_size - 1) - padding`` zero-padding
            will be added to both sides of each dimension in the input. Default: 0
        output_padding (int or tuple, optional): Additional size added to one side
            of each dimension in the output shape. Default: 0
        groups (int, optional): Number of blocked connections from input channels to output channels. Default: 1
        bias (bool, optional): If ``True``, adds a learnable bias to the output. Default: ``True``
        dilation (int or tuple, optional): Spacing between kernel elements. Default: 1

    Shape:
        - Input: :math:`(N, C_{in}, H_{in}, W_{in})` or :math:`(C_{in}, H_{in}, W_{in})`
        - Output: :math:`(N, C_{out}, H_{out}, W_{out})` or :math:`(C_{out}, H_{out}, W_{out})`, where

        .. math::
              H_{out} = (H_{in} - 1) \times \text{stride}[0] - 2 \times \text{padding}[0] + \text{dilation}[0]
                        \times (\text{kernel\_size}[0] - 1) + \text{output\_padding}[0] + 1
        .. math::
              W_{out} = (W_{in} - 1) \times \text{stride}[1] - 2 \times \text{padding}[1] + \text{dilation}[1]
                        \times (\text{kernel\_size}[1] - 1) + \text{output\_padding}[1] + 1

    Attributes:
        weight (Tensor): the learnable weights of the module of shape
                         :math:`(\text{in\_channels}, \frac{\text{out\_channels}}{\text{groups}},`
                         :math:`\text{kernel\_size[0]}, \text{kernel\_size[1]})`.
                         The values of these weights are sampled from
                         :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                         :math:`k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`
        bias (Tensor):   the learnable bias of the module of shape (out_channels)
                         If :attr:`bias` is ``True``, then the values of these weights are
                         sampled from :math:`\mathcal{U}(-\sqrt{k}, \sqrt{k})` where
                         :math:`k = \frac{groups}{C_\text{out} * \prod_{i=0}^{1}\text{kernel\_size}[i]}`

    Examples::

        >>> # With square kernels and equal stride
        >>> m = nn.ConvTranspose2d(16, 33, 3, stride=2)
        >>> # non-square kernels and unequal stride and with padding
        >>> m = nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2))
        >>> input = torch.randn(20, 16, 50, 100)
        >>> output = m(input)
        >>> # exact output size can be also specified as an argument
        >>> input = torch.randn(1, 16, 12, 12)
        >>> downsample = nn.Conv2d(16, 16, 3, stride=2, padding=1)
        >>> upsample = nn.ConvTranspose2d(16, 16, 3, stride=2, padding=1)
        >>> h = downsample(input)
        >>> h.size()
        torch.Size([1, 16, 6, 6])
        >>> output = upsample(h, output_size=input.size())
        >>> output.size()
        torch.Size([1, 16, 12, 12])

Output size calculation formula: the figure is not reproduced here; see the H_out and W_out formulas in the Shape section above.
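
The transposed-convolution size formula can be checked by hand in the same way. The sketch below is plain Python with hypothetical helper names, one spatial dimension at a time. It also suggests why the docstring example passes `output_size` to the upsampling layer: a stride-2 convolution maps both 11 and 12 to 6, so the inverse mapping is ambiguous unless the target size (equivalently, `output_padding`) is given.

```python
def conv2d_output_size(in_size, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of an ordinary convolution along one dimension."""
    return (in_size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


def conv_transpose2d_output_size(in_size, kernel_size, stride=1, padding=0,
                                 output_padding=0, dilation=1):
    """Output length of a transposed convolution along one dimension."""
    return ((in_size - 1) * stride - 2 * padding
            + dilation * (kernel_size - 1) + output_padding + 1)


# nn.ConvTranspose2d(16, 33, (3, 5), stride=(2, 1), padding=(4, 2)) on a 50x100 input
h_out = conv_transpose2d_output_size(50, kernel_size=3, stride=2, padding=4)
w_out = conv_transpose2d_output_size(100, kernel_size=5, stride=1, padding=2)
print(h_out, w_out)  # 93 100

# Downsampling with kernel 3, stride 2, padding 1 sends both 11 and 12 to 6 ...
print(conv2d_output_size(11, 3, stride=2, padding=1),
      conv2d_output_size(12, 3, stride=2, padding=1))  # 6 6

# ... so upsampling 6 yields 11 by default, and 12 only with output_padding=1
up_default = conv_transpose2d_output_size(6, 3, stride=2, padding=1)
up_padded = conv_transpose2d_output_size(6, 3, stride=2, padding=1, output_padding=1)
print(up_default, up_padded)  # 11 12
```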

2. Pooling Layers

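Pooling operation: "collects" and "summarizes" a signal, much as a pool collects water, hence the name pooling layer.

"Collect": many values become few.

"Summarize": take the maximum or the average.

For example, with a 2×2 window, max pooling replaces each window with its maximum value, and average pooling replaces each window with its mean.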
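
A small worked example of the 2×2 case, written in plain Python. The helper `pool2d` and the 4×4 input are made up for this illustration; the sketch assumes non-overlapping windows (stride equal to the window size) and no padding.

```python
def pool2d(image, size, reduce_fn):
    """Non-overlapping pooling: apply reduce_fn to each size x size window."""
    out = []
    for i in range(0, len(image) - size + 1, size):
        row = []
        for j in range(0, len(image[0]) - size + 1, size):
            window = [image[i + di][j + dj]
                      for di in range(size) for dj in range(size)]
            row.append(reduce_fn(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


image = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 1, 0, 2],
    [3, 5, 8, 4],
]

max_pooled = pool2d(image, 2, max)
avg_pooled = pool2d(image, 2, mean)
print(max_pooled)  # [[7, 6], [9, 8]]
print(avg_pooled)  # [[4.0, 3.0], [4.5, 3.5]]
```

Sixteen values are "collected" into four, and each window is "summarized" by either its maximum or its mean.
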
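(1) nn.MaxPool2d
Function: applies max pooling to a 2D signal (image).

Main parameters:

kernel_size: size of the pooling window
stride: step size
padding: amount of padding
dilation: spacing between elements within the pooling window
ceil_mode: round the output size up instead of down; default False
return_indices: also return the indices of the maximum values (needed for unpooling)
Note: stride is usually set equal to the window size so that windows do not overlap.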

The code is as follows:

Data preprocessing:

import os

import torch.nn as nn
from PIL import Image
from torchvision import transforms

# set_seed is a helper not shown in this article; it fixes the random seed
set_seed(1)

# ================================= load img ==================================
path_img = os.path.join(os.path.dirname(os.path.abspath(__file__)), "pig.jpeg")
img = Image.open(path_img).convert('RGB')  # pixel values in 0~255

# Convert to a tensor
img_transform = transforms.Compose([transforms.ToTensor()])
img_tensor = img_transform(img)
img_tensor.unsqueeze_(dim=0)    # C*H*W to B*C*H*W

Max pooling code:

# ================ maxpool
flag = 1
# flag = 0
if flag:
    # 2x2 window with stride 2; pooling layers have no learnable weights
    maxpool_layer = nn.MaxPool2d((2, 2), stride=(2, 2))
    img_pool = maxpool_layer(img_tensor)

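Change in output size: with a 2×2 window and stride 2, the height and width are each halved (rounded down), while the batch and channel dimensions are unchanged.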

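(2) nn.AvgPool2d (for self-study)
Function: applies average pooling to a 2D signal (image).

Main parameters:

kernel_size: size of the pooling window
stride: step size
padding: amount of padding
ceil_mode: round the output size up instead of down (nn.AvgPool2d has no dilation parameter)
count_include_pad: whether padded values are included when computing the average
divisor_override: custom divisor used in place of the window size
Average pooling code:

# ================ avgpool
flag = 1
# flag = 0
if flag:
    # 2x2 window with stride 2; pooling layers have no learnable weights
    avgpoollayer = nn.AvgPool2d((2, 2), stride=(2, 2))
    img_pool = avgpoollayer(img_tensor)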

Difference between max pooling and average pooling: the max-pooled image looks slightly brighter, because each output pixel is the maximum of its window rather than the mean.
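(3) nn.MaxUnpool2d
Function: upsamples a 2D signal (image) by max unpooling (unpooling turns a small feature map into a larger one).
Main parameters:

kernel_size: size of the pooling window
stride: step size
padding: amount of padding
These parameters are similar to those of the pooling layers. The only difference is that the forward pass must also receive indices, the positions recorded by max pooling; otherwise the layer would not know where in the output to place each input element.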

Unpooling code:

import torch
import torch.nn as nn

# ================ max unpool
flag = 1
# flag = 0
if flag:
    # pooling
    img_tensor = torch.randint(high=5, size=(1, 1, 4, 4), dtype=torch.float)
    # Max pooling that also returns the indices of the maxima
    maxpool_layer = nn.MaxPool2d((2, 2), stride=(2, 2), return_indices=True)
    img_pool, indices = maxpool_layer(img_tensor)

    # unpooling
    img_reconstruct = torch.randn_like(img_pool, dtype=torch.float)
    # Unpool: place each value at the position recorded in indices
    maxunpool_layer = nn.MaxUnpool2d((2, 2), stride=(2, 2))
    img_unpool = maxunpool_layer(img_reconstruct, indices)

    print("raw_img:\n{}\nimg_pool:\n{}".format(img_tensor, img_pool))
    print("img_reconstruct:\n{}\nimg_unpool:\n{}".format(img_reconstruct, img_unpool))
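
To make the role of the indices concrete, here is a plain-Python sketch of max pooling that records where each maximum came from, followed by unpooling that scatters new values back to those positions. The helper names are hypothetical, and storing each position as a flat row-major index into the H×W plane is an assumption of this sketch rather than something the article specifies.

```python
def max_pool_with_indices(image, size):
    """Non-overlapping max pooling; also returns the flat index of each maximum."""
    width = len(image[0])
    pooled, indices = [], []
    for i in range(0, len(image) - size + 1, size):
        pooled_row, index_row = [], []
        for j in range(0, width - size + 1, size):
            # (value, flat_index) pairs for every element in the window
            window = [
                (image[i + di][j + dj], (i + di) * width + (j + dj))
                for di in range(size)
                for dj in range(size)
            ]
            value, index = max(window, key=lambda pair: pair[0])
            pooled_row.append(value)
            index_row.append(index)
        pooled.append(pooled_row)
        indices.append(index_row)
    return pooled, indices


def max_unpool(values, indices, out_h, out_w):
    """Scatter each value to its recorded position; every other position is 0."""
    out = [[0] * out_w for _ in range(out_h)]
    for value_row, index_row in zip(values, indices):
        for value, index in zip(value_row, index_row):
            out[index // out_w][index % out_w] = value
    return out


image = [
    [1, 3, 2, 0],
    [5, 7, 4, 6],
    [9, 1, 0, 2],
    [3, 5, 8, 4],
]
pooled, indices = max_pool_with_indices(image, 2)
print(pooled)   # [[7, 6], [9, 8]]
print(indices)  # [[5, 7], [8, 14]]

# Unpool a different 2x2 map using the recorded indices
unpooled = max_unpool([[10, 20], [30, 40]], indices, 4, 4)
for row in unpooled:
    print(row)
```

The values being unpooled need not be the pooled maxima themselves; as in the code above, any tensor with the pooled shape can be scattered to the recorded positions.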

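3. Linear Layer

A linear layer, also called a fully connected layer, connects each of its neurons to every neuron in the previous layer, so its output is a linear combination (linear transformation) of the previous layer's outputs.
nn.Linear
Function: applies a linear combination to a 1D signal (vector).

Main parameters:

in_features: number of input nodes
out_features: number of output nodes
bias: whether to add a bias term
Formula: y = xW^T + bias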

The code is as follows:

import torch
import torch.nn as nn

# ================ linear
flag = 1
# flag = 0
if flag:
    inputs = torch.tensor([[1., 2, 3]])
    linear_layer = nn.Linear(3, 4)
    linear_layer.weight.data = torch.tensor([[1., 1., 1.],
                                             [2., 2., 2.],
                                             [3., 3., 3.],
                                             [4., 4., 4.]])

    # Set the bias to zero
    linear_layer.bias.data.fill_(0)
    output = linear_layer(inputs)
    print(inputs, inputs.shape)
    print(linear_layer.weight.data, linear_layer.weight.data.shape)
    print(output, output.shape)
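
The arithmetic behind y = xW^T + bias for the values used above can be worked through in plain Python. The function name `linear` is just an illustrative helper; each output element is the dot product of the input with one row of the weight matrix, plus the corresponding bias.

```python
def linear(x, weight, bias):
    """Compute y = x W^T + bias for a single input vector x."""
    return [
        sum(w_k * x_k for w_k, x_k in zip(row, x)) + b
        for row, b in zip(weight, bias)
    ]


x = [1.0, 2.0, 3.0]
weight = [
    [1.0, 1.0, 1.0],
    [2.0, 2.0, 2.0],
    [3.0, 3.0, 3.0],
    [4.0, 4.0, 4.0],
]
bias = [0.0, 0.0, 0.0, 0.0]

y = linear(x, weight, bias)
print(y)  # [6.0, 12.0, 18.0, 24.0]
```

The weight has shape (out_features, in_features) = (4, 3), so a 3-element input produces a 4-element output; for the first row, 1·1 + 1·2 + 1·3 = 6.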

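4. Activation Layers

Activation functions apply a non-linear transformation to features. Without them, a stack of linear layers collapses into a single linear layer, so they are what gives depth its meaning in a multi-layer network.
(1) nn.Sigmoid

import torch
import torch.nn as nn

m = nn.Sigmoid()
input = torch.randn(2)
output = m(input)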

(2) nn.Tanh

m = nn.Tanh()
input = torch.randn(2)
output = m(input)
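
For reference, the two functions these layers apply element-wise are sigmoid(x) = 1 / (1 + e^(-x)) and tanh(x) = (e^x - e^(-x)) / (e^x + e^(-x)). The sketch below evaluates them with the standard library only; the sample points are arbitrary.

```python
import math


def sigmoid(x):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    """Squashes any real number into (-1, 1); zero-centered."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))


for x in (-2.0, 0.0, 2.0):
    print(x, sigmoid(x), tanh(x))

# tanh is a rescaled, shifted sigmoid: tanh(x) = 2 * sigmoid(2x) - 1
print(abs(tanh(0.7) - (2.0 * sigmoid(1.4) - 1.0)) < 1e-12)  # True
```

At x = 0 the sigmoid returns 0.5 and tanh returns 0, which reflects the main practical difference between them: tanh outputs are centered on zero, sigmoid outputs are not.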

…

Original article: https://blog.csdn.net/qq_37269626/article/details/142722763
