
PaddleOCR ch_PP-OCRv3 Text Detection Model Study (Part 1): The Backbone Network

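According to the source code, PaddleOCR supports four versions: PP-OCR, PP-OCRv2, PP-OCRv3, and PP-OCRv4. This article takes the backbone network of the v3 model as its subject and examines the internal structure of the network.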

Starting point

In the official configuration file, lines 21-36 describe the model architecture. They are extracted below:

Architecture:
  model_type: det
  algorithm: DB
  Transform:
  Backbone:
    name: MobileNetV3
    scale: 0.5
    model_name: large
    disable_se: True
  Neck:
    name: RSEFPN
    out_channels: 96
    shortcut: True
  Head:
    name: DBHead
    k: 50

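This part of the configuration describes the network architecture used for training. In order, it specifies the model type as text detection (det), the detection algorithm as DB, the backbone as MobileNetV3, the neck as RSEFPN, and the head as DBHead. This article focuses on the backbone; its parameter settings are listed separately: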

  • scale=0.5
  • model_name='large'
  • disable_se=True
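To make the link between the Backbone section and the Python class concrete, here is a minimal sketch, assuming PaddleOCR builds the backbone roughly by popping `name` to choose the class and passing the remaining keys as keyword arguments. `MobileNetV3Stub` and `build_backbone` are hypothetical stand-ins that only mirror the constructor defaults of MobileNetV3 in det_mobilenet_v3.py; they build no layers.

```python
import inspect


class MobileNetV3Stub:
    """Hypothetical stand-in: same constructor defaults as MobileNetV3, no layers."""

    def __init__(self, in_channels=3, model_name="large", scale=0.5,
                 disable_se=False, **kwargs):
        self.args = {"in_channels": in_channels, "model_name": model_name,
                     "scale": scale, "disable_se": disable_se}


# The Backbone section of the YAML config, written as a Python dict.
BACKBONE_CFG = {"name": "MobileNetV3", "scale": 0.5,
                "model_name": "large", "disable_se": True}


def build_backbone(cfg):
    """Hypothetical builder: 'name' selects the class, the rest become kwargs."""
    cfg = dict(cfg)                  # copy so the caller's dict is not mutated
    name = cfg.pop("name")
    classes = {"MobileNetV3": MobileNetV3Stub}
    return classes[name](**cfg)


def overridden_defaults(cfg):
    """Return the config keys whose values differ from the constructor defaults."""
    params = inspect.signature(MobileNetV3Stub.__init__).parameters
    defaults = {k: p.default for k, p in params.items()
                if p.default is not inspect.Parameter.empty}
    return {k: v for k, v in cfg.items() if k in defaults and defaults[k] != v}


backbone = build_backbone(BACKBONE_CFG)
print(backbone.args)
print(overridden_defaults(BACKBONE_CFG))   # {'disable_se': True}
```

Comparing against the signature rather than hard-coding the defaults keeps the check honest if the defaults change in a later PaddleOCR release.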

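The MobileNetV3 code is in det_mobilenet_v3.py; see the official source for details. For comparison with the configuration above, only its constructor signature is excerpted here: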

def __init__(self, in_channels=3, model_name="large", scale=0.5, disable_se=False, **kwargs):

The configuration largely keeps the defaults; only disable_se is changed from its default value False to True. The parameters mean the following:

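  • in_channels: int, the number of channels of the input tensor; default 3.
  • model_name: str, the model variant. Two variants are supported, large and small; the large model contains 15 residual layers and the small model contains 11.
  • scale: float, the channel-width multiplier. Supported values are 0.35/0.5/0.75/1.0/1.25; the larger the value, the more channels the intermediate layers have and the more parameters the model has. For scale = 0.5/1.0/1.25, paddle.summary reports a parameter size (Params size) of 1.57/5.66/8.77 MB respectively, measured with an input of BCHW = 5,3,64,192 (the parameter size does not depend on the input shape). These figures are storage sizes, not parameter counts: at scale=0.5 the backbone has 410,848 parameters.
  • disable_se: bool, whether to disable the SE module in the residual layers; default False.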
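The scale factor is not applied to channel counts verbatim: each scaled width is passed through a make_divisible helper that rounds it to a multiple of 8. The sketch below reproduces that helper as it appears in det_mobilenet_v3.py to the best of our knowledge (it yields every channel count in the paddle.summary output later in this article, but check it against your PaddleOCR version). `scaled_widths` is a hypothetical helper; 16 and 960 are the base widths of the stem convolution and of the last 1x1 conv-BN layer in the large model.

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round v to the nearest multiple of divisor, never dropping more than 10%."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:          # rounded down too far: go up one step
        new_v += divisor
    return new_v


def scaled_widths(scale, stem=16, last=960):
    """Hypothetical helper: stem and final channel counts of the large model."""
    return make_divisible(stem * scale), make_divisible(last * scale)


for s in (0.35, 0.5, 0.75, 1.0, 1.25):
    print(s, scaled_widths(s))
```

At scale=0.5 this gives the 8-channel stem and the 480-channel output described in the backbone section; note that a nominal width of 12 (24 × 0.5) is rounded up to 16.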

Conv-BN layer

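The ConvBNLayer class is defined at lines 159-201 of det_mobilenet_v3.py. It consists mainly of one 2D convolution layer (Conv2D) and one batch normalization layer (BatchNorm), so it is called the conv-BN layer here. Reading its code gives the internal structure shown in the figure below: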
Figure: conv-BN layer (ConvBNLayer) structure

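In the figure, the input tensor has shape c_in, h, w (the batch dimension is omitted for brevity), i.e. input channels, height, and width. A Conv2D operation produces an output tensor with c_out channels, whose height and width are determined by the kernel size, stride, and padding (see the official conv2d documentation for the formula). Batch normalization follows and leaves the shape unchanged. Finally, if the if_act parameter is True, an activation function is applied; relu and hardswish are supported (see the official relu and hard_swish documentation). The convolution has no bias term, which is why the stem Conv2D in the summary below has exactly 3 × 8 × 3 × 3 = 216 parameters.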
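The shape and parameter bookkeeping of a conv-BN layer fits in a few lines. This is a plain-Python sketch, not Paddle code: `conv2d_out` uses the standard convolution output-size formula (dilation 1), `conv_bn_layer` is a hypothetical helper that counts a bias-free convolution plus four values per channel for BatchNorm (scale, bias, running mean, running variance), and `relu`/`hardswish` are scalar versions of the two supported activations.

```python
def conv2d_out(size, k, s, p):
    """Output height or width of a convolution with dilation 1."""
    return (size + 2 * p - k) // s + 1


def conv_bn_layer(shape, c_out, k, s, p, groups=1):
    """Hypothetical helper: output shape and parameter count of a ConvBNLayer."""
    c_in, h, w = shape
    conv_params = c_out * (c_in // groups) * k * k   # weights only, no bias
    bn_params = 4 * c_out                            # gamma, beta, mean, variance
    out_shape = (c_out, conv2d_out(h, k, s, p), conv2d_out(w, k, s, p))
    return out_shape, conv_params + bn_params


def relu(x):
    return max(0.0, x)


def hardswish(x):
    # x * relu6(x + 3) / 6
    return x * min(max(0.0, x + 3.0), 6.0) / 6.0


# The stem of the backbone: 3 -> 8 channels, k=3, s=2, p=1, on a 64x320 input.
print(conv_bn_layer((3, 64, 320), c_out=8, k=3, s=2, p=1))   # ((8, 32, 160), 248)
```

The 248 parameters are Conv2D-1 (216) plus BatchNorm-1 (32) in the summary output.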

Squeeze-and-excitation layer

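The SEModule class is defined at lines 261-289 of det_mobilenet_v3.py. It consists mainly of an adaptive average pooling layer (AdaptiveAvgPool2D) and two convolution layers. Its full English name is Squeeze-and-Excitation Network, so SEModule is called the SE layer here. Reading its code gives the internal structure shown in the figure below: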
Figure: SE layer (SEModule) structure
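In the figure, the input tensor has shape c_in, h, w. After the average pooling operation the shape becomes c_in, 1, 1. A squeeze convolution then reduces the number of channels to c_in//r, where r is configurable with a default of 4, followed by a relu activation. An excitation convolution then restores the number of channels to c_in, and a HardSigmoid (see the official hard_sigmoid documentation) maps each value into [0, 1]. The result is multiplied with the input tensor channel by channel, so the output has the same shape as the input.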
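The squeeze, excite, and scale steps can be traced with a plain-Python sketch on nested lists (no Paddle). Because the pooled tensor is c_in, 1, 1, each 1x1 convolution reduces to a matrix-vector product. `se_forward` and `hard_sigmoid` are illustrative names, and the hard-sigmoid slope 0.2 and offset 0.5 are an assumption about what SEModule passes to F.hardsigmoid; check your PaddleOCR version.

```python
def hard_sigmoid(x, slope=0.2, offset=0.5):
    """Piecewise-linear sigmoid: clip(slope * x + offset, 0, 1).
    slope/offset are assumed values (see the lead-in)."""
    return max(0.0, min(1.0, slope * x + offset))


def se_forward(x, w1, b1, w2, b2):
    """SE forward pass on x, a list of C channels, each an H x W list of floats.

    w1: (C//r) x C weights, b1: C//r biases (squeeze 1x1 conv)
    w2: C x (C//r) weights, b2: C biases (excitation 1x1 conv)
    """
    # Squeeze: adaptive average pooling to 1x1, one value per channel.
    pooled = [sum(map(sum, ch)) / (len(ch) * len(ch[0])) for ch in x]
    # A 1x1 conv on a 1x1 map is a matrix-vector product; then relu.
    hidden = [max(0.0, sum(a * p for a, p in zip(row, pooled)) + b)
              for row, b in zip(w1, b1)]
    # Excitation: back to C values, hard sigmoid gives one gate in [0, 1] per channel.
    gates = [hard_sigmoid(sum(a * hd for a, hd in zip(row, hidden)) + b)
             for row, b in zip(w2, b2)]
    # Scale: multiply every value of channel c by gate c; the shape is unchanged.
    return [[[v * g for v in row] for row in ch] for ch, g in zip(x, gates)]


# C = 4 channels of size 2x2 and r = 4, so the squeeze conv has a single output.
x = [[[1.0, 2.0], [3.0, 4.0]] for _ in range(4)]
w1, b1 = [[0.25, 0.25, 0.25, 0.25]], [0.0]
w2, b2 = [[0.0], [0.0], [0.0], [0.0]], [10.0, -10.0, 0.0, 0.0]
out = se_forward(x, w1, b1, w2, b2)
print([ch[0][0] for ch in out])   # [1.0, 0.0, 0.5, 0.5]
```

The biases of the excitation conv are chosen so that one channel passes through, one is suppressed, and two are halved, which makes the gating effect visible.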

Residual layer

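The ResidualUnit class is defined at lines 204-258 of det_mobilenet_v3.py. It consists mainly of three conv-BN layers (ConvBNLayer) and an optional SE layer (SEModule), so ResidualUnit is called the residual layer here. Reading its code gives the internal structure shown in the figure below: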
Figure: residual layer (ResidualUnit) structure
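In the figure, the input tensor has shape c_in, h, w, where c_in is the number of input channels. The first conv-BN layer outputs a tensor of shape c_mid, h, w, where c_mid is the configured number of middle channels; because this convolution uses a 1x1 kernel (k), stride (s) 1, and padding (p) 0, the height and width are unchanged. The second conv-BN layer is a depthwise convolution (groups equal to c_mid): its kernel size and stride are set by the unit's parameters and its padding is (k-1)//2, so it keeps c_mid channels while reducing the height and width by the stride factor, giving c_mid, h/s, w/s (unchanged when s is 1, halved when s is 2). If use_se is True, the tensor then passes through the SE layer, whose output shape equals its input shape. The third conv-BN layer, which has no activation, outputs c_out, h/s, w/s, where c_out is the configured number of output channels. If the stride is 1 and c_out equals c_in, if_shortcut is True; the output of the third conv-BN layer then has exactly the input's shape, and the unit ends with an element-wise addition of the input. This is what gives the residual layer its name: the stacked layers process the input and their result is added back onto it, so what they contribute is the difference (the residual) relative to the input.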
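The shape flow, parameter count, and shortcut rule of a residual layer can be checked with a small plain-Python sketch. `residual_unit` is a hypothetical helper, not PaddleOCR code; it assumes SE is disabled (as in this configuration), a depthwise middle convolution with padding (k-1)//2, and bias-free convolutions each followed by a BatchNorm with four values per channel. The examples correspond to ResidualUnit-1, ResidualUnit-2, and ResidualUnit-4 in the summary of the code experiment section.

```python
def conv_bn_params(c_in, c_out, k, groups=1):
    """Bias-free conv weights plus BatchNorm gamma, beta, mean, variance."""
    return c_out * (c_in // groups) * k * k + 4 * c_out


def out_size(size, k, s):
    p = (k - 1) // 2                  # padding of the depthwise conv
    return (size + 2 * p - k) // s + 1


def residual_unit(shape, mid, c_out, k, s):
    """Hypothetical helper: output shape, parameter count, and if_shortcut (SE off)."""
    c_in, h, w = shape
    params = conv_bn_params(c_in, mid, 1)              # expand: 1x1, s=1, p=0
    params += conv_bn_params(mid, mid, k, groups=mid)  # depthwise: k x k, stride s
    params += conv_bn_params(mid, c_out, 1)            # linear: 1x1, no activation
    out_shape = (c_out, out_size(h, k, s), out_size(w, k, s))
    if_shortcut = s == 1 and c_in == c_out             # add the input back?
    return out_shape, params, if_shortcut


print(residual_unit((8, 32, 160), mid=8, c_out=8, k=3, s=1))    # ((8, 32, 160), 296, True)
print(residual_unit((8, 32, 160), mid=32, c_out=16, k=3, s=2))  # ((16, 16, 80), 1376, False)
```

The first call keeps its shape and therefore uses the shortcut; the second halves the spatial size and changes the channel count, so no addition is possible.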

Backbone

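The MobileNetV3 class is defined at lines 37-156 of det_mobilenet_v3.py; here it is treated as the backbone. With the previous sections in hand (a conv-BN layer is a convolution plus batch normalization, an SE layer is average pooling plus two convolutions that produce per-channel weights, and a residual layer is three conv-BN layers plus an optional SE layer and an optional shortcut addition), the backbone code is easy to follow. To keep things concrete and reduce the number of parameters to track, assume an input tensor of shape 5,3,64,320: batch size 5, 3 channels, image height 64, and width 320. Under this assumption and with the settings from the configuration file in the first section, the backbone structure can be summarized as in the figure below: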
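Figure: backbone (MobileNetV3) structure
In the figure, ConvBNLayer entries list only k, s, and p (kernel size, stride, and padding), and ResidualUnit entries list only mid, k, s, and r (middle channels, kernel size, stride, and whether the residual addition if_shortcut is performed). The four numbers before and after each layer give the BCHW tensor shape, and blue text marks layers whose output shape differs from their input shape. The figure is explained below in five parts: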

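  • First conv-BN layer
    A conv-BN layer with kernel size 3 and stride 2 converts the 3-channel input into an 8-channel output and halves the height and width.
  • Stage 0 (stage0)
    Three residual layers convert 8 input channels into 16 output channels and halve the height and width.
  • Stage 1 (stage1)
    Three residual layers convert 16 input channels into 24 output channels and halve the height and width.
  • Stage 2 (stage2)
    Six residual layers convert 24 input channels into 56 output channels and halve the height and width.
  • Stage 3 (stage3)
    Three residual layers and one conv-BN layer convert 56 input channels into 480 output channels and halve the height and width.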
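The five parts above can be reproduced end to end with a plain-Python sketch that walks the large-model configuration table of det_mobilenet_v3.py (transcribed here as (k, exp, c, s) rows; the se and activation columns are omitted because, with disable_se=True, they change neither shapes nor parameter counts). It assumes the same make_divisible rounding and the source's stage-splitting rule: a stride-2 unit after the third one starts a new stage, and the final 1x1 conv-BN layer closes the last stage. `backbone_summary` is a hypothetical helper whose numbers can be compared with the paddle.summary output in the next section.

```python
# (k, exp, c, s) rows of the "large" config; se/act columns omitted.
LARGE_CFG = [
    (3, 16, 16, 1), (3, 64, 24, 2), (3, 72, 24, 1),
    (5, 72, 40, 2), (5, 120, 40, 1), (5, 120, 40, 1),
    (3, 240, 80, 2), (3, 200, 80, 1), (3, 184, 80, 1), (3, 184, 80, 1),
    (3, 480, 112, 1), (3, 672, 112, 1),
    (5, 672, 160, 2), (5, 960, 160, 1), (5, 960, 160, 1),
]


def make_divisible(v, divisor=8, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def conv_bn_params(c_in, c_out, k, groups=1):
    # bias-free conv weights + BatchNorm gamma, beta, mean, variance
    return c_out * (c_in // groups) * k * k + 4 * c_out


def down(size, s):
    # output size of an odd k x k conv with padding (k - 1) // 2
    return (size - 1) // s + 1


def backbone_summary(h, w, scale=0.5, in_channels=3):
    """Hypothetical helper: per-stage (layers, channels, h, w) and total params."""
    c = make_divisible(16 * scale)
    total = conv_bn_params(in_channels, c, 3)        # stem: k=3, s=2, p=1
    h, w = down(h, 2), down(w, 2)
    stages, layers = [], 0
    for i, (k, exp, out, s) in enumerate(LARGE_CFG):
        if s == 2 and i > 2:                         # start a new stage
            stages.append((layers, c, h, w))
            layers = 0
        mid, c_out = make_divisible(scale * exp), make_divisible(scale * out)
        total += conv_bn_params(c, mid, 1)           # expand 1x1
        total += conv_bn_params(mid, mid, k, mid)    # depthwise k x k
        total += conv_bn_params(mid, c_out, 1)       # linear 1x1
        h, w, c = down(h, s), down(w, s), c_out
        layers += 1
    last = make_divisible(scale * 960)
    total += conv_bn_params(c, last, 1)              # final 1x1 conv-BN layer
    stages.append((layers + 1, last, h, w))
    return stages, total


stages, total = backbone_summary(64, 320)
for idx, stage in enumerate(stages):
    print(f"stage{idx}: layers, C, H, W = {stage}")
print("total params:", total)
```

With scale=0.5 and a 64x320 input, the per-stage layer counts, channel counts, and spatial sizes match the five parts above, and the total comes out to 410,848, the same figure paddle.summary reports.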

Code experiment

Calling paddle.summary with an input shape of (5, 3, 64, 320) produces the following output:

---------------------------------------------------------------------------
 Layer (type)       Input Shape          Output Shape         Param #    
===========================================================================
   Conv2D-1      [[5, 3, 64, 320]]     [5, 8, 32, 160]          216      
  BatchNorm-1    [[5, 8, 32, 160]]     [5, 8, 32, 160]          32       
 ConvBNLayer-1   [[5, 3, 64, 320]]     [5, 8, 32, 160]           0       first conv-BN layer
   Conv2D-2      [[5, 8, 32, 160]]     [5, 8, 32, 160]          64       stage0 begins
  BatchNorm-2    [[5, 8, 32, 160]]     [5, 8, 32, 160]          32       
 ConvBNLayer-2   [[5, 8, 32, 160]]     [5, 8, 32, 160]           0       
   Conv2D-3      [[5, 8, 32, 160]]     [5, 8, 32, 160]          72       
  BatchNorm-3    [[5, 8, 32, 160]]     [5, 8, 32, 160]          32       
 ConvBNLayer-3   [[5, 8, 32, 160]]     [5, 8, 32, 160]           0       
   Conv2D-4      [[5, 8, 32, 160]]     [5, 8, 32, 160]          64       
  BatchNorm-4    [[5, 8, 32, 160]]     [5, 8, 32, 160]          32       
 ConvBNLayer-4   [[5, 8, 32, 160]]     [5, 8, 32, 160]           0       
ResidualUnit-1   [[5, 8, 32, 160]]     [5, 8, 32, 160]           0       
   Conv2D-5      [[5, 8, 32, 160]]     [5, 32, 32, 160]         256      
  BatchNorm-5    [[5, 32, 32, 160]]    [5, 32, 32, 160]         128      
 ConvBNLayer-5   [[5, 8, 32, 160]]     [5, 32, 32, 160]          0       
   Conv2D-6      [[5, 32, 32, 160]]    [5, 32, 16, 80]          288      
  BatchNorm-6    [[5, 32, 16, 80]]     [5, 32, 16, 80]          128      
 ConvBNLayer-6   [[5, 32, 32, 160]]    [5, 32, 16, 80]           0       
   Conv2D-7      [[5, 32, 16, 80]]     [5, 16, 16, 80]          512      
  BatchNorm-7    [[5, 16, 16, 80]]     [5, 16, 16, 80]          64       
 ConvBNLayer-7   [[5, 32, 16, 80]]     [5, 16, 16, 80]           0       
ResidualUnit-2   [[5, 8, 32, 160]]     [5, 16, 16, 80]           0       
   Conv2D-8      [[5, 16, 16, 80]]     [5, 40, 16, 80]          640      
  BatchNorm-8    [[5, 40, 16, 80]]     [5, 40, 16, 80]          160      
 ConvBNLayer-8   [[5, 16, 16, 80]]     [5, 40, 16, 80]           0       
   Conv2D-9      [[5, 40, 16, 80]]     [5, 40, 16, 80]          360      
  BatchNorm-9    [[5, 40, 16, 80]]     [5, 40, 16, 80]          160      
 ConvBNLayer-9   [[5, 40, 16, 80]]     [5, 40, 16, 80]           0       
   Conv2D-10     [[5, 40, 16, 80]]     [5, 16, 16, 80]          640      
 BatchNorm-10    [[5, 16, 16, 80]]     [5, 16, 16, 80]          64       
ConvBNLayer-10   [[5, 40, 16, 80]]     [5, 16, 16, 80]           0       
ResidualUnit-3   [[5, 16, 16, 80]]     [5, 16, 16, 80]           0       stage0 ends
   Conv2D-11     [[5, 16, 16, 80]]     [5, 40, 16, 80]          640      stage1 begins
 BatchNorm-11    [[5, 40, 16, 80]]     [5, 40, 16, 80]          160      
ConvBNLayer-11   [[5, 16, 16, 80]]     [5, 40, 16, 80]           0       
   Conv2D-12     [[5, 40, 16, 80]]      [5, 40, 8, 40]         1,000     
 BatchNorm-12     [[5, 40, 8, 40]]      [5, 40, 8, 40]          160      
ConvBNLayer-12   [[5, 40, 16, 80]]      [5, 40, 8, 40]           0       
   Conv2D-13      [[5, 40, 8, 40]]      [5, 24, 8, 40]          960      
 BatchNorm-13     [[5, 24, 8, 40]]      [5, 24, 8, 40]          96       
ConvBNLayer-13    [[5, 40, 8, 40]]      [5, 24, 8, 40]           0       
ResidualUnit-4   [[5, 16, 16, 80]]      [5, 24, 8, 40]           0       
   Conv2D-14      [[5, 24, 8, 40]]      [5, 64, 8, 40]         1,536     
 BatchNorm-14     [[5, 64, 8, 40]]      [5, 64, 8, 40]          256      
ConvBNLayer-14    [[5, 24, 8, 40]]      [5, 64, 8, 40]           0       
   Conv2D-15      [[5, 64, 8, 40]]      [5, 64, 8, 40]         1,600     
 BatchNorm-15     [[5, 64, 8, 40]]      [5, 64, 8, 40]          256      
ConvBNLayer-15    [[5, 64, 8, 40]]      [5, 64, 8, 40]           0       
   Conv2D-16      [[5, 64, 8, 40]]      [5, 24, 8, 40]         1,536     
 BatchNorm-16     [[5, 24, 8, 40]]      [5, 24, 8, 40]          96       
ConvBNLayer-16    [[5, 64, 8, 40]]      [5, 24, 8, 40]           0       
ResidualUnit-5    [[5, 24, 8, 40]]      [5, 24, 8, 40]           0       
   Conv2D-17      [[5, 24, 8, 40]]      [5, 64, 8, 40]         1,536     
 BatchNorm-17     [[5, 64, 8, 40]]      [5, 64, 8, 40]          256      
ConvBNLayer-17    [[5, 24, 8, 40]]      [5, 64, 8, 40]           0       
   Conv2D-18      [[5, 64, 8, 40]]      [5, 64, 8, 40]         1,600     
 BatchNorm-18     [[5, 64, 8, 40]]      [5, 64, 8, 40]          256      
ConvBNLayer-18    [[5, 64, 8, 40]]      [5, 64, 8, 40]           0       
   Conv2D-19      [[5, 64, 8, 40]]      [5, 24, 8, 40]         1,536     
 BatchNorm-19     [[5, 24, 8, 40]]      [5, 24, 8, 40]          96       
ConvBNLayer-19    [[5, 64, 8, 40]]      [5, 24, 8, 40]           0       
ResidualUnit-6    [[5, 24, 8, 40]]      [5, 24, 8, 40]           0       stage1 ends
   Conv2D-20      [[5, 24, 8, 40]]     [5, 120, 8, 40]         2,880     stage2 begins
 BatchNorm-20    [[5, 120, 8, 40]]     [5, 120, 8, 40]          480      
ConvBNLayer-20    [[5, 24, 8, 40]]     [5, 120, 8, 40]           0       
   Conv2D-21     [[5, 120, 8, 40]]     [5, 120, 4, 20]         1,080     
 BatchNorm-21    [[5, 120, 4, 20]]     [5, 120, 4, 20]          480      
ConvBNLayer-21   [[5, 120, 8, 40]]     [5, 120, 4, 20]           0       
   Conv2D-22     [[5, 120, 4, 20]]      [5, 40, 4, 20]         4,800     
 BatchNorm-22     [[5, 40, 4, 20]]      [5, 40, 4, 20]          160      
ConvBNLayer-22   [[5, 120, 4, 20]]      [5, 40, 4, 20]           0       
ResidualUnit-7    [[5, 24, 8, 40]]      [5, 40, 4, 20]           0       
   Conv2D-23      [[5, 40, 4, 20]]     [5, 104, 4, 20]         4,160     
 BatchNorm-23    [[5, 104, 4, 20]]     [5, 104, 4, 20]          416      
ConvBNLayer-23    [[5, 40, 4, 20]]     [5, 104, 4, 20]           0       
   Conv2D-24     [[5, 104, 4, 20]]     [5, 104, 4, 20]          936      
 BatchNorm-24    [[5, 104, 4, 20]]     [5, 104, 4, 20]          416      
ConvBNLayer-24   [[5, 104, 4, 20]]     [5, 104, 4, 20]           0       
   Conv2D-25     [[5, 104, 4, 20]]      [5, 40, 4, 20]         4,160     
 BatchNorm-25     [[5, 40, 4, 20]]      [5, 40, 4, 20]          160      
ConvBNLayer-25   [[5, 104, 4, 20]]      [5, 40, 4, 20]           0       
ResidualUnit-8    [[5, 40, 4, 20]]      [5, 40, 4, 20]           0       
   Conv2D-26      [[5, 40, 4, 20]]      [5, 96, 4, 20]         3,840     
 BatchNorm-26     [[5, 96, 4, 20]]      [5, 96, 4, 20]          384      
ConvBNLayer-26    [[5, 40, 4, 20]]      [5, 96, 4, 20]           0       
   Conv2D-27      [[5, 96, 4, 20]]      [5, 96, 4, 20]          864      
 BatchNorm-27     [[5, 96, 4, 20]]      [5, 96, 4, 20]          384      
ConvBNLayer-27    [[5, 96, 4, 20]]      [5, 96, 4, 20]           0       
   Conv2D-28      [[5, 96, 4, 20]]      [5, 40, 4, 20]         3,840     
 BatchNorm-28     [[5, 40, 4, 20]]      [5, 40, 4, 20]          160      
ConvBNLayer-28    [[5, 96, 4, 20]]      [5, 40, 4, 20]           0       
ResidualUnit-9    [[5, 40, 4, 20]]      [5, 40, 4, 20]           0       
   Conv2D-29      [[5, 40, 4, 20]]      [5, 96, 4, 20]         3,840     
 BatchNorm-29     [[5, 96, 4, 20]]      [5, 96, 4, 20]          384      
ConvBNLayer-29    [[5, 40, 4, 20]]      [5, 96, 4, 20]           0       
   Conv2D-30      [[5, 96, 4, 20]]      [5, 96, 4, 20]          864      
 BatchNorm-30     [[5, 96, 4, 20]]      [5, 96, 4, 20]          384      
ConvBNLayer-30    [[5, 96, 4, 20]]      [5, 96, 4, 20]           0       
   Conv2D-31      [[5, 96, 4, 20]]      [5, 40, 4, 20]         3,840     
 BatchNorm-31     [[5, 40, 4, 20]]      [5, 40, 4, 20]          160      
ConvBNLayer-31    [[5, 96, 4, 20]]      [5, 40, 4, 20]           0       
ResidualUnit-10   [[5, 40, 4, 20]]      [5, 40, 4, 20]           0       
   Conv2D-32      [[5, 40, 4, 20]]     [5, 240, 4, 20]         9,600     
 BatchNorm-32    [[5, 240, 4, 20]]     [5, 240, 4, 20]          960      
ConvBNLayer-32    [[5, 40, 4, 20]]     [5, 240, 4, 20]           0       
   Conv2D-33     [[5, 240, 4, 20]]     [5, 240, 4, 20]         2,160     
 BatchNorm-33    [[5, 240, 4, 20]]     [5, 240, 4, 20]          960      
ConvBNLayer-33   [[5, 240, 4, 20]]     [5, 240, 4, 20]           0       
   Conv2D-34     [[5, 240, 4, 20]]      [5, 56, 4, 20]        13,440     
 BatchNorm-34     [[5, 56, 4, 20]]      [5, 56, 4, 20]          224      
ConvBNLayer-34   [[5, 240, 4, 20]]      [5, 56, 4, 20]           0       
ResidualUnit-11   [[5, 40, 4, 20]]      [5, 56, 4, 20]           0       
   Conv2D-35      [[5, 56, 4, 20]]     [5, 336, 4, 20]        18,816     
 BatchNorm-35    [[5, 336, 4, 20]]     [5, 336, 4, 20]         1,344     
ConvBNLayer-35    [[5, 56, 4, 20]]     [5, 336, 4, 20]           0       
   Conv2D-36     [[5, 336, 4, 20]]     [5, 336, 4, 20]         3,024     
 BatchNorm-36    [[5, 336, 4, 20]]     [5, 336, 4, 20]         1,344     
ConvBNLayer-36   [[5, 336, 4, 20]]     [5, 336, 4, 20]           0       
   Conv2D-37     [[5, 336, 4, 20]]      [5, 56, 4, 20]        18,816     
 BatchNorm-37     [[5, 56, 4, 20]]      [5, 56, 4, 20]          224      
ConvBNLayer-37   [[5, 336, 4, 20]]      [5, 56, 4, 20]           0       
ResidualUnit-12   [[5, 56, 4, 20]]      [5, 56, 4, 20]           0       stage2 ends
   Conv2D-38      [[5, 56, 4, 20]]     [5, 336, 4, 20]        18,816     stage3 begins
 BatchNorm-38    [[5, 336, 4, 20]]     [5, 336, 4, 20]         1,344     
ConvBNLayer-38    [[5, 56, 4, 20]]     [5, 336, 4, 20]           0       
   Conv2D-39     [[5, 336, 4, 20]]     [5, 336, 2, 10]         8,400     
 BatchNorm-39    [[5, 336, 2, 10]]     [5, 336, 2, 10]         1,344     
ConvBNLayer-39   [[5, 336, 4, 20]]     [5, 336, 2, 10]           0       
   Conv2D-40     [[5, 336, 2, 10]]      [5, 80, 2, 10]        26,880     
 BatchNorm-40     [[5, 80, 2, 10]]      [5, 80, 2, 10]          320      
ConvBNLayer-40   [[5, 336, 2, 10]]      [5, 80, 2, 10]           0       
ResidualUnit-13   [[5, 56, 4, 20]]      [5, 80, 2, 10]           0       
   Conv2D-41      [[5, 80, 2, 10]]     [5, 480, 2, 10]        38,400     
 BatchNorm-41    [[5, 480, 2, 10]]     [5, 480, 2, 10]         1,920     
ConvBNLayer-41    [[5, 80, 2, 10]]     [5, 480, 2, 10]           0       
   Conv2D-42     [[5, 480, 2, 10]]     [5, 480, 2, 10]        12,000     
 BatchNorm-42    [[5, 480, 2, 10]]     [5, 480, 2, 10]         1,920     
ConvBNLayer-42   [[5, 480, 2, 10]]     [5, 480, 2, 10]           0       
   Conv2D-43     [[5, 480, 2, 10]]      [5, 80, 2, 10]        38,400     
 BatchNorm-43     [[5, 80, 2, 10]]      [5, 80, 2, 10]          320      
ConvBNLayer-43   [[5, 480, 2, 10]]      [5, 80, 2, 10]           0       
ResidualUnit-14   [[5, 80, 2, 10]]      [5, 80, 2, 10]           0       
   Conv2D-44      [[5, 80, 2, 10]]     [5, 480, 2, 10]        38,400     
 BatchNorm-44    [[5, 480, 2, 10]]     [5, 480, 2, 10]         1,920     
ConvBNLayer-44    [[5, 80, 2, 10]]     [5, 480, 2, 10]           0       
   Conv2D-45     [[5, 480, 2, 10]]     [5, 480, 2, 10]        12,000     
 BatchNorm-45    [[5, 480, 2, 10]]     [5, 480, 2, 10]         1,920     
ConvBNLayer-45   [[5, 480, 2, 10]]     [5, 480, 2, 10]           0       
   Conv2D-46     [[5, 480, 2, 10]]      [5, 80, 2, 10]        38,400     
 BatchNorm-46     [[5, 80, 2, 10]]      [5, 80, 2, 10]          320      
ConvBNLayer-46   [[5, 480, 2, 10]]      [5, 80, 2, 10]           0       
ResidualUnit-15   [[5, 80, 2, 10]]      [5, 80, 2, 10]           0       
   Conv2D-47      [[5, 80, 2, 10]]     [5, 480, 2, 10]        38,400     
 BatchNorm-47    [[5, 480, 2, 10]]     [5, 480, 2, 10]         1,920     
ConvBNLayer-47    [[5, 80, 2, 10]]     [5, 480, 2, 10]           0       stage3 ends
===========================================================================
Total params: 410,848
Trainable params: 398,480
Non-trainable params: 12,368
---------------------------------------------------------------------------
Input size (MB): 1.17
Forward/backward pass size (MB): 116.78
Params size (MB): 1.57
Estimated Total Size (MB): 119.52
---------------------------------------------------------------------------
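The footer of the summary can be cross-checked by hand. The sketch below assumes float32 values (4 bytes each), sizes expressed in MiB (1024 × 1024 bytes), and that the non-trainable parameters are the BatchNorm running mean and variance (two per BN channel); under those assumptions it reproduces the reported Input size, Params size, and trainable/non-trainable split. `size_mb` and `footer_check` are hypothetical helpers.

```python
MIB = 1024 * 1024


def size_mb(n_values, bytes_per_value=4):
    """Size of n float32 values in MiB, rounded to two decimals like the footer."""
    return round(n_values * bytes_per_value / MIB, 2)


def footer_check(total, trainable, non_trainable, input_shape):
    n_input = 1
    for d in input_shape:
        n_input *= d
    return {
        "split_ok": trainable + non_trainable == total,
        "bn_channels": non_trainable // 2,   # running mean + running variance
        "input_mb": size_mb(n_input),
        "params_mb": size_mb(total),
    }


print(footer_check(410_848, 398_480, 12_368, (5, 3, 64, 320)))
```

Under these assumptions the 1.57 MB Params size is simply 410,848 parameters × 4 bytes, which is why it should not be read as a parameter count in millions.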

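Each entry in the output can be matched against the backbone structure figure; the stage boundaries are annotated in the output. Because disable_se is set to True in the yml file, the SE module is disabled and the SE layer is not actually used anywhere in the backbone. If disable_se is changed to False, SEModule-1 through SEModule-8 appear in the output; interested readers can modify the code and try it.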
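For a rough idea of what disable_se=False would add, here is a plain-Python estimate. The eight middle-channel counts are those of the depthwise Conv2D rows of residual units 4-6 and 11-15 in the summary above, the units whose configuration row enables SE. It assumes each SEModule consists of two 1x1 convolutions with bias (c to c//4 and back to c) and nothing else; `se_params` is a hypothetical helper, and the real figure should be confirmed by rerunning paddle.summary.

```python
# Middle channels of the residual units whose config row enables SE (scale=0.5),
# taken from the depthwise Conv2D rows of the summary output.
SE_MID_CHANNELS = [40, 64, 64, 240, 336, 336, 480, 480]


def se_params(c, r=4):
    """Hypothetical helper: parameters of one SE layer, assuming biased 1x1 convs."""
    squeezed = c // r
    squeeze = c * squeezed + squeezed    # c -> c//r, weights + bias
    excite = squeezed * c + c            # c//r -> c, weights + bias
    return squeeze + excite              # average pooling has no parameters


extra = sum(se_params(c) for c in SE_MID_CHANNELS)
print(len(SE_MID_CHANNELS), "SE layers,", extra, "extra parameters (estimate)")
```

Eight entries correspond to SEModule-1 through SEModule-8; most of the extra parameters come from the wide units in stage 2 and stage 3.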

Summary

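This article first explained three basic building blocks, the conv-BN layer (ConvBNLayer), the SE layer (SEModule), and the residual layer (ResidualUnit), then analyzed the internal structure of MobileNetV3, and finally used Python code to show the paddle.summary output of the PP-OCRv3 text detection backbone. The test code is available on gitee.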


Original article: https://blog.csdn.net/Eric_Hxy/article/details/144216159
