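YOLOv11 Improvement: Combining YOLOv11 with DynamicConv (Dynamic Convolution, CVPR 2024) and a Modified C3k2 Structure
Abstract
Large-scale visual pretraining has significantly improved the performance of large vision models. However, existing low-FLOPs models cannot benefit from large-scale pretraining. In this paper, the authors propose a new design principle called ParameterNet. It increases the number of parameters in models used for large-scale visual pretraining while keeping the increase in FLOPs to a minimum. DynamicConv (dynamic convolution) is used to add extra parameters to the network with almost no increase in FLOPs. As a result, ParameterNet allows low-FLOPs networks to benefit from large-scale visual pretraining.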
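The claim of "more parameters, almost no extra FLOPs" can be made concrete with a rough count for a single layer. The sketch below rests on simplifying assumptions: only multiply-accumulates (MACs) are counted, bias and BatchNorm are ignored, and DynamicConv's extra cost is modeled as global pooling plus a routing `nn.Linear` plus mixing `num_experts` kernels into one. That cost model follows the `DynamicConv` class in Part II. The helper names `conv_cost` and `dynamic_conv_cost` are hypothetical.

```python
def conv_cost(c_in, c_out, k, h_out, w_out, groups=1):
    """Parameters and multiply-accumulates (MACs) of a plain k x k convolution, bias ignored."""
    params = c_out * (c_in // groups) * k * k
    macs = params * h_out * w_out  # every weight is used once per output position
    return params, macs


def dynamic_conv_cost(c_in, c_out, k, h_in, w_in, h_out, w_out, groups=1, num_experts=4):
    """Rough parameters and MACs of a DynamicConv layer with `num_experts` expert kernels."""
    kernel_params, conv_macs = conv_cost(c_in, c_out, k, h_out, w_out, groups)
    routing_params = c_in * num_experts + num_experts  # nn.Linear(c_in, num_experts) incl. bias
    params = num_experts * kernel_params + routing_params
    extra_macs = (
        c_in * h_in * w_in             # global average pooling, one add per input element
        + c_in * num_experts           # routing linear layer
        + num_experts * kernel_params  # weighted sum of the expert kernels into one kernel
    )
    return params, conv_macs + extra_macs


# Example: 3x3 convolution, 64 -> 64 channels, on an 80x80 feature map
p0, m0 = conv_cost(64, 64, 3, 80, 80)
p1, m1 = dynamic_conv_cost(64, 64, 3, 80, 80, 80, 80)
print(f"params: {p1 / p0:.2f}x, MACs: {m1 / m0:.3f}x")  # params: 4.01x, MACs: 1.002x
```

With 4 experts, the layer has roughly 4x the parameters while the per-sample compute grows by only about 0.2%. Only one convolution is executed after the kernels are mixed.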
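# Theory
DynamicConv (dynamic convolution) is a technique for improving the performance of convolutional neural networks (CNNs). Its core idea is to generate the convolution kernel (filter) dynamically from the input instead of using one fixed kernel. This makes the convolution more flexible and adaptive, which increases its expressive power and, in turn, the model's performance. It works as follows:
- Experts: DynamicConv keeps several "experts", i.e. several sets of convolution kernels, each of which can learn a different convolution pattern. A lightweight routing layer looks at the globally pooled features of each input sample and assigns a weight to every expert, so different inputs use different combinations of experts.
- Dynamic kernel generation: the expert kernels are combined with these input-dependent weights into a single kernel. The convolution that is actually applied therefore adapts to each input's features instead of staying fixed after training.
- Mixing: the weighted combination of experts determines the final convolution output. Because convolution is linear in its kernel, mixing the kernels first gives the same result as running each expert's convolution and taking a weighted sum of the outputs, but only one convolution has to be computed.
The main advantage of DynamicConv is that it produces input-adaptive kernels, which increases the model's expressiveness and flexibility. The extra experts add parameters but very little computation.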
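The three steps above can be sketched in NumPy for a single input sample. This is an illustrative sketch, not the ParameterNet code. It assumes stride 1, no padding, no bias, and a sigmoid router over globally pooled features, which is the same routing choice as the `DynamicConv` class in Part II. `conv2d_valid` and `dynamic_conv` are hypothetical helper names.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def conv2d_valid(x, w):
    """Naive stride-1, no-padding convolution (cross-correlation, as in deep-learning frameworks).

    x: (C, H, W) input, w: (O, C, k, k) kernel -> (O, H-k+1, W-k+1) output.
    """
    k = w.shape[-1]
    windows = sliding_window_view(x, (k, k), axis=(1, 2))  # (C, H-k+1, W-k+1, k, k)
    return np.einsum("chwij,ocij->ohw", windows, w)


def dynamic_conv(x, experts, w_route, b_route):
    """Dynamic convolution for one sample.

    experts: (E, O, C, k, k) expert kernels; w_route: (E, C) and b_route: (E,) form the router.
    """
    pooled = x.mean(axis=(1, 2))                                   # global average pooling -> (C,)
    routing = 1.0 / (1.0 + np.exp(-(w_route @ pooled + b_route)))  # sigmoid routing weights -> (E,)
    kernel = np.einsum("e,eocij->ocij", routing, experts)          # mix experts into ONE kernel
    return conv2d_valid(x, kernel), routing


rng = np.random.default_rng(0)
C, O, E, k = 8, 16, 4, 3
x = rng.standard_normal((C, 10, 10))
experts = rng.standard_normal((E, O, C, k, k))
w_route, b_route = rng.standard_normal((E, C)), np.zeros(E)

out, routing = dynamic_conv(x, experts, w_route, b_route)
print(out.shape)  # (16, 8, 8)

# Convolution is linear in the kernel, so mixing kernels first equals mixing the E expert outputs
mixed_outputs = sum(r * conv2d_valid(x, experts[e]) for e, r in enumerate(routing))
print(np.allclose(out, mixed_outputs))  # True
```

The experts multiply the parameter count by E. The extra compute per sample is only the pooling, the tiny routing layer, and the kernel mixing.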
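For a detailed explanation of the theory, see the paper: paper link
The code is available here: code link
The rest of this article is a step-by-step tutorial; just follow along and the module will be added successfully.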
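🎓 I. Download the original YOLOv11 source code
Official source code download: official source code
If the official site doesn't open, download it from my cloud drive: YOLOv11 original source code download, version ultralytics-8.3.6, extraction code: ehhs
Note: if you already downloaded the YOLOv11 source code in one of my earlier articles, there is no need to download it again. Unless stated otherwise, all my articles use the same version of the source code.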
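🍀🍀1. YOLOv11 model architecture diagram
The overall YOLO architecture, drawn from yolo11.yaml, is shown in the figure below.
🍀🍀2. Environment setup
For environment setup, see this tutorial: environment setup link. If your environment is already configured, you can skip this step.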
🎓 II. DynamicConv code
```python
# -*- coding: utf-8 -*-
"""
@Auth :挂科边缘
@File :DynamicConv.py
@IDE :PyCharm
@Motto :Learn new ideas, strive to be a new youth
"""
"""
An implementation of GhostNet Model as defined in:
GhostNet: More Features from Cheap Operations. https://arxiv.org/abs/1911.11907
The train script of the model is similar to that of MobileNetV3
Original model: https://github.com/huawei-noah/CV-backbones/tree/master/ghostnet_pytorch
"""
import math
from functools import partial

import torch
import torch.nn as nn
import torch.nn.functional as F
from timm.layers import CondConv2d, DropPath, SqueezeExcite, drop_path, hard_sigmoid
from ultralytics.nn.modules.block import Bottleneck, C2f, C3

# Squeeze-and-excitation factory, only built by GhostBottleneck when se_ratio > 0.
# NOTE: gate_fn/divisor/se_ratio follow an older timm SqueezeExcite signature and may need adapting
# to the SqueezeExcite in your timm version; C3k2_GhostModule never builds this layer.
_SE_LAYER = partial(SqueezeExcite, gate_fn=hard_sigmoid, divisor=4)


class DynamicConv(nn.Module):
    """Dynamic Conv layer: a CondConv2d whose expert kernels are mixed per sample by a routing layer."""

    def __init__(self, in_features, out_features, kernel_size=1, stride=1, padding='', dilation=1,
                 groups=1, bias=False, num_experts=4):
        super().__init__()
        # Routing: maps globally pooled input features to one weight per expert
        self.routing = nn.Linear(in_features, num_experts)
        # CondConv2d stores num_experts kernels and mixes them with the routing weights
        self.cond_conv = CondConv2d(in_features, out_features, kernel_size, stride, padding, dilation,
                                    groups, bias, num_experts)

    def forward(self, x):
        pooled_inputs = F.adaptive_avg_pool2d(x, 1).flatten(1)  # (B, C): CondConv routing input
        routing_weights = torch.sigmoid(self.routing(pooled_inputs))  # (B, num_experts)
        x = self.cond_conv(x, routing_weights)
        return x


class ConvBnAct(nn.Module):
    """Conv + Norm Layer + Activation w/ optional skip connection."""

    def __init__(
            self, in_chs, out_chs, kernel_size, stride=1, dilation=1, pad_type='',
            skip=False, act_layer=nn.ReLU, norm_layer=nn.BatchNorm2d, drop_path_rate=0., num_experts=4):
        super(ConvBnAct, self).__init__()
        self.has_residual = skip and stride == 1 and in_chs == out_chs
        self.drop_path_rate = drop_path_rate
        # self.conv = create_conv2d(in_chs, out_chs, kernel_size, stride=stride, dilation=dilation, padding=pad_type)
        self.conv = DynamicConv(in_chs, out_chs, kernel_size, stride, dilation=dilation, padding=pad_type,
                                num_experts=num_experts)
        self.bn1 = norm_layer(out_chs)
        self.act1 = act_layer()

    def feature_info(self, location):
        # DynamicConv has no out_channels attribute of its own; read it from the wrapped CondConv2d
        out_chs = self.conv.cond_conv.out_channels
        if location == 'expansion':  # output of conv after act, same as block output
            info = dict(module='act1', hook_type='forward', num_chs=out_chs)
        else:  # location == 'bottleneck', block output
            info = dict(module='', hook_type='', num_chs=out_chs)
        return info

    def forward(self, x):
        shortcut = x
        x = self.conv(x)
        x = self.bn1(x)
        x = self.act1(x)
        if self.has_residual:
            if self.drop_path_rate > 0.:
                x = drop_path(x, self.drop_path_rate, self.training)
            x += shortcut
        return x


class GhostModule(nn.Module):
    """Ghost module: a regular DynamicConv produces part of the channels, a cheap grouped DynamicConv the rest."""

    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, act_layer=nn.ReLU, num_experts=4):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)
        new_channels = init_channels * (ratio - 1)
        self.primary_conv = nn.Sequential(
            DynamicConv(inp, init_channels, kernel_size, stride, kernel_size // 2, bias=False, num_experts=num_experts),
            nn.BatchNorm2d(init_channels),
            act_layer() if act_layer is not None else nn.Sequential(),
        )
        self.cheap_operation = nn.Sequential(
            DynamicConv(init_channels, new_channels, dw_size, 1, dw_size // 2, groups=init_channels, bias=False,
                        num_experts=num_experts),
            nn.BatchNorm2d(new_channels),
            act_layer() if act_layer is not None else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1, x2], dim=1)
        return out[:, :self.oup, :, :]  # drop surplus channels when oup is not divisible by ratio


class GhostBottleneck(nn.Module):
    """Ghost bottleneck w/ optional SE."""

    def __init__(self, in_chs, mid_chs, out_chs, dw_kernel_size=3,
                 stride=1, act_layer=nn.ReLU, se_ratio=0., drop_path=0., num_experts=4):
        super(GhostBottleneck, self).__init__()
        has_se = se_ratio is not None and se_ratio > 0.
        self.stride = stride
        # Point-wise expansion
        self.ghost1 = GhostModule(in_chs, mid_chs, act_layer=act_layer, num_experts=num_experts)
        # Depth-wise convolution
        if self.stride > 1:
            self.conv_dw = nn.Conv2d(
                mid_chs, mid_chs, dw_kernel_size, stride=stride,
                padding=(dw_kernel_size - 1) // 2, groups=mid_chs, bias=False)
            self.bn_dw = nn.BatchNorm2d(mid_chs)
        else:
            self.conv_dw = None
            self.bn_dw = None
        # Squeeze-and-excitation
        self.se = _SE_LAYER(mid_chs, se_ratio=se_ratio,
                            act_layer=act_layer if act_layer is not nn.GELU else nn.ReLU) if has_se else None
        # Point-wise linear projection
        self.ghost2 = GhostModule(mid_chs, out_chs, act_layer=None, num_experts=num_experts)
        # shortcut
        if in_chs == out_chs and self.stride == 1:
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                DynamicConv(
                    in_chs, in_chs, dw_kernel_size, stride=stride,
                    padding=(dw_kernel_size - 1) // 2, groups=in_chs, bias=False, num_experts=num_experts),
                nn.BatchNorm2d(in_chs),
                DynamicConv(in_chs, out_chs, 1, stride=1, padding=0, bias=False, num_experts=num_experts),
                nn.BatchNorm2d(out_chs),
            )
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()

    def forward(self, x):
        shortcut = x
        # 1st ghost bottleneck
        x = self.ghost1(x)
        # Depth-wise convolution
        if self.conv_dw is not None:
            x = self.conv_dw(x)
            x = self.bn_dw(x)
        # Squeeze-and-excitation
        if self.se is not None:
            x = self.se(x)
        # 2nd ghost bottleneck
        x = self.ghost2(x)
        x = self.shortcut(shortcut) + self.drop_path(x)
        return x


class DynamicConv_Bottleneck(Bottleneck):
    """Bottleneck with both convs replaced by 1x1 DynamicConv (k is not used); not used by the yaml below."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__(c1, c2, shortcut, g, k, e)
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = DynamicConv(c1, c_)
        self.cv2 = DynamicConv(c_, c2)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C3k2_GhostModule(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        # Inner blocks: C3k_GhostModule when c3k=True, otherwise a single GhostModule per repeat
        self.m = nn.ModuleList(
            C3k_GhostModule(self.c, self.c, 2, shortcut, g) if c3k else GhostModule(self.c, self.c) for _ in range(n)
        )


class C3k_GhostModule(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(GhostModule(c_, c_) for _ in range(n)))
```
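Why does `GhostModule.forward` slice its output with `out[:, :self.oup]`? The channel bookkeeping explains it. The sketch below reproduces the `init_channels` / `new_channels` formulas from the code above; `ghost_channels` is a hypothetical helper, not part of the module.

```python
import math


def ghost_channels(oup, ratio=2):
    """Channel bookkeeping of GhostModule: (init_channels, new_channels, concatenated channels)."""
    init_channels = math.ceil(oup / ratio)      # produced by primary_conv (regular DynamicConv)
    new_channels = init_channels * (ratio - 1)  # produced by cheap_operation (one input channel per group)
    return init_channels, new_channels, init_channels + new_channels


print(ghost_channels(64))           # (32, 32, 64): the concatenation already has exactly oup channels
print(ghost_channels(15))           # (8, 8, 16): one channel too many, so forward() keeps the first 15
print(ghost_channels(64, ratio=3))  # (22, 44, 66): 2 surplus channels are sliced away
```

Because of the `ceil`, the concatenated tensor can be slightly wider than `oup`. The final slice guarantees that every GhostModule inside C3k2_GhostModule returns exactly `self.c` channels, as C2f expects.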
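🎓 III. How to add the module
🍀🍀1. Add the Part II code to the modules directory
(1) In the ultralytics/nn/modules directory, create a new file. I named mine DynamicConv.py, as shown in the screenshot below.
(2) Then copy the Part II code into it.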
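🍀🍀2. Import the module in `__init__.py`
File path: `ultralytics/nn/modules/__init__.py`
(1) Before importing, comment out the code at the top of `__init__.py` shown in the screenshot. Once this has been done, you can skip this step in later articles.
(2) Then import the module at the top of `__init__.py`, as shown in the screenshot:
```python
from .DynamicConv import C3k2_GhostModule
```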
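tasks.py later pulls in the modules with `from ultralytics.nn.modules import *`. In Python, a star import only brings in the names listed in the package's `__all__` when `__all__` is defined, and `ultralytics/nn/modules/__init__.py` defines such a tuple. The new class therefore needs to be listed in `__all__`, or `__all__` has to be commented out. The self-contained demo below uses a hypothetical stand-in module named `fake_nn_modules` to show the effect.

```python
import sys
import types

# Hypothetical stand-in for ultralytics.nn.modules, for illustration only
fake = types.ModuleType("fake_nn_modules")
fake.Conv = type("Conv", (), {})
fake.C3k2_GhostModule = type("C3k2_GhostModule", (), {})
fake.__all__ = ("Conv",)  # C3k2_GhostModule is imported into the package but NOT listed in __all__
sys.modules["fake_nn_modules"] = fake

ns = {}
exec("from fake_nn_modules import *", ns)
print("C3k2_GhostModule" in ns)  # False: the star import only exports names in __all__

fake.__all__ = fake.__all__ + ("C3k2_GhostModule",)  # register the new module in __all__
ns = {}
exec("from fake_nn_modules import *", ns)
print("C3k2_GhostModule" in ns)  # True
```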
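🍀🍀3. Register the module in tasks.py
(1) Import all modules at the top of tasks.py. The file path is ultralytics/nn/tasks.py.
In this file, change the import style: comment out the import code shown in the screenshot and replace it with a `*` import. You then no longer need to import each new module by hand, because everything is imported at once. Once this change is made, you can skip this step in later articles. Add the following line, as shown in the screenshot:
```python
from ultralytics.nn.modules import *
```
(2) Then register the module in the parse_model function of this file, as shown in the screenshot. Add C3k2_GhostModule next to C3k2 in the two module sets of parse_model:
- the set whose members have their input/output channels computed from the yaml args;
- the set whose members receive the repeat count n as an extra argument.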
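Registration matters because parse_model only rewrites a module's yaml args if the module belongs to those sets. It inserts c1, scales c2, and inserts the repeat count. The sketch below is a simplified, hypothetical re-implementation of just that logic. It assumes the structure described above, where C3k2 appears in one set that resolves channels and another that receives n. `CHANNEL_MODULES`, `REPEAT_MODULES` and `resolve_args` are illustrative names, not ultralytics APIs.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


# Hypothetical stand-ins for the two module sets in parse_model that contain C3k2
CHANNEL_MODULES = {"Conv", "C3k2", "C3k2_GhostModule"}  # c1/c2 are resolved from the yaml
REPEAT_MODULES = {"C3k2", "C3k2_GhostModule"}           # repeat count n becomes a constructor arg


def resolve_args(name, args, c1, repeats, depth, width, max_channels):
    """Simplified sketch of how parse_model turns one yaml row into constructor args."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats  # depth scaling
    if name in CHANNEL_MODULES:
        c2 = make_divisible(min(args[0], max_channels) * width, 8)  # width scaling
        args = [c1, c2, *args[1:]]
        if name in REPEAT_MODULES:
            args.insert(2, n)  # the module repeats its inner blocks itself...
            n = 1              # ...so the layer is instantiated only once
    return n, args


# Layer 2 of the yaml at scale n (depth 0.50, width 0.25, max_channels 1024); layer 1 outputs 32 channels
print(resolve_args("C3k2_GhostModule", [256, False, 0.25], 32, 2, 0.50, 0.25, 1024))
# (1, [32, 64, 1, False, 0.25])
print(resolve_args("NotRegistered", [256, False, 0.25], 32, 2, 0.50, 0.25, 1024))
# (1, [256, False, 0.25])
```

In the unregistered case the raw yaml args would reach `C3k2_GhostModule(256, False, 0.25)`, so 256 would be read as c1 and False as c2. The real parse_model handles many more modules and special cases than this sketch.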
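The improved module has now been added to the YOLOv11 source code. Next, configure a yaml file that uses it.
🎓 IV. Modifying the yaml file
In the ultralytics/cfg/models/11 directory, copy yolo11.yaml and rename the copy yolo11-xxx.yaml, where xxx is usually the name of the improvement module. Then edit this new file; my version is shown below.
🍀🍀1. First way of adding the module
The complete yaml code is as follows: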
```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_GhostModule, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_GhostModule, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_GhostModule, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_GhostModule, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_GhostModule, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_GhostModule, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_GhostModule, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_GhostModule, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
```
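The `scales` section changes how many repeats and channels each C3k2_GhostModule layer actually gets. The sketch below applies the compound-scaling rule to the four backbone C3k2_GhostModule rows of this yaml: depth scales the repeats, and width scales the channels, capped at max_channels and rounded up to a multiple of 8. It is a simplified reading of parse_model's rule, and `scaled` is a hypothetical helper.

```python
import math

# [depth, width, max_channels] copied from the scales section of the yaml
SCALES = {"n": (0.50, 0.25, 1024), "s": (0.50, 0.50, 1024), "m": (0.50, 1.00, 512),
          "l": (1.00, 1.00, 512), "x": (1.00, 1.50, 512)}

# (yaml repeats, yaml c2) of the backbone C3k2_GhostModule rows (layers 2, 4, 6, 8)
BACKBONE_C3K2 = [(2, 256), (2, 512), (2, 512), (2, 1024)]


def scaled(repeats, c2, depth, width, max_channels):
    """Repeats and output channels after compound scaling (simplified sketch)."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c2 = math.ceil(min(c2, max_channels) * width / 8) * 8  # round up to a multiple of 8
    return n, c2


for name, (d, w, mc) in SCALES.items():
    print(name, [scaled(r, c, d, w, mc) for r, c in BACKBONE_C3K2])
# n [(1, 64), (1, 128), (1, 128), (1, 256)]
# s [(1, 128), (1, 256), (1, 256), (1, 512)]
# m [(1, 256), (1, 512), (1, 512), (1, 512)]
# l [(2, 256), (2, 512), (2, 512), (2, 512)]
# x [(2, 384), (2, 768), (2, 768), (2, 768)]
```

The "summary" comments in the scales section were carried over from the original yolo11.yaml. They describe the baseline model, so the improved model's parameter and FLOPs counts will differ.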
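🎓 V. Modifying the training file
🍀🍀1. Create the training file
(1) Create a Python file named train.py in the project root directory. If you already created it while following one of my earlier articles, there is no need to create it again.
🍀🍀2. Modify the training file
YOLOv11 is trained differently from YOLOv5, but the dataset format is the same as YOLOv5's. Just prepare your dataset in that format; I won't go into it here. My training file is below. Modify the specified parameters to suit your training needs; the remaining ones can be changed or left alone as you see fit.
Here is the training code. If you already copied it from one of my earlier articles, there is no need to copy it again; just modify the parameters.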
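```python
# -*- coding: utf-8 -*-
"""
@Auth : 挂科边缘
@File :train.py
@IDE :PyCharm
@Motto:Learn new ideas, strive to be a new youth
@Email :179958974@qq.com
"""
import warnings

warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    # Use the yaml that contains the improved module (yolo11-xxx.yaml from Part IV), not the original
    # yolo11.yaml; replace this path with the one on your own machine
    model = YOLO(model=r'D:\2-Python\1-YOLO\YOLOv11\ultralytics-8.3.6\ultralytics\cfg\models\11\yolo11-xxx.yaml')
    # model.load('yolo11n.pt')  # load pretrained weights; not recommended for improvement or comparison
    #                           # experiments, since pretrained weights bring no obvious overall accuracy gain
    model.train(data=r'data.yaml',
                imgsz=640,
                epochs=50,
                batch=4,
                workers=0,
                device='',
                optimizer='SGD',
                close_mosaic=10,
                resume=False,
                project='runs/train',
                name='exp',
                single_cls=False,
                cache=False,
                )
```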
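Explanation of the training parameters (the commonly used ones):
- model: path to the model configuration file. When training an improved model, it is recommended to train without pretrained weights.
- data: path to the dataset configuration file.
- imgsz: input image size, here 640x640 pixels.
- epochs: number of training epochs.
- batch: batch size. The more GPU memory you have, the larger you can set it; choose it according to your hardware.
- workers: number of worker processes for data loading (default 8). If you run out of memory or hit data-loader errors, set it to 0.
- device: which GPU to train on. Leave it empty to automatically select an available GPU, or the CPU.
- optimizer: optimizer type.
- close_mosaic: disables mosaic data augmentation for the last N epochs of training.
- resume: whether to resume the last interrupted training run. False starts a new run from scratch. True loads the model weights and optimizer state of the last run and continues training from there, which is useful when training was interrupted.
- project: project folder where training results are saved.
- name: name of the folder the results are saved in.
- single_cls: whether to treat all classes as a single class; False keeps the original classes.
- cache: whether to cache the data; False means no caching.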
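To see what `close_mosaic` means for the script above (epochs=50, close_mosaic=10), the sketch below splits the epochs into those trained with and without mosaic. It assumes the documented behavior that mosaic is switched off for the final N epochs. `mosaic_schedule` is a hypothetical helper, not part of ultralytics.

```python
def mosaic_schedule(epochs, close_mosaic):
    """Split 1-based epoch numbers into (with mosaic, without mosaic).

    Sketch assumption: mosaic augmentation is disabled for the final `close_mosaic` epochs.
    """
    switch = max(epochs - close_mosaic, 0)  # number of leading epochs that still use mosaic
    return list(range(1, switch + 1)), list(range(switch + 1, epochs + 1))


with_mosaic, without_mosaic = mosaic_schedule(epochs=50, close_mosaic=10)
print(with_mosaic[-1], without_mosaic[0], len(without_mosaic))  # 40 41 10
```

If you lower `epochs` for a quick test run, the last `close_mosaic` epochs still run without mosaic. With epochs smaller than close_mosaic, the whole run is mosaic-free.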
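Run a quick training test. The printed YOLOv11 architecture confirms that the improved module was added successfully.
Summary
Once the environment is configured and the dataset is prepared, training should generally succeed.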
Original article: https://blog.csdn.net/weixin_44779079/article/details/143788689