Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1619-1628. DOI: 10.11772/j.issn.1001-9081.2023050675
Special topic: Multimedia computing and computer simulation
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG
Received:
2023-06-01
Revised:
2023-09-17
Accepted:
2023-10-11
Online:
2023-10-17
Published:
2024-05-10
Contact:
Qiao MENG
About author:
LI Xin, born in 1995, M.S. candidate. His research interests include intelligent transportation and computer vision.
Abstract:
To address the limited ability of convolutional networks to extract fine-grained image features and the failure to recognize dependencies among multiple attributes in image classification tasks, a YOLOv5-based vehicle multi-attribute classification method, Multi-YOLOv5, was proposed. A collaborative working mechanism of Multi-head Non-Maximum Suppression (Multi-NMS) and a separable label loss (Separate-Loss) function was designed to accomplish the vehicle multi-attribute classification task, and the YOLOv5 detection model was reconstructed with the Convolutional Block Attention Module (CBAM), Shuffle Attention (SA), and CoordConv, improving the accuracy of multi-attribute classification from three aspects: strengthening multi-attribute feature extraction, enhancing the correlations among different attributes, and improving the network's perception of positional information. Training and testing were carried out on datasets including VeRi. Experimental results show that, compared with architectures based on GoogLeNet, Residual Network (ResNet), EfficientNet, and Vision Transformer (ViT), Multi-YOLOv5 achieves better recognition results in multi-attribute classification of targets: its mean Average Precision (mAP) on the VeRi dataset reaches 87.37%, which is 4.47 percentage points higher than that of the best-performing method above, and it is more robust than the original YOLOv5 model, providing reliable data for traffic target perception in dense environments.
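The separable-label idea named in the abstract, giving each attribute (e.g. vehicle color and vehicle type) its own group of labels and its own loss term that are then summed, can be sketched as follows. This is a minimal illustrative sketch in PyTorch under assumed names (`SeparateLabelLoss`, `num_colors`, `num_types`), not the authors' Separate-Loss implementation:

```python
# Minimal sketch (not the paper's code): split one prediction vector into per-attribute
# label groups and sum an independent classification loss per group, so each attribute
# keeps its own gradient signal instead of competing in a single joint label space.
import torch
import torch.nn as nn

class SeparateLabelLoss(nn.Module):
    def __init__(self, num_colors: int, num_types: int):
        super().__init__()
        self.num_colors = num_colors
        self.num_types = num_types
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, color_target: torch.Tensor,
                type_target: torch.Tensor) -> torch.Tensor:
        # logits: (N, num_colors + num_types); targets are one-hot vectors per attribute
        color_logits = logits[:, :self.num_colors]
        type_logits = logits[:, self.num_colors:self.num_colors + self.num_types]
        return self.bce(color_logits, color_target) + self.bce(type_logits, type_target)

if __name__ == "__main__":
    loss_fn = SeparateLabelLoss(num_colors=10, num_types=9)
    logits = torch.randn(4, 19)
    color_t = torch.zeros(4, 10); color_t[torch.arange(4), torch.randint(0, 10, (4,))] = 1.0
    type_t = torch.zeros(4, 9); type_t[torch.arange(4), torch.randint(0, 9, (4,))] = 1.0
    print(loss_fn(logits, color_t, type_t))  # scalar combined loss
```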
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG. YOLOv5 multi-attribute classification based on separable label collaborative learning[J]. Journal of Computer Applications, 2024, 44(5): 1619-1628.
Tab. 1 Comparison of mAP between the proposed method and baseline methods

| Method | Color mAP/% | Type mAP/% | Color+Type mAP/% |
| --- | --- | --- | --- |
| YOLOv5-Color | 91.09 | — | 76.84 |
| YOLOv5-Type | — | 84.32 | |
| Proposed method | — | — | 84.75 |
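Tab. 1 contrasts two single-attribute YOLOv5 models (color only, type only) with the joint method; in the joint setting the abstract pairs the separable label loss with a multi-head non-maximum suppression (Multi-NMS). The exact Multi-NMS design is not described on this page, so the sketch below only illustrates one plausible reading, running standard NMS once per attribute score head; the function name and score layout are assumptions:

```python
# Hedged sketch: per-attribute NMS, one plausible reading of a multi-head NMS.
# Not the paper's Multi-NMS; per_attribute_nms and the score dictionary are illustrative.
import torch
from torchvision.ops import nms

def per_attribute_nms(boxes: torch.Tensor, attr_scores: dict, iou_thr: float = 0.45) -> dict:
    """boxes: (N, 4) as (x1, y1, x2, y2); attr_scores: per-attribute confidences, e.g. {"color": (N,), "type": (N,)}."""
    keep = {}
    for name, scores in attr_scores.items():
        keep[name] = nms(boxes, scores, iou_thr)  # standard NMS run independently per attribute head
    return keep

if __name__ == "__main__":
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0], [50.0, 50.0, 60.0, 60.0]])
    scores = {"color": torch.tensor([0.9, 0.8, 0.7]), "type": torch.tensor([0.6, 0.95, 0.5])}
    print(per_attribute_nms(boxes, scores))  # kept box indices per attribute
```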
Tab. 2 Experimental results of ablation study on the VeRi dataset

| YOLOv5s | CBAM | SA | CoordConv | mAP/% |
| --- | --- | --- | --- | --- |
| √ | × | × | × | 84.75 |
| √ | √ | × | × | 85.32 |
| √ | √ | √ | × | 86.75 |
| √ | √ | √ | √ | 87.37 |
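In the ablation of Tab. 2, the last module added is CoordConv (reference 29), which lifts mAP from 86.75% to 87.37%. CoordConv concatenates normalized x/y coordinate channels to a feature map before an ordinary convolution so the network can exploit positional information; below is a minimal sketch of that operation with illustrative channel counts, not the paper's actual configuration:

```python
# Hedged sketch of the CoordConv operation from reference 29: append normalized
# coordinate channels to the input, then apply a standard convolution.
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        # two extra input channels hold the normalized x and y coordinates
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

if __name__ == "__main__":
    layer = CoordConv(64, 128)
    print(layer(torch.randn(2, 64, 20, 20)).shape)  # torch.Size([2, 128, 20, 20])
```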
Tab. 3 Comparative experiment results on VeRi and VRID datasets

| Method | VeRi mAP/% | VeRi F1/% | VeRi FPS | VRID mAP/% | VRID F1/% | VRID FPS |
| --- | --- | --- | --- | --- | --- | --- |
| GoogLeNet | 77.50 | 77.80 | 80.48 | 104.19 | | |
| ResNet-34 | 82.10 | 78.15 | 97.00 | 91.78 | 104.62 | |
| ResNet-101 | 75.79 | 52.72 | 97.10 | 91.55 | 57.78 | |
| EfficientNet-B0 | 79.50 | 77.33 | 88.00 | 82.08 | 112.71 | |
| ViT-Base | 71.20 | 68.82 | 81.77 | 62.80 | 56.23 | 107.99 |
| Proposed method | 87.37 | 84.19 | 87.53 | 97.91 | 93.39 | |
1 | CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20: 273-297. 10.1007/bf00994018 |
2 | COVER T, HART P. Nearest neighbor pattern classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27. 10.1109/tit.1967.1053964 |
3 | QUINLAN J R. Induction of decision trees[J]. Machine Learning, 1986, 1: 81-106. 10.1007/bf00116251 |
4 | SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1-9. 10.1109/cvpr.2015.7298594 |
5 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
6 | TAN M, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks [C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR, 2019: 6105-6114. |
7 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. [2023-09-10]. |
8 | REN W, BAI H X. Multi-label image classification method based on global and local label relationship[J]. Journal of Computer Applications, 2022, 42(5): 1383-1390. |
9 | KIM H-C, PARK J-H, KIM D-W, et al. Multilabel naïve Bayes classification considering label dependence[J]. Pattern Recognition Letters, 2020, 136: 279-285. 10.1016/j.patrec.2020.06.021 |
10 | MU J P, CAI J, YU M C, et al. Label-correlation based multi-label classification algorithm with label-specific features[J]. Application Research of Computers, 2020, 37(9): 2656-2658, 2673. 10.19734/j.issn.1001-3695.2019.04.0118 |
11 | CHEN Z-M, WEI X-S, WANG P, et al. Multi-label image recognition with graph convolutional networks [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5177-5186. 10.1109/cvpr.2019.00532 |
12 | BAI S W, WANG M Y, HU J, et al. Multi-region attention for fine-grained image classification[J]. Computer Engineering, 2024, 50(1): 271-278. |
13 | GAO H M, ZHU M, CAO X Y, et al. A micro-hyperspectral image classification method of gallbladder cancer based on multi-scale fusion attention mechanism[J]. Journal of Image and Graphics, 2023, 28(4): 1173-1185. 10.11834/jig.211201 |
14 | LIU X, LIU W, MEI T, et al. PROVID: progressive and multimodal vehicle reidentification for large-scale urban surveillance[J]. IEEE Transactions on Multimedia, 2018, 20(3): 645-658. 10.1109/tmm.2017.2751966 |
15 | LIU X, LIU W, MEI T, et al. A deep learning-based approach to progressive vehicle re-identification for urban surveillance [C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 869-884. 10.1007/978-3-319-46475-6_53 |
16 | LIU X, LIU W, MA H, et al. Large-scale vehicle re-identification in urban surveillance videos [C]// Proceedings of the 2016 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2016: 1-6. 10.1109/icme.2016.7553002 |
17 | LI X, YUAN M, JIANG Q, et al. VRID-1: a basic vehicle re‑identification dataset for similar vehicles [C]// Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems. Piscataway: IEEE, 2017: 1-8. 10.1109/itsc.2017.8317817 |
18 | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. 10.1109/cvpr.2016.91 |
19 | REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. 10.1109/cvpr.2017.690 |
20 | REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-05-14]. |
21 | BOCHKOVSKIY A, WANG C-Y, LIAO H-Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-05-14]. |
22 | CHEN S, YUAN Y H. Study of improved YOLOv5 algorithms for sign language letter recognition[J]. Journal of Chinese Computer Systems, 2023, 44(4): 838-844. |
23 | HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. 10.1109/tpami.2015.2389824 |
24 | ELFWING S, UCHIBE E, DOYA K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning[J]. Neural Networks, 2018, 107: 3-11. 10.1016/j.neunet.2017.12.012 |
25 | LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2117-2125. 10.1109/cvpr.2017.106 |
26 | LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8759-8768. 10.1109/cvpr.2018.00913 |
27 | ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12993-13000. 10.1609/aaai.v34i07.6999 |
28 | WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. 10.1007/978-3-030-01234-2_1 |
29 | LIU R, LEHMAN J, MOLINO P, et al. An intriguing failing of convolutional neural networks and the CoordConv solution [C]// Proceedings of the 32nd Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9628-9639. |
30 | IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR, 2015: 448-456. |
31 | ZHANG Q-L, YANG Y-B. SA-Net: shuffle attention for deep convolutional neural networks [C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 2235-2239. 10.1109/icassp39728.2021.9414568 |
32 | WU Y, HE K. Group normalization [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. 10.1007/978-3-030-01261-8_1 |
33 | Hehuang Cup Data Lake Algorithm Competition. Vehicle multi-attribute recognition track[EB/OL]. [2023-08-23]. |