YOLOv5 multi-attribute classification based on separable label collaborative learning

doi:10.11772/j.issn.1001-9081.2023050675

Journal of Computer Applications, 2024, Vol. 44, Issue (5): 1619-1628.

Special Issue: Multimedia computing and computer simulation

Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG

Department of Computer Technology and Applications, Qinghai University, Xining Qinghai 810016, China

Received: 2023-06-01 Revised: 2023-09-17 Accepted: 2023-10-11 Online: 2023-10-17 Published: 2024-05-10
Contact: Qiao MENG
About authors: LI Xin, born in 1995, male, native of Nanchong, Sichuan, M. S. candidate. His research interests include intelligent transportation and computer vision.
HUANGFU Junyi, born in 1998, male, native of Shangrao, Jiangxi, M. S. His research interests include image processing and video analysis.
MENG Lingchen, born in 1999, male, native of Nanyang, Henan, M. S. candidate. His research interests include intelligent transportation.
MENG Qiao (contact author), born in 1983, female, native of Xianyang, Shaanxi, lecturer, Ph. D., CCF member. Her research interests include intelligent transportation and information system engineering.
Supported by: Natural Science Foundation of Qinghai Province (2023-ZJ-989Q)

Abstract

A YOLOv5-based vehicle multi-attribute classification method, Multi-YOLOv5, was proposed to address two problems in image classification tasks. First, convolutional networks have limited ability to extract fine-grained image features. Second, they cannot recognize the dependencies among multiple attributes. A collaborative working mechanism combining Multi-head Non-Maximum Suppression (Multi-NMS) and a separable label loss (Separate-Loss) function was designed to perform multi-attribute classification of vehicles. In addition, the YOLOv5 detection model was reconstructed with the Convolutional Block Attention Module (CBAM), Shuffle Attention (SA), and CoordConv. These changes improve multi-attribute feature extraction, strengthen the correlation between different attributes, and enhance the network's perception of positional information, which together improve the accuracy of multi-attribute classification of objects. Finally, the method was trained and tested on datasets including VeRi. Experimental results demonstrate that Multi-YOLOv5 achieves better recognition results in multi-attribute classification of objects than network architectures based on GoogLeNet, Residual Network (ResNet), EfficientNet, and Vision Transformer (ViT). On the VeRi dataset, the mean Average Precision (mAP) of Multi-YOLOv5 reaches 87.37%, which is 4.47 percentage points higher than the best-performing of these methods. Moreover, Multi-YOLOv5 is more robust than the original YOLOv5 model, providing reliable data for traffic object perception in dense environments.
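The abstract names Separate-Loss but does not give its formula. One plausible reading of a separable label loss splits each vehicle's label into independent attribute groups. It then sums a softmax cross-entropy term per group, so each attribute is supervised on its own label rather than through one joint class. The sketch below follows that reading only. The group names and sizes in `ATTRIBUTE_GROUPS`, the function `separate_loss`, and the optional per-group weights are illustrative assumptions, not the authors' implementation.

```python
import math

# Hypothetical attribute groups (assumption): group name -> number of classes.
# The paper's actual label layout is not given on this page.
ATTRIBUTE_GROUPS = {"color": 3, "type": 2}


def log_softmax(logits):
    """Numerically stable log-softmax over one group of logits."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_sum for x in logits]


def separate_loss(logits, targets, weights=None):
    """Sum of independent cross-entropy terms, one per attribute group.

    logits:  flat list of scores laid out group by group, in ATTRIBUTE_GROUPS order
    targets: dict mapping group name -> class index within that group
    weights: optional dict mapping group name -> loss weight (missing groups use 1.0)
    """
    expected = sum(ATTRIBUTE_GROUPS.values())
    if len(logits) != expected:
        raise ValueError(f"expected {expected} logits, got {len(logits)}")
    total, offset = 0.0, 0
    for name, size in ATTRIBUTE_GROUPS.items():
        group_logits = logits[offset:offset + size]  # this attribute's scores only
        offset += size
        w = 1.0 if weights is None else weights.get(name, 1.0)
        total += -w * log_softmax(group_logits)[targets[name]]
    return total
```

With all-zero logits, every group is uniform, so the loss is ln 3 + ln 2 = ln 6 (about 1.79). With these hypothetical sizes, separate groups need 3 + 2 = 5 outputs. A single softmax over every color-type combination would need 3 × 2 = 6, and the gap widens as attributes and classes are added.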

Key words: multi-attribute classification, deep learning, multi-feature fusion, attention, YOLOv5

CLC Number: TP391.41

Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG. YOLOv5 multi-attribute classification based on separable label collaborative learning[J]. Journal of Computer Applications, 2024, 44(5): 1619-1628.

Figures/Tables 17

Fig. 1 Network structure of YOLOv5

Fig. 2 Convolutional block attention module

Fig. 3 Basic convolutional block replacement

Fig. 4 Structure of SA model

Fig. 5 Label separation mode for separable multi-attribute loss

Fig. 6 Principle of multi-head non-maximum suppression

Fig. 7 Examples of VeRi dataset

Fig. 8 Examples of VRID dataset

Fig. 9 Process of label compression

Fig. 10 Data augmentation results by using Mosaic
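Fig. 6 shows the principle of Multi-NMS, but this page does not describe it. The minimal sketch below assumes only that multi-head NMS applies ordinary greedy IoU-based non-maximum suppression separately to the confidence scores of each attribute head. The function names, the (x1, y1, x2, y2) box format, and the default IoU threshold of 0.5 are hypothetical choices.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_thresh=0.5):
    """Standard greedy NMS: returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any box already kept.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
    return keep


def multi_head_nms(boxes, head_scores, iou_thresh=0.5):
    """Hypothetical multi-head NMS: run NMS independently for each attribute head.

    head_scores: dict mapping head name (e.g. "color", "type") -> per-box confidences
    returns: dict mapping head name -> kept box indices for that head
    """
    return {head: nms(boxes, scores, iou_thresh) for head, scores in head_scores.items()}
```

Consider boxes `[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]`, where the first two overlap with IoU 81/119 ≈ 0.68, and head scores `{"color": [0.9, 0.8, 0.7], "type": [0.6, 0.95, 0.5]}`. The call returns `{"color": [0, 2], "type": [1, 2]}`, so each head keeps its own highest-scoring box from the overlapping pair.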

Tab. 1 Comparison of mAP between proposed method with baseline methods

Method	mAP/% (Color)	mAP/% (Type)	mAP/% (Color+Type)
YOLOv5-Color	91.09	-	76.84
YOLOv5-Type	-	84.32	76.84
Proposed method	-	-	84.75

Tab. 2 Experimental results of ablation study on VeRi dataset

YOLOv5s	CBAM	SA	CoordConv	mAP/%
√	×	×	×	84.75
√	√	×	×	85.32
√	√	√	×	86.75
√	√	√	√	87.37
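The contribution of each module in Tab. 2 is the difference between consecutive rows. The short script below recomputes those differences from the table's own values; nothing beyond the table is assumed.

```python
# mAP/% values copied from Tab. 2 (VeRi); each row adds one module to the previous row.
ablation = [
    ("YOLOv5s", 84.75),
    ("+CBAM", 85.32),
    ("+SA", 86.75),
    ("+CoordConv", 87.37),
]


def incremental_gains(rows):
    """Return (module, gain in percentage points) for each added module, rounded to 2 decimals."""
    return [(name, round(cur - prev, 2)) for (_, prev), (name, cur) in zip(rows, rows[1:])]


gains = incremental_gains(ablation)
total = round(ablation[-1][1] - ablation[0][1], 2)
print(gains)  # [('+CBAM', 0.57), ('+SA', 1.43), ('+CoordConv', 0.62)]
print(total)  # 2.62
```

SA gives the largest single step (1.43 percentage points). Together, the three modules add 2.62 percentage points over the plain YOLOv5s row, whose 84.75% matches the Color+Type result of the proposed method in Tab. 1.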

Fig. 11 PR curves for detection of various classes in VeRi dataset

Fig. 12 Visualization of heat maps of detection results for different parameters

Tab. 3 Comparative experiment results on VeRi and VRID datasets

Method	VeRi mAP/%	VeRi F1/%	VeRi FPS	VRID mAP/%	VRID F1/%	VRID FPS
GoogLeNet	77.50	77.80	80.48	97.50	92.34	104.19
ResNet-34	82.10	78.33	78.15	97.00	91.78	104.62
ResNet-101	82.90	75.79	52.72	97.10	91.55	57.78
EfficientNet-B0	79.50	77.33	84.92	88.00	82.08	112.71
ViT-Base	71.20	68.82	81.77	62.80	56.23	107.99
Proposed method	87.37	84.19	87.53	97.91	93.39	108.69
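The 4.47-percentage-point improvement quoted in the abstract can be checked against Tab. 3. It is the mAP gap between the proposed method and the strongest baseline on VeRi. The script below recomputes it, along with the analogous F1 gap, from the VeRi columns only. The helper name `gap_to_best` is illustrative.

```python
# VeRi columns of Tab. 3: method -> (mAP/%, F1/%, FPS)
veri = {
    "GoogLeNet": (77.50, 77.80, 80.48),
    "ResNet-34": (82.10, 78.33, 78.15),
    "ResNet-101": (82.90, 75.79, 52.72),
    "EfficientNet-B0": (79.50, 77.33, 84.92),
    "ViT-Base": (71.20, 68.82, 81.77),
}
proposed = (87.37, 84.19, 87.53)


def gap_to_best(results, ours, column):
    """Return (best baseline name, gap in percentage points) for one metric column."""
    best = max(results, key=lambda name: results[name][column])
    return best, round(ours[column] - results[best][column], 2)


print(gap_to_best(veri, proposed, 0))  # ('ResNet-101', 4.47)
print(gap_to_best(veri, proposed, 1))  # ('ResNet-34', 5.86)
```

The strongest baseline depends on the metric. ResNet-101 leads the baselines on VeRi mAP (82.90%), while ResNet-34 leads on F1 (78.33%). On VeRi, the proposed method also has the highest FPS in the table (87.53).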

Fig. 13 Visualization of classification results

Fig. 14 Comparison of classification effects
