Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1619-1628. DOI: 10.11772/j.issn.1001-9081.2023050675
• Multimedia computing and computer simulation •
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG
Received: 2023-06-01
Revised: 2023-09-17
Accepted: 2023-10-11
Online: 2023-10-17
Published: 2024-05-10
Contact: Qiao MENG
About author: LI Xin, born in 1995 in Nanchong, Sichuan, M. S. candidate. His research interests include intelligent transportation and computer vision.
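Abstract:
To address the insufficient ability of convolutional networks to extract fine-grained image features, and their inability to recognize dependencies among multiple attributes in image classification tasks, a YOLOv5-based vehicle multi-attribute classification method named Multi-YOLOv5 was proposed. In this method, a collaborative working mechanism of multi-head Non-Maximum Suppression (Multi-NMS) and a separable label loss (Separate-Loss) function was designed to accomplish vehicle multi-attribute classification. The YOLOv5 detection model was also restructured with the Convolutional Block Attention Module (CBAM), Shuffle Attention (SA), and CoordConv, which respectively improve multi-attribute feature extraction, strengthen the correlations between different attributes, and enhance the network's perception of position information, so as to improve the accuracy of multi-attribute classification of targets. Training and testing were carried out on datasets including VeRi. Experimental results show that, compared with network structures based on GoogLeNet, Residual Network (ResNet), EfficientNet, and Vision Transformer (ViT), Multi-YOLOv5 achieves better recognition results in multi-attribute classification of targets. On the VeRi dataset, its mean Average Precision (mAP) reaches 87.37%, which is 4.47 percentage points higher than that of the best-performing method above. It is also more robust than the original YOLOv5 model and can provide reliable data for traffic target perception in dense environments.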
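The abstract names a separable label loss (Separate-Loss) but does not give its formula. The sketch below is only one plausible reading, not the authors' implementation: it assumes the class vector of a detection is split into a color group and a type group, scores each group with its own binary cross-entropy, and sums the group losses. The group layout (`GROUPS`), the helper names, and the per-group weights are all hypothetical.

```python
import math

# Hypothetical attribute layout: the first 3 class slots are colors and the
# next 2 are vehicle types. The real label sets are not given in the abstract.
GROUPS = {"color": slice(0, 3), "type": slice(3, 5)}


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(logits, targets):
    """Mean binary cross-entropy over one group of class logits."""
    eps = 1e-12
    total = 0.0
    for z, t in zip(logits, targets):
        p = sigmoid(z)
        total += -(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps))
    return total / len(logits)


def separate_loss(logits, targets, weights=None):
    """Weighted sum of per-group losses; each group is scored on its own slice."""
    weights = weights or {name: 1.0 for name in GROUPS}
    return sum(weights[name] * bce(logits[s], targets[s])
               for name, s in GROUPS.items())


def decode(logits):
    """Pick the highest-scoring class index inside each attribute group."""
    result = {}
    for name, s in GROUPS.items():
        group = logits[s]
        result[name] = max(range(len(group)), key=group.__getitem__)
    return result
```

Under these assumptions, `decode` returns one label per attribute for a single detection, which is the kind of output a per-attribute (multi-head) post-processing step would consume. How Multi-NMS actually combines the heads is not specified in the abstract and is not modeled here.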
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG. YOLOv5 multi-attribute classification based on separable label collaborative learning[J]. Journal of Computer Applications, 2024, 44(5): 1619-1628.
Method | Color mAP/% | Type mAP/% | Color+Type mAP/% |
---|---|---|---|
YOLOv5-Color | 91.09 | - | 76.84 |
YOLOv5-Type | - | 84.32 | |
Proposed method | - | - | 84.75 |
Tab. 1 Comparison of mAP between the proposed method and the baseline methods
YOLOv5s | CBAM | SA | CoordConv | mAP/% |
---|---|---|---|---|
√ | × | × | × | 84.75 |
√ | √ | × | × | 85.32 |
√ | √ | √ | × | 86.75 |
√ | √ | √ | √ | 87.37 |
Tab. 2 Ablation study results on the VeRi dataset
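The ablation rows are cumulative, so the contribution of each added module can be read off as the difference between consecutive rows. The short sketch below does that arithmetic with the mAP values copied from the table; the row labels and the helper name `incremental_gains` are illustrative only.

```python
# mAP/% values copied from the ablation table (VeRi dataset).
ablation = [
    ("YOLOv5s", 84.75),
    ("+ CBAM", 85.32),
    ("+ CBAM + SA", 86.75),
    ("+ CBAM + SA + CoordConv", 87.37),
]


def incremental_gains(rows):
    """Return (configuration, gain over the previous row) in percentage points."""
    return [
        (name, round(cur - prev, 2))
        for (_, prev), (name, cur) in zip(rows, rows[1:])
    ]


gains = incremental_gains(ablation)
total_gain = round(ablation[-1][1] - ablation[0][1], 2)

for name, gain in gains:
    print(f"{name}: +{gain:.2f} percentage points")
print(f"total: +{total_gain:.2f} percentage points")
```

Because each row adds a module on top of the previous configuration, these differences reflect the order in which the modules were introduced rather than each module's effect in isolation.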
Method | VeRi | VRID | ||||
---|---|---|---|---|---|---|
mAP/% | F1/% | FPS | mAP/% | F1/% | FPS | |
GoogLeNet | 77.50 | 77.80 | 80.48 | 104.19 | ||
ResNet-34 | 82.10 | 78.15 | 97.00 | 91.78 | 104.62 | |
ResNet-101 | 75.79 | 52.72 | 97.10 | 91.55 | 57.78 | |
EfficientNet-B0 | 79.50 | 77.33 | 88.00 | 82.08 | 112.71 | |
ViT-Base | 71.20 | 68.82 | 81.77 | 62.80 | 56.23 | 107.99 |
Proposed method | 87.37 | 84.19 | 87.53 | 97.91 | 93.39 |
Tab. 3 Comparative experimental results on the VeRi and VRID datasets