Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (5): 1619-1628. DOI: 10.11772/j.issn.1001-9081.2023050675
Special topic: Multimedia computing and computer simulation
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG
Received:
2023-06-01
Revised:
2023-09-17
Accepted:
2023-10-11
Online:
2023-10-17
Published:
2024-05-10
Contact:
Qiao MENG
About author:
LI Xin, born in 1995, M.S. candidate. His research interests include intelligent transportation and computer vision.
Abstract:
To address the limited ability of convolutional networks to extract fine-grained image features and the failure to recognize dependencies among multiple attributes in image classification tasks, a YOLOv5-based vehicle multi-attribute classification method, Multi-YOLOv5, was proposed. A collaborative working mechanism of Multi-head Non-Maximum Suppression (Multi-NMS) and a separable label loss (Separate-Loss) function was designed to accomplish the vehicle multi-attribute classification task, and the YOLOv5 detection model was reconstructed with the Convolutional Block Attention Module (CBAM), Shuffle Attention (SA), and CoordConv, improving the accuracy of multi-attribute classification from three aspects: strengthening multi-attribute feature extraction, enhancing the correlations among different attributes, and improving the network's perception of positional information. Training and testing were carried out on datasets including VeRi. Experimental results show that, compared with architectures based on GoogLeNet, Residual Network (ResNet), EfficientNet, and Vision Transformer (ViT), Multi-YOLOv5 achieves better recognition results in multi-attribute classification of targets: its mean Average Precision (mAP) on the VeRi dataset reaches 87.37%, which is 4.47 percentage points higher than that of the best-performing method above, and it is more robust than the original YOLOv5 model, providing reliable data for traffic target perception in dense environments.
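The separable-label idea named in the abstract, giving each attribute (e.g. vehicle color and vehicle type) its own group of labels and its own loss term that are then summed, can be sketched as follows. This is a minimal illustrative sketch in PyTorch under assumed names (`SeparateLabelLoss`, `num_colors`, `num_types`), not the authors' Separate-Loss implementation:

```python
# Minimal sketch (not the paper's code): split one prediction vector into per-attribute
# label groups and sum an independent classification loss per group, so each attribute
# keeps its own gradient signal instead of competing in a single joint label space.
import torch
import torch.nn as nn

class SeparateLabelLoss(nn.Module):
    def __init__(self, num_colors: int, num_types: int):
        super().__init__()
        self.num_colors = num_colors
        self.num_types = num_types
        self.bce = nn.BCEWithLogitsLoss()

    def forward(self, logits: torch.Tensor, color_target: torch.Tensor,
                type_target: torch.Tensor) -> torch.Tensor:
        # logits: (N, num_colors + num_types); targets are one-hot vectors per attribute
        color_logits = logits[:, :self.num_colors]
        type_logits = logits[:, self.num_colors:self.num_colors + self.num_types]
        return self.bce(color_logits, color_target) + self.bce(type_logits, type_target)

if __name__ == "__main__":
    loss_fn = SeparateLabelLoss(num_colors=10, num_types=9)
    logits = torch.randn(4, 19)
    color_t = torch.zeros(4, 10); color_t[torch.arange(4), torch.randint(0, 10, (4,))] = 1.0
    type_t = torch.zeros(4, 9); type_t[torch.arange(4), torch.randint(0, 9, (4,))] = 1.0
    print(loss_fn(logits, color_t, type_t))  # scalar combined loss
```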
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG. YOLOv5 multi-attribute classification based on separable label collaborative learning[J]. Journal of Computer Applications, 2024, 44(5): 1619-1628.
Tab. 1 Comparison of mAP between the proposed method and baseline methods

| Method | Color mAP/% | Type mAP/% | Color+Type mAP/% |
| --- | --- | --- | --- |
| YOLOv5-Color | 91.09 | — | 76.84 |
| YOLOv5-Type | — | 84.32 | |
| Proposed method | — | — | 84.75 |
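Tab. 1 contrasts two single-attribute YOLOv5 models (color only, type only) with the joint method; in the joint setting the abstract pairs the separable label loss with a multi-head non-maximum suppression (Multi-NMS). The exact Multi-NMS design is not described on this page, so the sketch below only illustrates one plausible reading, running standard NMS once per attribute score head; the function name and score layout are assumptions:

```python
# Hedged sketch: per-attribute NMS, one plausible reading of a multi-head NMS.
# Not the paper's Multi-NMS; per_attribute_nms and the score dictionary are illustrative.
import torch
from torchvision.ops import nms

def per_attribute_nms(boxes: torch.Tensor, attr_scores: dict, iou_thr: float = 0.45) -> dict:
    """boxes: (N, 4) as (x1, y1, x2, y2); attr_scores: per-attribute confidences, e.g. {"color": (N,), "type": (N,)}."""
    keep = {}
    for name, scores in attr_scores.items():
        keep[name] = nms(boxes, scores, iou_thr)  # standard NMS run independently per attribute head
    return keep

if __name__ == "__main__":
    boxes = torch.tensor([[0.0, 0.0, 10.0, 10.0], [1.0, 1.0, 11.0, 11.0], [50.0, 50.0, 60.0, 60.0]])
    scores = {"color": torch.tensor([0.9, 0.8, 0.7]), "type": torch.tensor([0.6, 0.95, 0.5])}
    print(per_attribute_nms(boxes, scores))  # kept box indices per attribute
```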
Tab. 2 Experimental results of ablation study on the VeRi dataset

| YOLOv5s | CBAM | SA | CoordConv | mAP/% |
| --- | --- | --- | --- | --- |
| √ | × | × | × | 84.75 |
| √ | √ | × | × | 85.32 |
| √ | √ | √ | × | 86.75 |
| √ | √ | √ | √ | 87.37 |
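In the ablation of Tab. 2, the last module added is CoordConv (reference 29), which lifts mAP from 86.75% to 87.37%. CoordConv concatenates normalized x/y coordinate channels to a feature map before an ordinary convolution so the network can exploit positional information; below is a minimal sketch of that operation with illustrative channel counts, not the paper's actual configuration:

```python
# Hedged sketch of the CoordConv operation from reference 29: append normalized
# coordinate channels to the input, then apply a standard convolution.
import torch
import torch.nn as nn

class CoordConv(nn.Module):
    def __init__(self, in_channels: int, out_channels: int, kernel_size: int = 3, padding: int = 1):
        super().__init__()
        # two extra input channels hold the normalized x and y coordinates
        self.conv = nn.Conv2d(in_channels + 2, out_channels, kernel_size, padding=padding)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        ys = torch.linspace(-1.0, 1.0, h, device=x.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(-1.0, 1.0, w, device=x.device).view(1, 1, 1, w).expand(n, 1, h, w)
        return self.conv(torch.cat([x, xs, ys], dim=1))

if __name__ == "__main__":
    layer = CoordConv(64, 128)
    print(layer(torch.randn(2, 64, 20, 20)).shape)  # torch.Size([2, 128, 20, 20])
```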
Tab. 3 Comparative experiment results on VeRi and VRID datasets

| Method | VeRi mAP/% | VeRi F1/% | VeRi FPS | VRID mAP/% | VRID F1/% | VRID FPS |
| --- | --- | --- | --- | --- | --- | --- |
| GoogLeNet | 77.50 | 77.80 | 80.48 | 104.19 | | |
| ResNet-34 | 82.10 | 78.15 | 97.00 | 91.78 | 104.62 | |
| ResNet-101 | 75.79 | 52.72 | 97.10 | 91.55 | 57.78 | |
| EfficientNet-B0 | 79.50 | 77.33 | 88.00 | 82.08 | 112.71 | |
| ViT-Base | 71.20 | 68.82 | 81.77 | 62.80 | 56.23 | 107.99 |
| Proposed method | 87.37 | 84.19 | 87.53 | 97.91 | 93.39 | |
1 | CORTES C, VAPNIK V. Support-vector networks[J]. Machine Learning, 1995, 20: 273-297. 10.1007/bf00994018 |
2 | COVER T, HART P. Nearest neighbor pattern classification[J]. IEEE Transactions on Information Theory, 1967, 13(1): 21-27. 10.1109/tit.1967.1053964 |
3 | QUINLAN J R. Induction of decision trees[J]. Machine Learning, 1986, 1: 81-106. 10.1007/bf00116251 |
4 | SZEGEDY C, LIU W, JIA Y, et al. Going deeper with convolutions [C]// Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2015: 1-9. 10.1109/cvpr.2015.7298594 |
5 | HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 770-778. 10.1109/cvpr.2016.90 |
6 | TAN M, LE Q. EfficientNet: rethinking model scaling for convolutional neural networks [C]// Proceedings of the 36th International Conference on Machine Learning. New York: JMLR, 2019: 6105-6114. |
7 | DOSOVITSKIY A, BEYER L, KOLESNIKOV A, et al. An image is worth 16×16 words: Transformers for image recognition at scale[EB/OL]. [2023-09-10]. |
8 | REN W, BAI H X. Multi-label image classification method based on global and local label relationship[J]. Journal of Computer Applications, 2022, 42(5): 1383-1390. |
9 | KIM H-C, PARK J-H, KIM D-W, et al. Multilabel naïve Bayes classification considering label dependence[J]. Pattern Recognition Letters, 2020, 136: 279-285. 10.1016/j.patrec.2020.06.021 |
10 | MU J P, CAI J, YU M C, et al. Label-correlation based multi-label classification algorithm with label-specific features[J]. Application Research of Computers, 2020, 37(9): 2656-2658, 2673. 10.19734/j.issn.1001-3695.2019.04.0118 |
11 | CHEN Z-M, WEI X-S, WANG P, et al. Multi-label image recognition with graph convolutional networks [C]// Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2019: 5177-5186. 10.1109/cvpr.2019.00532 |
12 | BAI S W, WANG M Y, HU J, et al. Multi-region attention for fine-grained image classification[J]. Computer Engineering, 2024, 50(1): 271-278. |
13 | GAO H M, ZHU M, CAO X Y, et al. A micro-hyperspectral image classification method of gallbladder cancer based on multi-scale fusion attention mechanism[J]. Journal of Image and Graphics, 2023, 28(4): 1173-1185. 10.11834/jig.211201 |
14 | LIU X, LIU W, MEI T, et al. PROVID: progressive and multimodal vehicle reidentification for large-scale urban surveillance[J]. IEEE Transactions on Multimedia, 2018, 20(3): 645-658. 10.1109/tmm.2017.2751966 |
15 | LIU X, LIU W, MEI T, et al. A deep learning-based approach to progressive vehicle re-identification for urban surveillance [C]// Proceedings of the 14th European Conference on Computer Vision. Cham: Springer, 2016: 869-884. 10.1007/978-3-319-46475-6_53 |
16 | LIU X, LIU W, MA H, et al. Large-scale vehicle re-identification in urban surveillance videos [C]// Proceedings of the 2016 IEEE International Conference on Multimedia and Expo. Piscataway: IEEE, 2016: 1-6. 10.1109/icme.2016.7553002 |
17 | LI X, YUAN M, JIANG Q, et al. VRID-1: a basic vehicle re‑identification dataset for similar vehicles [C]// Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems. Piscataway: IEEE, 2017: 1-8. 10.1109/itsc.2017.8317817 |
18 | REDMON J, DIVVALA S, GIRSHICK R, et al. You only look once: unified, real-time object detection [C]// Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2016: 779-788. 10.1109/cvpr.2016.91 |
19 | REDMON J, FARHADI A. YOLO9000: better, faster, stronger [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 6517-6525. 10.1109/cvpr.2017.690 |
20 | REDMON J, FARHADI A. YOLOv3: an incremental improvement[EB/OL]. [2023-05-14]. |
21 | BOCHKOVSKIY A, WANG C-Y, LIAO H-Y M. YOLOv4: optimal speed and accuracy of object detection[EB/OL]. [2023-05-14]. |
22 | CHEN S, YUAN Y H. Study of improved YOLOv5 algorithms for sign language letter recognition[J]. Journal of Chinese Computer Systems, 2023, 44(4): 838-844. |
23 | HE K, ZHANG X, REN S, et al. Spatial pyramid pooling in deep convolutional networks for visual recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015, 37(9): 1904-1916. 10.1109/tpami.2015.2389824 |
24 | ELFWING S, UCHIBE E, DOYA K. Sigmoid-weighted linear units for neural network function approximation in reinforcement learning[J]. Neural Networks, 2018, 107: 3-11. 10.1016/j.neunet.2017.12.012 |
25 | LIN T-Y, DOLLÁR P, GIRSHICK R, et al. Feature pyramid networks for object detection [C]// Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2017: 2117-2125. 10.1109/cvpr.2017.106 |
26 | LIU S, QI L, QIN H, et al. Path aggregation network for instance segmentation [C]// Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Piscataway: IEEE, 2018: 8759-8768. 10.1109/cvpr.2018.00913 |
27 | ZHENG Z, WANG P, LIU W, et al. Distance-IoU loss: faster and better learning for bounding box regression [J]. Proceedings of the AAAI Conference on Artificial Intelligence, 2020, 34(7): 12993-13000. 10.1609/aaai.v34i07.6999 |
28 | WOO S, PARK J, LEE J-Y, et al. CBAM: convolutional block attention module [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. 10.1007/978-3-030-01234-2_1 |
29 | LIU R, LEHMAN J, MOLINO P, et al. An intriguing failing of convolutional neural networks and the CoordConv solution [C]// Proceedings of the 32nd Conference on Neural Information Processing Systems. Red Hook: Curran Associates Inc., 2018: 9628-9639. |
30 | IOFFE S, SZEGEDY C. Batch normalization: accelerating deep network training by reducing internal covariate shift [C]// Proceedings of the 32nd International Conference on Machine Learning. New York: JMLR, 2015: 448-456. |
31 | ZHANG Q-L, YANG Y-B. SA-Net: shuffle attention for deep convolutional neural networks [C]// Proceedings of the 2021 IEEE International Conference on Acoustics, Speech and Signal Processing. Piscataway: IEEE, 2021: 2235-2239. 10.1109/icassp39728.2021.9414568 |
32 | WU Y, HE K. Group normalization [C]// Proceedings of the 15th European Conference on Computer Vision. Cham: Springer, 2018: 3-19. 10.1007/978-3-030-01261-8_1 |
33 | Hehuang Cup Data Lake Algorithm Competition. Vehicle multi-attribute recognition track[EB/OL]. [2023-08-23]. |