Journal of Computer Applications, 2024, Vol. 44, Issue (5): 1619-1628. DOI: 10.11772/j.issn.1001-9081.2023050675
Special Issue: Multimedia computing and computer simulation
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG
Received: 2023-06-01
Revised: 2023-09-17
Accepted: 2023-10-11
Online: 2023-10-17
Published: 2024-05-10
Contact: Qiao MENG
About author: LI Xin, born in 1995 in Nanchong, Sichuan, M. S. candidate. His research interests include intelligent transportation and computer vision.
Xin LI, Qiao MENG, Junyi HUANGFU, Lingchen MENG. YOLOv5 multi-attribute classification based on separable label collaborative learning[J]. Journal of Computer Applications, 2024, 44(5): 1619-1628.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023050675
Method | Color mAP/% | Type mAP/% | Color+Type mAP/% |
---|---|---|---|
YOLOv5-Color | 91.09 | — | 76.84 |
YOLOv5-Type | — | 84.32 | |
Proposed method | — | — | 84.75 |
Tab. 1 Comparison of mAP between the proposed method and baseline methods
YOLOv5s | CBAM | SA | CoordConv | mAP/% |
---|---|---|---|---|
√ | × | × | × | 84.75 |
√ | √ | × | × | 85.32 |
√ | √ | √ | × | 86.75 |
√ | √ | √ | √ | 87.37 |
Tab. 2 Experimental results of ablation study on VeRi dataset
Method | VeRi mAP/% | VeRi F1/% | VeRi FPS | VRID mAP/% | VRID F1/% | VRID FPS |
---|---|---|---|---|---|---|
GoogLeNet | 77.50 | 77.80 | 80.48 | 104.19 | ||
ResNet-34 | 82.10 | 78.15 | 97.00 | 91.78 | 104.62 | |
ResNet-101 | 75.79 | 52.72 | 97.10 | 91.55 | 57.78 | |
EfficientNet-B0 | 79.50 | 77.33 | 88.00 | 82.08 | 112.71 | |
ViT-Base | 71.20 | 68.82 | 81.77 | 62.80 | 56.23 | 107.99 |
Proposed method | 87.37 | 84.19 | 87.53 | 97.91 | 93.39 |
Tab. 3 Comparative experiment results on VeRi and VRID datasets