Journal of Computer Applications, 2022, Vol. 42, Issue (5): 1383-1390. DOI: 10.11772/j.issn.1001-9081.2021071240
Received: 2021-07-16
Revised: 2021-08-31
Accepted: 2021-09-14
Online: 2021-09-28
Published: 2022-05-10
Contact: Wei REN
About author: REN Wei, born in 1996 in Xiangfen, Shanxi, M. S. candidate. His research interests include deep learning and computer vision. E-mail: 2783800599@qq.com
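Abstract:
Multi-label image classification faces two problems: interactions between labels are hard to model, and global label relationships are rigid. To address both, a Multi-Label image classification method based on Global and Local Label Relationship (ML-GLLR) is proposed, combining the self-attention mechanism with Knowledge Distillation (KD). First, a Local Label Relationship (LLR) model uses a Convolutional Neural Network (CNN), a semantic module and a Double-Layer Self-Attention (DLSA) module to model local label relationships. Then, KD is used to make the LLR learn global label relationships. In experiments on the public datasets MSCOCO2014 and VOC2007, LLR improves the mean Average Precision (mAP) over the Multi-Label image classification method based on Graph Convolutional Network (ML-GCN) by 0.8 and 0.6 percentage points respectively, and ML-GLLR further improves the mAP over LLR by 0.2 and 1.3 percentage points respectively. The experimental results show that ML-GLLR can model the relationships between labels while avoiding the rigidity of global label relationships.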
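The abstract does not specify the distillation loss or the teacher network, so the following is only a generic sketch of how a student such as LLR could be trained against both ground-truth labels and a teacher's soft predictions. The function names, the weight `alpha`, and the `temperature` parameter are illustrative assumptions, not the authors' formulation.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def bce(probs, targets, eps=1e-7):
    """Mean binary cross-entropy; targets may be hard (0/1) or soft (in [0, 1])."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total -= t * math.log(p) + (1.0 - t) * math.log(1.0 - p)
    return total / len(probs)


def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.5, temperature=2.0):
    """Generic multi-label KD loss (assumed form, not taken from the paper).

    hard term: BCE between student predictions and ground-truth labels
    soft term: BCE between temperature-softened student and teacher predictions
    """
    hard = bce([sigmoid(z) for z in student_logits], labels)
    soft_student = [sigmoid(z / temperature) for z in student_logits]
    soft_teacher = [sigmoid(z / temperature) for z in teacher_logits]
    soft = bce(soft_student, soft_teacher)
    return (1.0 - alpha) * hard + alpha * soft


# Hypothetical logits for one image with three labels.
labels = [1, 0, 1]
teacher = [3.0, -3.0, 2.0]
agreeing_student = [2.5, -2.5, 2.0]
disagreeing_student = [-2.0, 2.0, -2.0]

loss_good = distillation_loss(agreeing_student, teacher, labels)
loss_bad = distillation_loss(disagreeing_student, teacher, labels)
print(loss_good, loss_bad)
```

A student that agrees with both the labels and the teacher receives the lower loss. With `alpha=0` the function reduces to plain binary cross-entropy on the labels.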
Wei REN, Hexiang BAI. Multi-label image classification method based on global and local label relationship[J]. Journal of Computer Applications, 2022, 42(5): 1383-1390.
Method | mAP | ALL CP | ALL CR | ALL CF1 | ALL OP | ALL OR | ALL OF1 | Top-3 CP | Top-3 CR | Top-3 CF1 | Top-3 OP | Top-3 OR | Top-3 OF1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CNN-RNN | 61.2 | ― | ― | ― | ― | ― | ― | 66.0 | 55.6 | 60.4 | 69.2 | 66.4 | 67.8 |
SRN | 77.1 | 81.6 | 65.4 | 71.2 | 82.7 | 69.9 | 75.8 | 85.2 | 58.8 | 67.4 | 87.4 | 62.5 | 72.9 |
Multi-Evidence | ― | 80.4 | 70.2 | 74.9 | 85.2 | 72.5 | 78.4 | 84.5 | 62.2 | 70.6 | 89.1 | 64.3 | 74.7 |
Res-101 | 80.1 | 78.2 | 71.9 | 74.9 | 82.3 | 75.0 | 78.5 | 82.8 | 63.4 | 71.8 | 87.6 | 65.5 | 75.0 |
CNN-LSTM-Att | ― | 80.9 | 70.9 | 75.6 | 83.7 | 74.9 | 79.1 | ― | ― | ― | ― | ― | ― |
ML-GCN | 83.0 | 85.1 | 72.0 | 78.0 | 85.8 | 75.4 | 80.3 | 89.2 | 64.1 | 74.6 | 90.5 | 66.5 | 76.7 |
SSGRL | 83.8 | 89.9 | 68.5 | 76.8 | 91.3 | 70.8 | 79.7 | 91.9 | 62.5 | 72.7 | 93.8 | 64.1 | 76.2 |
LLR | 83.8 | 86.0 | 72.6 | 78.8 | 86.9 | 75.8 | 81.0 | 89.4 | 64.6 | 75.0 | 90.7 | 67.0 | 77.0 |
ML-GLLR | 84.0 | 86.5 | 72.4 | 78.8 | 87.1 | 75.8 | 81.1 | 90.0 | 64.0 | 74.8 | 91.3 | 66.7 | 77.1 |
Tab. 1 Comparison of evaluation metrics of different methods on MSCOCO2014 dataset (%)
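The page does not define the column abbreviations in Tab. 1. Under the conventional multi-label definitions (assumed here), CP/CR/CF1 are per-class precision, recall and F1 averaged over classes, and OP/OR/OF1 are the same quantities pooled over all labels. The sketch below computes them from binary prediction and ground-truth matrices; the function names and the toy data are illustrative only.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are 0)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def multilabel_metrics(pred, truth):
    """pred, truth: lists of per-image 0/1 label vectors of equal length."""
    num_classes = len(truth[0])
    correct = [0] * num_classes    # true positives per class
    predicted = [0] * num_classes  # predicted positives per class
    actual = [0] * num_classes     # ground-truth positives per class
    for pred_row, truth_row in zip(pred, truth):
        for c in range(num_classes):
            predicted[c] += pred_row[c]
            actual[c] += truth_row[c]
            correct[c] += pred_row[c] * truth_row[c]

    def ratio(num, den):
        return num / den if den else 0.0

    # Per-class scores are averaged; overall scores pool the counts first.
    cp = sum(ratio(correct[c], predicted[c]) for c in range(num_classes)) / num_classes
    cr = sum(ratio(correct[c], actual[c]) for c in range(num_classes)) / num_classes
    op = ratio(sum(correct), sum(predicted))
    overall_recall = ratio(sum(correct), sum(actual))
    return {"CP": cp, "CR": cr, "CF1": f1(cp, cr),
            "OP": op, "OR": overall_recall, "OF1": f1(op, overall_recall)}


# Toy example: 3 images, 3 labels.
pred = [[1, 0, 1], [1, 1, 0], [0, 1, 0]]
truth = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
metrics = multilabel_metrics(pred, truth)
print(metrics)
```

As a consistency check, `f1(78.2, 71.9)` is about 74.9 and `f1(82.3, 75.0)` is about 78.5, matching the CF1 and OF1 values reported for Res-101 in the ALL setting, which suggests the table follows this harmonic-mean convention.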
Method | mAP | aeroplane | bicycle | bird | boat | bottle | bus | car | cat | chair | cow | table | dog | horse | motorbike | person | plant | sheep | sofa | train | tv |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
CNN-RNN | 84.0 | 96.7 | 83.1 | 94.2 | 92.8 | 61.2 | 82.1 | 89.1 | 94.2 | 64.2 | 83.6 | 70.0 | 92.4 | 91.7 | 84.2 | 93.7 | 59.8 | 93.2 | 75.3 | 99.7 | 78.6 |
RLSD | 88.5 | 96.4 | 92.7 | 93.8 | 94.1 | 71.2 | 92.5 | 94.2 | 95.7 | 74.3 | 90.0 | 74.2 | 95.4 | 96.2 | 92.1 | 97.9 | 66.9 | 93.5 | 73.7 | 97.5 | 87.6 |
VGG | 89.7 | 98.9 | 95.0 | 96.8 | 95.4 | 69.7 | 90.4 | 93.5 | 96.0 | 74.2 | 86.6 | 87.8 | 96.0 | 96.3 | 93.1 | 97.2 | 70.0 | 92.1 | 80.3 | 98.1 | 87.0 |
HCP | 90.9 | 98.6 | 97.1 | 98.0 | 95.6 | 75.3 | 94.7 | 95.8 | 97.3 | 73.1 | 90.2 | 80.0 | 97.3 | 96.1 | 94.9 | 96.3 | 78.3 | 94.7 | 76.2 | 97.9 | 91.5 |
Res-101 | 91.9 | 99.1 | 97.6 | 96.5 | 95.1 | 74.2 | 91.3 | 96.0 | 95.8 | 75.5 | 92.2 | 88.5 | 96.2 | 96.6 | 94.3 | 98.5 | 83.2 | 94.8 | 84.7 | 98.6 | 90.1 |
ML-GCN | 94.0 | 99.5 | 98.5 | 98.6 | 98.1 | 80.8 | 94.6 | 97.2 | 98.2 | 82.3 | 95.7 | 86.4 | 98.2 | 98.4 | 96.7 | 99.0 | 84.7 | 96.7 | 84.3 | 98.9 | 93.7 |
SSGRL | 95.0 | 99.7 | 98.4 | 98.0 | 97.6 | 85.7 | 96.2 | 98.2 | 98.8 | 82.0 | 98.1 | 89.7 | 98.8 | 98.7 | 97.0 | 99.0 | 86.9 | 98.1 | 85.8 | 99.0 | 93.7 |
LLR | 94.6 | 99.4 | 97.5 | 97.9 | 97.1 | 83.9 | 95.2 | 97.7 | 98.0 | 83.6 | 95.4 | 90.0 | 97.7 | 98.0 | 96.3 | 99.0 | 86.8 | 96.5 | 88.4 | 98.7 | 94.4 |
ML-GLLR | 95.9 | 99.8 | 98.4 | 98.2 | 98.2 | 86.2 | 97.6 | 98.2 | 98.8 | 85.7 | 97.2 | 92.6 | 98.7 | 98.9 | 97.1 | 99.2 | 89.2 | 98.3 | 90.7 | 99.3 | 96.1 |
Tab. 2 Comparison of mAP and per-class AP of different methods on VOC2007 dataset (%)
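The mAP column in Tab. 2 is the arithmetic mean of the 20 per-class AP values. The short check below reproduces it for the LLR and ML-GLLR rows using the numbers from the table; the variable and function names are illustrative.

```python
# Per-class AP (%) copied from the LLR and ML-GLLR rows of Tab. 2.
llr_ap = [99.4, 97.5, 97.9, 97.1, 83.9, 95.2, 97.7, 98.0, 83.6, 95.4,
          90.0, 97.7, 98.0, 96.3, 99.0, 86.8, 96.5, 88.4, 98.7, 94.4]
ml_gllr_ap = [99.8, 98.4, 98.2, 98.2, 86.2, 97.6, 98.2, 98.8, 85.7, 97.2,
              92.6, 98.7, 98.9, 97.1, 99.2, 89.2, 98.3, 90.7, 99.3, 96.1]


def mean_ap(per_class_ap):
    """mAP as the unweighted mean of per-class average precision."""
    return sum(per_class_ap) / len(per_class_ap)


llr_map = round(mean_ap(llr_ap), 1)
ml_gllr_map = round(mean_ap(ml_gllr_ap), 1)
print(llr_map, ml_gllr_map, round(ml_gllr_map - llr_map, 1))
```

The rounded means agree with the reported mAP values of 94.6 and 95.9, and their difference is the 1.3 percentage point gain quoted in the abstract.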
Method | MSCOCO2014 mAP | MSCOCO2014 T-OF1 | MSCOCO2014 T-CF1 | MSCOCO2014 A-OF1 | MSCOCO2014 A-CF1 | VOC2007 mAP | VOC2007 T-OF1 | VOC2007 T-CF1 | VOC2007 A-OF1 | VOC2007 A-CF1 |
---|---|---|---|---|---|---|---|---|---|---|
Res-101 | 80.1 | 75.0 | 71.8 | 78.5 | 74.9 | 91.9 | 87.7 | 85.5 | 87.7 | 85.5 |
LLR (without DLSA module) | 81.4 | 75.6 | 72.9 | 79.2 | 76.5 | 92.7 | 89.3 | 86.9 | 89.3 | 86.9 |
LLR (without semantic module) | 82.1 | 76.0 | 72.8 | 79.6 | 77.0 | 93.6 | 89.9 | 87.7 | 89.9 | 87.7 |
LLR | 83.8 | 77.0 | 75.0 | 81.0 | 78.8 | 94.6 | 90.5 | 88.6 | 90.4 | 88.5 |
ML-GLLR | 84.0 | 77.1 | 74.8 | 81.1 | 78.8 | 95.9 | 90.9 | 89.6 | 90.9 | 89.5 |
Tab. 3 Ablation experimental results (%). The T- and A- prefixes correspond to the Top-3 and ALL settings of Tab. 1.