Multi-label image classification method based on global and local label relationship

doi:10.11772/j.issn.1001-9081.2021071240

Abstract:

To address two problems in multi-label image classification, namely the difficulty of modeling the interaction between labels and the solidification (rigidity) of a fixed global label relationship, a Multi-Label image classification method based on Global and Local Label Relationship (ML-GLLR) is proposed, combining the self-attention mechanism with Knowledge Distillation (KD). First, the Local Label Relationship (LLR) model uses a Convolutional Neural Network (CNN), a semantic module and a Dual Layer Self-Attention (DLSA) module to model the local label relationship. Then, KD is used to make LLR learn the global label relationship. On the public datasets Microsoft Common Objects in COntext (MSCOCO) 2014 and PASCAL VOC challenge 2007 (VOC2007), LLR improves the mean Average Precision (mAP) by 0.8 and 0.6 percentage points respectively over Multi-Label classification based on Graph Convolutional Network (ML-GCN), and ML-GLLR further increases the mAP by 0.2 and 1.3 percentage points respectively over LLR. These results indicate that ML-GLLR can model the interaction between labels while avoiding the solidification of the global label relationship.
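The abstract names two building blocks, self-attention over labels and knowledge distillation, without giving their formulas. The sketch below is only an illustration under stated assumptions. It uses a single-head scaled dot-product self-attention with identity projections in place of the DLSA module, and a multi-label distillation loss that mixes binary cross-entropy against the ground truth with binary cross-entropy against the sigmoid outputs of a hypothetical teacher. The function names, the `alpha` weight and the choice of teacher are not taken from the paper.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(label_feats):
    """Scaled dot-product self-attention over per-label feature vectors.

    Assumption: Q, K and V are the input features themselves (no learned
    projections), so every output row is a convex mix of the input rows.
    """
    d = len(label_feats[0])
    out = []
    for q in label_feats:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
                  for k in label_feats]
        weights = softmax(scores)
        out.append([sum(w * v[i] for w, v in zip(weights, label_feats))
                    for i in range(d)])
    return out

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce(p, t, eps=1e-7):
    """Binary cross-entropy of probability p against a (soft) target t."""
    p = min(max(p, eps), 1.0 - eps)
    return -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))

def distillation_loss(student_logits, teacher_logits, targets, alpha=0.5):
    """Hypothetical multi-label KD loss: (1 - alpha) * hard + alpha * soft."""
    n = len(targets)
    hard = sum(bce(sigmoid(s), y)
               for s, y in zip(student_logits, targets)) / n
    soft = sum(bce(sigmoid(s), sigmoid(t))
               for s, t in zip(student_logits, teacher_logits)) / n
    return (1.0 - alpha) * hard + alpha * soft

if __name__ == "__main__":
    feats = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]  # one vector per label
    print(self_attention(feats))
    print(distillation_loss([2.0, -1.0, 0.3], [3.0, -2.0, 1.0], [1, 0, 1]))
```

In this reading the student plays the role of LLR and the teacher stands for any model that carries global label statistics. The actual teacher and loss weighting used by ML-GLLR are not stated in this section.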

Keywords: image classification, self-attention mechanism, deep learning, knowledge distillation, multi-label classification

CLC Number: TP391.4

Wei REN, Hexiang BAI. Multi-label image classification method based on global and local label relationship[J]. Journal of Computer Applications, 2022, 42(5): 1383-1390.

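Comparison with other methods on MSCOCO 2014 (%); "All" columns use all predicted labels, "Top3" columns use the top-3 labels per image
Method	mAP	All-CP	All-CR	All-CF1	All-OP	All-OR	All-OF1	Top3-CP	Top3-CR	Top3-CF1	Top3-OP	Top3-OR	Top3-OF1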
CNN-RNN	61.2	―	―	―	―	―	―	66.0	55.6	60.4	69.2	66.4	67.8
SRN	77.1	81.6	65.4	71.2	82.7	69.9	75.8	85.2	58.8	67.4	87.4	62.5	72.9
Multi-Evidence	―	80.4	70.2	74.9	85.2	72.5	78.4	84.5	62.2	70.6	89.1	64.3	74.7
Res-101	80.1	78.2	71.9	74.9	82.3	75.0	78.5	82.8	63.4	71.8	87.6	65.5	75.0
CNN-LSTM-Att	―	80.9	70.9	75.6	83.7	74.9	79.1	―	―	―	―	―	―
ML-GCN	83.0	85.1	72.0	78.0	85.8	75.4	80.3	89.2	64.1	74.6	90.5	66.5	76.7
SSGRL	83.8	89.9	68.5	76.8	91.3	70.8	79.7	91.9	62.5	72.7	93.8	64.1	76.2
LLR	83.8	86.0	72.6	78.8	86.9	75.8	81.0	89.4	64.6	75.0	90.7	67.0	77.0
ML-GLLR	84.0	86.5	72.4	78.8	87.1	75.8	81.1	90.0	64.0	74.8	91.3	66.7	77.1
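The CP, CR, CF1, OP, OR and OF1 columns follow the per-class and overall precision, recall and F1 definitions commonly used in multi-label classification. As a minimal sketch, assuming binary prediction and ground-truth matrices and treating a class with no predictions as precision 0 (a convention chosen here, not stated in the source), they can be computed as follows. The helper names are illustrative.

```python
def f1(p, r):
    """Harmonic mean of precision and recall (0 when both are 0)."""
    return 2 * p * r / (p + r) if p + r > 0 else 0.0

def multilabel_prf(y_true, y_pred):
    """y_true, y_pred: lists of rows, each row a list of 0/1 per class."""
    n_classes = len(y_true[0])
    correct = [0] * n_classes    # correctly predicted images per class
    predicted = [0] * n_classes  # predicted images per class
    actual = [0] * n_classes     # ground-truth images per class
    for t_row, p_row in zip(y_true, y_pred):
        for c, (t, p) in enumerate(zip(t_row, p_row)):
            correct[c] += t * p
            predicted[c] += p
            actual[c] += t
    # Per-class metrics: average the per-class ratios.
    cp = sum(c / p if p else 0.0 for c, p in zip(correct, predicted)) / n_classes
    cr = sum(c / a if a else 0.0 for c, a in zip(correct, actual)) / n_classes
    # Overall metrics: pool the counts over all classes first.
    op = sum(correct) / sum(predicted) if sum(predicted) else 0.0
    orec = sum(correct) / sum(actual) if sum(actual) else 0.0
    return {"CP": cp, "CR": cr, "CF1": f1(cp, cr),
            "OP": op, "OR": orec, "OF1": f1(op, orec)}

if __name__ == "__main__":
    y_true = [[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]]
    y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 1, 1]]
    print(multilabel_prf(y_true, y_pred))
    # Consistency check against the ML-GCN row of the table:
    # F1 is the harmonic mean of the reported precision and recall.
    print(round(f1(85.1, 72.0), 1), round(f1(85.8, 75.4), 1))
```

The last line reproduces the CF1 and OF1 values reported for ML-GCN in the "All" setting from its CP/CR and OP/OR entries.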

Per-class AP and mAP on VOC2007 (%)
Method	mAP	aeroplane	bicycle	bird	boat	bottle	bus	car	cat	chair	cow	table	dog	horse	motorbike	person	plant	sheep	sofa	train	tv
CNN-RNN	84.0	96.7	83.1	94.2	92.8	61.2	82.1	89.1	94.2	64.2	83.6	70.0	92.4	91.7	84.2	93.7	59.8	93.2	75.3	99.7	78.6
RLSD	88.5	96.4	92.7	93.8	94.1	71.2	92.5	94.2	95.7	74.3	90.0	74.2	95.4	96.2	92.1	97.9	66.9	93.5	73.7	97.5	87.6
VGG	89.7	98.9	95.0	96.8	95.4	69.7	90.4	93.5	96.0	74.2	86.6	87.8	96.0	96.3	93.1	97.2	70.0	92.1	80.3	98.1	87.0
HCP	90.9	98.6	97.1	98.0	95.6	75.3	94.7	95.8	97.3	73.1	90.2	80.0	97.3	96.1	94.9	96.3	78.3	94.7	76.2	97.9	91.5
Res-101	91.9	99.1	97.6	96.5	95.1	74.2	91.3	96.0	95.8	75.5	92.2	88.5	96.2	96.6	94.3	98.5	83.2	94.8	84.7	98.6	90.1
ML-GCN	94.0	99.5	98.5	98.6	98.1	80.8	94.6	97.2	98.2	82.3	95.7	86.4	98.2	98.4	96.7	99.0	84.7	96.7	84.3	98.9	93.7
SSGRL	95.0	99.7	98.4	98.0	97.6	85.7	96.2	98.2	98.8	82.0	98.1	89.7	98.8	98.7	97.0	99.0	86.9	98.1	85.8	99.0	93.7
LLR	94.6	99.4	97.5	97.9	97.1	83.9	95.2	97.7	98.0	83.6	95.4	90.0	97.7	98.0	96.3	99.0	86.8	96.5	88.4	98.7	94.4
ML-GLLR	95.9	99.8	98.4	98.2	98.2	86.2	97.6	98.2	98.8	85.7	97.2	92.6	98.7	98.9	97.1	99.2	89.2	98.3	90.7	99.3	96.1
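The mAP column is the mean of the per-class AP values in each row. The sketch below shows one common way to compute AP for a single class (non-interpolated, averaging the precision at the rank of each positive) and then checks that the ML-GLLR row averages to its reported mAP. The source does not say which AP variant was used. VOC2007 evaluations often use an 11-point interpolated AP instead, so the `average_precision` function is an assumption for illustration only.

```python
def average_precision(scores, labels):
    """Non-interpolated AP for one class.

    scores: predicted confidences, labels: 0/1 ground truth per image.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions = []
    for rank, i in enumerate(order, start=1):
        if labels[i]:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive
    return sum(precisions) / len(precisions) if precisions else 0.0

def mean_ap(per_class_ap):
    return sum(per_class_ap) / len(per_class_ap)

if __name__ == "__main__":
    print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))
    # Per-class AP of ML-GLLR on VOC2007, copied from the table above.
    ml_gllr = [99.8, 98.4, 98.2, 98.2, 86.2, 97.6, 98.2, 98.8, 85.7, 97.2,
               92.6, 98.7, 98.9, 97.1, 99.2, 89.2, 98.3, 90.7, 99.3, 96.1]
    print(round(mean_ap(ml_gllr), 1))
```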

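Ablation results on MSCOCO2014 and VOC2007 (%); T- denotes the Top-3 setting and A- the All setting
Method	MSCOCO2014 mAP	MSCOCO2014 T-OF1	MSCOCO2014 T-CF1	MSCOCO2014 A-OF1	MSCOCO2014 A-CF1	VOC2007 mAP	VOC2007 T-OF1	VOC2007 T-CF1	VOC2007 A-OF1	VOC2007 A-CF1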
Res-101	80.1	75.0	71.8	78.5	74.9	91.9	87.7	85.5	87.7	85.5
LLR (without DLSA module)	81.4	75.6	72.9	79.2	76.5	92.7	89.3	86.9	89.3	86.9
LLR (without semantic module)	82.1	76.0	72.8	79.6	77.0	93.6	89.9	87.7	89.9	87.7
LLR	83.8	77.0	75.0	81.0	78.8	94.6	90.5	88.6	90.4	88.5
ML-GLLR	84.0	77.1	74.8	81.1	78.8	95.9	90.9	89.6	90.9	89.5
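The gains quoted in the abstract and the contribution of each module can be read off the mAP columns by subtraction. The short script below is a sketch of that arithmetic using values copied from the tables. The dictionary layout, the shortened method labels and the `gain` helper are illustrative names only.

```python
# mAP values (%) copied from the comparison and ablation tables.
MAP = {
    "MSCOCO2014": {"Res-101": 80.1, "ML-GCN": 83.0,
                   "LLR w/o DLSA": 81.4, "LLR w/o semantic": 82.1,
                   "LLR": 83.8, "ML-GLLR": 84.0},
    "VOC2007": {"Res-101": 91.9, "ML-GCN": 94.0,
                "LLR w/o DLSA": 92.7, "LLR w/o semantic": 93.6,
                "LLR": 94.6, "ML-GLLR": 95.9},
}

def gain(dataset, method, baseline):
    """mAP difference in percentage points, rounded to one decimal."""
    return round(MAP[dataset][method] - MAP[dataset][baseline], 1)

if __name__ == "__main__":
    for ds in MAP:
        print(ds,
              "LLR vs ML-GCN:", gain(ds, "LLR", "ML-GCN"),
              "| ML-GLLR vs LLR:", gain(ds, "ML-GLLR", "LLR"),
              "| DLSA module:", gain(ds, "LLR", "LLR w/o DLSA"),
              "| semantic module:", gain(ds, "LLR", "LLR w/o semantic"))
```

The first two differences per dataset correspond to the 0.8/0.6 and 0.2/1.3 percentage-point improvements stated in the abstract. The last two show how much mAP LLR loses when the DLSA module or the semantic module is removed.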