Image automatic annotation based on transfer learning and multi-label smoothing strategy

doi:10.11772/j.issn.1001-9081.2018041349

Journal of Computer Applications, 2018, Vol. 38, Issue (11): 3199-3203. DOI: 10.11772/j.issn.1001-9081.2018041349

WANG Peng^1,2, ZHANG Aofan^1, WANG Liqin^1,2, DONG Yongfeng^1,2

1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China;
2. Hebei Province Key Laboratory of Big Data Calculation (Hebei University of Technology), Tianjin 300401, China

Received: 2018-04-23; Revised: 2018-06-15; Online: 2018-11-10; Published: 2018-11-10
Supported by:
This work is partially supported by the Basic Research Plan Major Project of Hebei Province (F2016202144) and the Application Basis and Advanced Technology Research Plan of Tianjin (15JCTPJC62000, 16JCYBJC15600).

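Corresponding author: WANG Liqin
About the authors: WANG Peng (born 1978), male, from Handan, Hebei, associate professor, Ph.D.; main research interests: computer software design and data mining. ZHANG Aofan (born 1992), male, from Shijiazhuang, Hebei, master's student; main research interests: computer vision and machine learning. WANG Liqin (born 1980), female, from Zhangbei, Hebei, experimentalist, Ph.D.; main research interests: data mining and machine learning. DONG Yongfeng (born 1977), male, from Baoding, Hebei, professor, Ph.D.; main research interests: machine learning and big data analysis.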

Abstract

To address the imbalanced label distribution in image annotation datasets and to improve the annotation of rare labels, a Multi-Label Smoothing Unit (MLSU) based on a label smoothing strategy is proposed. During training, MLSU automatically smooths the high-frequency labels in the dataset so that the network appropriately raises its output values for low-frequency labels, which improves annotation performance on those labels. To address the network over-fitting caused by the insufficient number of images in annotation datasets, a Convolutional Neural Network (CNN) model based on transfer learning is proposed. The deep network is first pre-trained on large public image datasets available on the Internet, and its parameters are then fine-tuned on the target dataset, yielding a Convolutional Neural Network with Multi-Label Smoothing Unit (CNN-MLSU). Experiments were carried out on the benchmark image annotation datasets Corel5K and IAPR TC-12. On Corel5K, the average accuracy and average recall of CNN-MLSU are 5 and 8 percentage points higher, respectively, than those of Convolutional Neural Network Regression (CNN-R). On IAPR TC-12, the average recall of CNN-MLSU is 6 percentage points higher than that of the Two-Pass K-Nearest Neighbor model (2PKNN_ML). The results show that the CNN-MLSU method based on transfer learning effectively prevents network over-fitting and improves the annotation performance of low-frequency labels.
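
The abstract does not give the smoothing rule used by MLSU, so the following is only a minimal sketch of the general idea of frequency-dependent label smoothing for multi-label targets. The function names, the linear rule, and the `max_eps` parameter are assumptions made for illustration and are not taken from the paper: a positive label's target is lowered in proportion to how often that label occurs in the training set, so frequent labels are smoothed most and rare labels keep a target close to 1.

```python
from collections import Counter


def label_frequencies(annotations, vocabulary):
    """Return the fraction of training images that carry each label."""
    counts = Counter(label for labels in annotations for label in set(labels))
    total = len(annotations)
    return {label: counts[label] / total for label in vocabulary}


def smoothed_targets(labels, vocabulary, freq, max_eps=0.2):
    """Build a multi-label target vector with frequency-dependent smoothing.

    Hypothetical rule (an assumption, not the paper's MLSU): a present label
    with relative frequency f gets the target 1 - max_eps * f / max(freq),
    so the most frequent label is lowered to 1 - max_eps while rare labels
    stay close to 1. Absent labels keep the target 0.
    """
    top = max(freq.values()) or 1.0  # avoid division by zero on empty data
    present = set(labels)
    return [
        1.0 - max_eps * freq[label] / top if label in present else 0.0
        for label in vocabulary
    ]


if __name__ == "__main__":
    # Toy annotations: "sky" is frequent, the other labels are rare.
    annotations = [["sky", "water"], ["sky", "tree"], ["sky"], ["sky", "tiger"]]
    vocabulary = ["sky", "water", "tree", "tiger"]
    freq = label_frequencies(annotations, vocabulary)
    print(smoothed_targets(["sky", "tiger"], vocabulary, freq))
```

In this toy example the frequent label "sky" receives a target of about 0.8 while the rare label "tiger" stays at about 0.95. Targets of this kind could be fed to a per-label sigmoid cross-entropy loss when fine-tuning a pre-trained CNN; whether that matches the loss used for CNN-MLSU is not stated in the abstract.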

Key words: automatic image annotation, multi-label smoothing, transfer learning, Convolutional Neural Network (CNN), image retrieval

WANG Peng, ZHANG Aofan, WANG Liqin, DONG Yongfeng. Image automatic annotation based on transfer learning and multi-label smoothing strategy[J]. Journal of Computer Applications, 2018, 38(11): 3199-3203.
