Automatic image annotation method using multi-label learning convolutional neural network

doi:10.11772/j.issn.1001-9081.2017.01.0228

Journal of Computer Applications, 2017, Vol. 37, Issue (1): 228-232. DOI: 10.11772/j.issn.1001-9081.2017.01.0228

GAO Yaodong, HOU Lingyan, YANG Dali

College of Computer, Beijing Information Science and Technology University, Beijing 100101, China

Received: 2016-06-15 Revised: 2016-09-12 Online: 2017-01-09 Published: 2017-01-10
Corresponding author: GAO Yaodong
About the authors: GAO Yaodong (born 1991), male, from Hefei, Anhui, M.S. candidate; research interests: machine learning, pattern recognition. HOU Lingyan (born 1964), female, from Changsha, Hunan, associate professor, M.S.; research interests: multimedia technology, pattern recognition. YANG Dali (born 1963), male, from Yangyuan, Hebei, associate professor, Ph.D.; research interests: pattern recognition, signal enhancement.
Supported by:
This work is supported by the Key Projects in the National Science and Technology Pillar Program during the Twelfth Five-year Plan Period of China (2015BAK12B00).

Abstract

Abstract: Manually selected features cause information loss in automatic image annotation. To address this shortcoming, a convolutional neural network (CNN) was used to learn features directly from the samples. First, to suit the multi-label nature of automatic image annotation and to raise the recall rate of low-frequency words, the loss function of the CNN was modified and a Convolutional Neural Network of Multi-Label Learning (CNN-MLL) model was constructed. Second, the correlation between annotation words was used to refine the output of the network model. In a comparison with traditional methods on the Technical Committee 12 of the International Association for Pattern Recognition (IAPR TC-12) benchmark image annotation dataset, the Convolutional Neural Network using Mean Square Error function (CNN-MSE) method achieved an average recall rate 12.9% higher than that of the Support Vector Machine (SVM) method and an average accuracy 37.9% higher than that of the Back Propagation Neural Network (BPNN) method. The CNN-MLL method with annotation-result refinement achieved an average accuracy and an average recall rate 23% and 20% higher, respectively, than those of the ordinary CNN. The results show that the CNN-MLL method with annotation-result refinement effectively avoids the information loss caused by manual feature selection and increases the recall rate of low-frequency words.

Key words: automatic image annotation, multi-label learning, Convolutional Neural Network (CNN), loss function
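
The abstract names two steps, a modified multi-label loss function and a refinement of the network output using correlations between annotation words, but it does not give their formulas. The following is only a minimal sketch under stated assumptions: the pairwise ranking loss is written in the style of BP-MLL and may differ from the authors' loss, and the co-occurrence-based refinement (including the function names and the blending weight `alpha`) is hypothetical and chosen purely for illustration.

```python
import math

def mse_loss(scores, labels):
    """Mean squared error against a multi-hot label vector (the CNN-MSE baseline idea)."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(scores)

def pairwise_ranking_loss(scores, labels):
    """Assumed BP-MLL-style loss: penalise each (relevant, irrelevant) label pair
    in which the irrelevant label scores close to or above the relevant one."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return 0.0
    total = sum(math.exp(-(p - n)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))

def cooccurrence_matrix(label_sets, num_labels):
    """Estimate cond[i][j] = P(label j present | label i present) from training labels."""
    counts = [[0] * num_labels for _ in range(num_labels)]
    for labels in label_sets:
        present = [i for i, y in enumerate(labels) if y == 1]
        for i in present:
            for j in present:
                counts[i][j] += 1
    cond = []
    for i in range(num_labels):
        n_i = counts[i][i]  # number of training images carrying label i
        cond.append([counts[i][j] / n_i if n_i else 0.0 for j in range(num_labels)])
    return cond

def refine_scores(scores, cond, alpha=0.5):
    """Hypothetical refinement: blend each score with support from correlated labels."""
    n = len(scores)
    refined = []
    for j in range(n):
        others = [scores[i] * cond[i][j] for i in range(n) if i != j]
        support = sum(others) / len(others) if others else 0.0
        refined.append((1 - alpha) * scores[j] + alpha * support)
    return refined

# Toy usage: labels 0 and 1 always co-occur in training, label 2 appears alone.
train = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
cond = cooccurrence_matrix(train, 3)
scores = [0.9, 0.4, 0.3]
refined = refine_scores(scores, cond)
```

In this toy example the confident score for label 0 lends support to its frequent companion, label 1, while label 2 receives none, so the gap between labels 1 and 2 widens. This illustrates how word correlation could help a weakly scored but correlated word to be recalled.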

GAO Yaodong, HOU Lingyan, YANG Dali. Automatic image annotation method using multi-label learning convolutional neural network[J]. Journal of Computer Applications, 2017, 37(1): 228-232.
