Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (1): 228-232. DOI: 10.11772/j.issn.1001-9081.2017.01.0228

• Artificial Intelligence •

Image annotation method based on a multi-label learning convolutional neural network

GAO Yaodong, HOU Lingyan, YANG Dali

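  1. College of Computer, Beijing Information Science and Technology University, Beijing 100101
  • Received: 2016-06-15 Revised: 2016-09-12 Online: 2017-01-10 Published: 2017-01-09
  • Corresponding author: GAO Yaodong
  • About the authors: GAO Yaodong (born 1991), male, from Hefei, Anhui, M.S. candidate; main research interests: machine learning, pattern recognition. HOU Lingyan (born 1964), female, from Changsha, Hunan, associate professor, M.S.; main research interests: multimedia technology, pattern recognition. YANG Dali (born 1963), male, from Yangyuan, Hebei, associate professor, Ph.D.; main research interests: pattern recognition, signal enhancement.
  • Supported by:
    National Science and Technology Pillar Program during the Twelfth Five-Year Plan Period (2015BAK12B00).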

Automatic image annotation method using multi-label learning convolutional neural network

GAO Yaodong, HOU Lingyan, YANG Dali   

  1. College of Computer, Beijing Information Science and Technology University, Beijing 100101, China
  • Received: 2016-06-15 Revised: 2016-09-12 Online: 2017-01-10 Published: 2017-01-09
  • Supported by:
    This work is supported by the Key Projects in the National Science and Technology Pillar Program during the Twelfth Five-year Plan Period of China (2015BAK12B00).

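Abstract: To address the information loss caused by manually selected features in automatic image annotation, a convolutional neural network was used to learn features directly from the samples. To suit the multi-label nature of automatic image annotation and to raise the recall of low-frequency words, the loss function of the convolutional neural network was first modified to build a Convolutional Neural Network for Multi-Label Learning (CNN-MLL) model, and the correlation between annotation words was then used to refine the output of the network. In comparisons with traditional methods on the IAPR TC-12 benchmark image annotation dataset, the method based on a Convolutional Neural Network with a Mean Square Error loss (CNN-MSE) improved average recall by 12.9% over the Support Vector Machine (SVM) method and average accuracy by 37.9% over the Back Propagation Neural Network (BPNN) method; the CNN-MLL method with annotation-result refinement improved average accuracy and average recall by 23% and 20%, respectively, over an ordinary convolutional neural network. The results show that the CNN-MLL method with annotation-result refinement effectively avoids the information loss caused by manually selected features while increasing the recall of low-frequency words.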
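The abstract does not spell out the modified loss function. Purely as an illustration of the general idea (a multi-label loss that penalizes missed low-frequency words more heavily), the sketch below applies inverse-frequency weights to a binary cross-entropy. The function names `label_weights` and `weighted_multilabel_loss`, the weighting scheme, and the add-one smoothing are all assumptions made for this example; this is not the CNN-MLL loss itself.

```python
import math


def label_weights(label_matrix):
    """Inverse-frequency weight per label; rarer labels get larger weights.

    label_matrix: list of rows, one per image, each a list of 0/1 flags.
    The weighting scheme is an assumption made for this sketch.
    """
    n_images = len(label_matrix)
    n_labels = len(label_matrix[0])
    weights = []
    for j in range(n_labels):
        count = sum(row[j] for row in label_matrix)
        # Add-one smoothing so a label that never occurs does not divide by zero.
        weights.append(n_images / (count + 1))
    return weights


def weighted_multilabel_loss(probs, targets, weights, eps=1e-12):
    """Weighted binary cross-entropy for one image, averaged over the labels."""
    total = 0.0
    for p, t, w in zip(probs, targets, weights):
        p = min(max(p, eps), 1.0 - eps)  # clamp so log() stays finite
        # Only the positive term is up-weighted, so missing a rare word costs more.
        total += -(w * t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(probs)


# Toy data: label 0 is frequent (4 of 4 images), label 1 is rare (1 of 4).
toy_labels = [[1, 0], [1, 0], [1, 1], [1, 0]]
toy_weights = label_weights(toy_labels)
toy_loss = weighted_multilabel_loss([0.5, 0.5], [1, 1], toy_weights)
```

With these toy weights, a low predicted probability for the rare label raises the loss more than the same mistake on the frequent label, which is the behavior the abstract's stated goal (higher recall on low-frequency words) suggests.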

Keywords: automatic image annotation, multi-label learning, convolutional neural network, loss function

Abstract: To address a shortcoming of automatic image annotation, namely the information loss caused by manually selected features, a convolutional neural network was used to learn features from the samples. Firstly, to suit the multi-label learning nature of automatic image annotation and to increase the recall rate of low-frequency words, the loss function of the convolutional neural network was modified and a Convolutional Neural Network of Multi-Label Learning (CNN-MLL) model was constructed. Secondly, the correlation between image annotation words was used to refine the output of the network model. In comparisons with traditional methods on the Technical Committee 12 of the International Association for Pattern Recognition (IAPR TC-12) benchmark image annotation database, the experimental results show that the Convolutional Neural Network using Mean Square Error function (CNN-MSE) method raises the average recall rate by 12.9% over the Support Vector Machine (SVM) method and the average accuracy by 37.9% over the Back Propagation Neural Network (BPNN) method. The average accuracy rate and average recall rate of the CNN-MLL method with annotation-result refinement are 23% and 20% higher, respectively, than those of the traditional CNN. The results show that the CNN-MLL method with annotation-result refinement effectively avoids the information loss caused by manually selected features and increases the recall rate of low-frequency words.
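For readers unfamiliar with how averaged annotation scores are typically computed, the following is a minimal sketch. It assumes that the "average accuracy" and "average recall rate" above are per-word precision and recall averaged over the vocabulary, as is customary on annotation benchmarks; the abstract does not state the exact protocol. The function name `average_precision_recall` and the toy data are hypothetical.

```python
def average_precision_recall(true_labels, predicted_labels, vocabulary):
    """Per-word precision and recall, each averaged over the vocabulary.

    true_labels, predicted_labels: lists of sets of words, one set per image.
    """
    precisions, recalls = [], []
    for word in vocabulary:
        predicted = sum(1 for p in predicted_labels if word in p)
        relevant = sum(1 for t in true_labels if word in t)
        correct = sum(
            1 for t, p in zip(true_labels, predicted_labels)
            if word in t and word in p
        )
        # A word that is never predicted (or never present) scores 0 here;
        # this convention is an assumption of the sketch.
        precisions.append(correct / predicted if predicted else 0.0)
        recalls.append(correct / relevant if relevant else 0.0)
    n = len(vocabulary)
    return sum(precisions) / n, sum(recalls) / n


# Toy example with three images and a three-word vocabulary.
truth = [{"sky", "tree"}, {"sky"}, {"lake"}]
predictions = [{"sky"}, {"sky", "tree"}, {"sky"}]
avg_precision, avg_recall = average_precision_recall(
    truth, predictions, ["sky", "tree", "lake"]
)
```

Because every word counts equally in the average, a rare word such as "lake" that is never predicted pulls both scores down as much as a frequent one would, which is why recall on low-frequency words matters for these figures.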

Key words: automatic image annotation, multi-label learning, Convolutional Neural Network (CNN), loss function

CLC Number: