Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (11): 3199-3203. DOI: 10.11772/j.issn.1001-9081.2018041349

• The 7th China Conference on Data Mining (CCDM 2018) •

Image automatic annotation based on transfer learning and multi-label smoothing strategy

WANG Peng1,2, ZHANG Aofan1, WANG Liqin1,2, DONG Yongfeng1,2

  1. School of Artificial Intelligence, Hebei University of Technology, Tianjin 300401, China;
  2. Hebei Province Key Laboratory of Big Data Calculation (Hebei University of Technology), Tianjin 300401, China
  • Received: 2018-04-23  Revised: 2018-06-15  Online: 2018-11-10  Published: 2018-11-10
  • Corresponding author: WANG Liqin
  • About the authors: WANG Peng, born in 1978 in Handan, Hebei, Ph. D., is an associate professor; his research interests include computer software design and data mining. ZHANG Aofan, born in 1992 in Shijiazhuang, Hebei, is an M. S. candidate; his research interests include computer vision and machine learning. WANG Liqin, born in 1980 in Zhangbei, Hebei, Ph. D., is an experimentalist; her research interests include data mining and machine learning. DONG Yongfeng, born in 1977 in Baoding, Hebei, Ph. D., is a professor; his research interests include machine learning and big data analysis.
  • Supported by:
    This work is partially supported by the Key Project of the Basic Research Plan of Hebei Province (F2016202144) and the Application Basis and Advanced Technology Research Plan of Tianjin (15JCTPJC62000, 16JCYBJC15600).

Abstract: In order to solve the problem of imbalanced label distribution in image annotation datasets and improve the annotation performance of rare labels, a Multi-Label Smoothing Unit (MLSU) based on the label smoothing strategy was proposed. High-frequency labels in the dataset were automatically smoothed during network training, so that the network appropriately raised the output values of low-frequency labels and thus improved their annotation performance. To address the over-fitting caused by the insufficient number of samples in image annotation datasets, a Convolutional Neural Network (CNN) model based on transfer learning was proposed. Firstly, the deep network was pre-trained on large public image datasets from the Internet; then, the target dataset was used to fine-tune the network parameters, and a Convolutional Neural Network model with the Multi-Label Smoothing Unit (CNN-MLSU) was established. Experiments were carried out on the benchmark image annotation datasets Corel5K and IAPR TC-12. The experimental results show that, on the Corel5K dataset, the average accuracy and average recall of CNN-MLSU are 5 percentage points and 8 percentage points higher than those of the Convolutional Neural Network Regression method (CNN-R); on the IAPR TC-12 dataset, the average recall of CNN-MLSU is 6 percentage points higher than that of the Two-Pass K-Nearest Neighbor model (2PKNN_ML). The results show that the CNN-MLSU method based on transfer learning can effectively prevent network over-fitting while improving the annotation performance of low-frequency labels.
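The abstract only outlines the approach; the exact MLSU formulation, loss function, and backbone network are specified in the full paper. As a rough illustration of the two ideas it describes, the minimal PyTorch sketch below softens the targets of frequent labels in a multi-hot annotation vector and fine-tunes an ImageNet-pre-trained CNN whose final layer is replaced by a multi-label output layer. All names here (smooth_targets, build_model, label_freq, epsilon) and the choice of a VGG-16 backbone are illustrative assumptions, not the authors' implementation.

    # Hypothetical sketch (not the authors' code): multi-label smoothing of
    # high-frequency labels plus transfer-learning fine-tuning of a pre-trained CNN.
    import torch.nn as nn
    from torchvision import models

    def smooth_targets(targets, label_freq, epsilon=0.1):
        # targets:    (batch, num_labels) multi-hot 0/1 annotation vectors
        # label_freq: (num_labels,) relative frequency of each label in the training set
        # Frequent positive labels are softened from 1.0 toward 1.0 - epsilon,
        # while rare labels are left almost untouched, nudging the network to
        # raise its outputs for low-frequency annotation words.
        smoothing = epsilon * label_freq / label_freq.max()
        return targets * (1.0 - smoothing)

    def build_model(num_labels):
        # Transfer learning: ImageNet-pre-trained backbone with the final
        # fully connected layer replaced by a multi-label output layer.
        backbone = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
        backbone.classifier[6] = nn.Linear(4096, num_labels)
        return backbone

    def train_step(model, images, targets, label_freq, optimizer):
        # Fine-tuning step on the target annotation dataset: sigmoid outputs
        # via BCE-with-logits against the smoothed multi-label targets.
        criterion = nn.BCEWithLogitsLoss()
        optimizer.zero_grad()
        loss = criterion(model(images), smooth_targets(targets, label_freq))
        loss.backward()
        optimizer.step()
        return loss.item()

How strongly each high-frequency label is smoothed (here, proportionally to its frequency) is precisely the part that the MLSU defines in the paper; the sketch only shows where such a unit would plug into a standard fine-tuning loop.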

Key words: automatic image annotation, multi-label smoothing, transfer learning, Convolutional Neural Network (CNN), image retrieval
