Journal of Computer Applications: 1587-1592. DOI: 10.11772/j.issn.1001-9081.2019111993

• Artificial Intelligence •

Application of convolutional neural network with threshold optimization in image annotation

CAO Jianfang1,2, ZHAO Aidi1, ZHANG Zibang1

  1. College of Computer Science and Technology, Taiyuan University of Science and Technology, Taiyuan, Shanxi 030024, China
  2. Computer Department, Xinzhou Teachers University, Xinzhou, Shanxi 034000, China
  • Received: 2019-11-25 Revised: 2020-01-05 Online: 2020-06-18 Published: 2020-06-10
  • Contact: CAO Jianfang
  • About the authors: CAO Jianfang, born in 1976 in Xinzhou, Shanxi, Ph. D., professor, CCF senior member. Her research interests include digital image understanding and big data. ZHAO Aidi, born in 1993 in Yingshang, Anhui, M. S. candidate. Her research interests include deep learning and image processing. ZHANG Zibang, born in 1995 in Linxi, Hebei, M. S. candidate. His research interests include deep learning and image processing.
  • Supported by:

    Natural Science Foundation of Shanxi Province (201701D121059), the Program of Humanities and Social Sciences Key Research Base of Higher Education Institutions of Shanxi (20190130), the Art and Science Planning Project of Shanxi Province (2017F06), the Special Projects for Platforms and Talents of Xinzhou (20180601).

Abstract:

In multi-label image annotation, assigning labels with a ranking function over the probabilities predicted by a model can yield too many or too few labels. To address this problem, a Convolutional Neural Network with THreshold OPtimization (CNN-THOP) model was proposed, consisting of a Convolutional Neural Network (CNN) and a threshold optimization step. Firstly, a model was trained with the CNN and used to predict the probability of each label for an image; a Batch Normalization (BN) layer was added to the CNN, which effectively accelerated convergence. Secondly, threshold optimization was performed on the prediction probabilities that the model produced for the test set images, yielding one optimal threshold for each label class and thus a set of optimal thresholds. An image was annotated with a label only when the predicted probability of that label was greater than or equal to the optimal threshold of that label. In the annotation process, the CNN model and the set of optimal thresholds were loaded, so that the images to be annotated could be given multiple labels more flexibly. Verified on 8 000 images from a natural scene image dataset, the experimental results show that CNN-THOP improves average precision by about 20 percentage points over the Ranking Support Vector Machine (Rank-SVM), improves average recall and F1 value by about 6 and 4 percentage points respectively over the Convolutional Neural Network using Mean Square Error function (CNN-MSE), and reaches a Complete Matching Degree (CMD) of 64.75%, which demonstrates the effectiveness of the proposed method in automatic image annotation.

Key words: automatic image annotation, multi-label learning, Convolutional Neural Network (CNN), threshold optimization, Batch Normalization (BN)
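
The per-label thresholding described in the abstract can be illustrated with a minimal Python sketch. The abstract does not state which criterion the threshold search maximizes or how candidate thresholds are generated, so the choices here are assumptions: per-label F1 as the objective, a fixed grid from 0.05 to 0.95, and CMD taken to mean the fraction of images whose predicted label set exactly equals the ground truth. All function names are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one binary label; 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def optimize_thresholds(
    probs: Sequence[Sequence[float]],
    labels: Sequence[Sequence[int]],
    candidates: Optional[Sequence[float]] = None,
) -> List[float]:
    """Pick, for each label, the candidate threshold with the best F1.

    probs[i][k]  : predicted probability of label k for image i
    labels[i][k] : 1 if image i truly carries label k, else 0
    ASSUMPTION: the F1 objective and the candidate grid are illustrative only.
    """
    if candidates is None:
        candidates = [c / 100 for c in range(5, 100, 5)]  # 0.05, 0.10, ..., 0.95
    thresholds = []
    for k in range(len(probs[0])):
        y_true = [row[k] for row in labels]
        scores = [row[k] for row in probs]
        best_t, best_f1 = 0.5, -1.0
        for t in candidates:
            y_pred = [1 if s >= t else 0 for s in scores]
            f1 = f1_score(y_true, y_pred)
            if f1 > best_f1:  # strict comparison: ties keep the lowest threshold
                best_t, best_f1 = t, f1
        thresholds.append(best_t)
    return thresholds


def annotate(prob_row: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Assign label k only if its probability is >= the threshold for label k."""
    return [1 if p >= t else 0 for p, t in zip(prob_row, thresholds)]


def complete_matching_degree(probs, labels, thresholds) -> float:
    """ASSUMED definition: share of images whose predicted labels match exactly."""
    hits = sum(
        1 for p, y in zip(probs, labels) if annotate(p, thresholds) == list(y)
    )
    return hits / len(probs)
```

In use, `optimize_thresholds` would be run once on predictions for images with known labels (the abstract reports using the test set predictions), and the resulting list would be stored alongside the trained CNN so that `annotate` can be applied to each new image's predicted probabilities.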

CLC Number: