Journal of Computer Applications (计算机应用) ›› 2010, Vol. 30 ›› Issue (10): 2610-2613.

• Database and data mining •

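Improved image automatic annotation model based on external databases

LI Dongyan (李东艳)1, LI Shaozi (李绍滋)2, KE Xiao (柯逍)3

  1. Department of Intelligent Science and Technology, School of Information Science and Technology, Xiamen University, Xiamen, Fujian
  2. School of Information Science and Technology, Xiamen University
  3. Intelligent Multimedia Laboratory, Xiamen University
  • Received: 2010-04-19 Revised: 2010-06-17 Online: 2010-09-21 Published: 2010-10-01
  • Corresponding author: LI Shaozi
  • Supported by:
    National Natural Science Foundation of China; Specialized Research Fund for the Doctoral Program of Higher Education; Shenzhen Science and Technology Plan Basic Research Program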

Abstract: To address the data imbalance in the data sets used for image annotation, a new self-balancing model based on an external database is proposed. First, low-frequency points are located in the word frequency distribution of the original database, and for each low-frequency word, corresponding images are added from the external database according to the self-balancing mode. Next, image features are extracted, and the 47065 visual words from the Corel 5k data set are clustered together with the 996 visual words extracted from the images added from the external database. Finally, images are annotated with the improved automatic image annotation model based on the external database. The method overcomes the database imbalance in image annotation and markedly increases the number of words correctly annotated at least once, as well as precision and recall.

Key words: external database, self-balancing mode, database imbalance, image automatic annotation
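
The first step of the abstract (locating low-frequency words and topping them up from an external database) can be illustrated with a minimal sketch. The abstract does not state the actual low-frequency criterion or the number of images added per word, so the fixed `threshold`, the function names, and the dictionary layout below are all assumptions made for illustration and are not the authors' method. The sketch also assumes that external image IDs do not collide with IDs in the original data set.

```python
from collections import Counter


def find_low_frequency_words(annotations, threshold):
    """Return {word: image_count} for words annotated on fewer than `threshold` images.

    `annotations` maps an image ID to its list of annotation words.
    The fixed threshold is an assumed criterion, not the paper's.
    """
    counts = Counter(
        word for words in annotations.values() for word in set(words)
    )
    return {word: n for word, n in counts.items() if n < threshold}


def balance_with_external(annotations, external_images, threshold):
    """Return a copy of `annotations` extended with external images.

    `external_images` maps a word to a list of external image IDs that
    depict it (hypothetical layout). For each low-frequency word, images
    are added until the word reaches `threshold` or the pool runs out.
    """
    balanced = {image_id: list(words) for image_id, words in annotations.items()}
    low = find_low_frequency_words(annotations, threshold)
    for word, count in sorted(low.items()):
        needed = threshold - count
        for image_id in external_images.get(word, [])[:needed]:
            balanced.setdefault(image_id, []).append(word)
    return balanced


# Toy data: "sky" is frequent, "water" and "tiger" are rare.
original = {"a": ["sky", "water"], "b": ["sky"], "c": ["sky", "tiger"]}
external = {"tiger": ["x1", "x2", "x3"], "water": ["y1"]}
balanced = balance_with_external(original, external, threshold=3)
```

In this toy run "tiger" receives two external images and "water" only one, because its external pool is exhausted. A real system would then extract visual features from the added images before clustering, as the abstract describes.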

CLC number: