Journal of Computer Applications (计算机应用), 2011, Vol. 31, Issue 11: 3038-3041. DOI: 10.3724/SP.J.1087.2011.03038

• Graphics and Image Technology •

基于Word Spotting技术的蒙古文古籍图像检索中的特征选择

魏宏喜,高光来   

  1. 内蒙古大学 计算机学院,呼和浩特 010021
  • 收稿日期:2011-05-26 修回日期:2011-07-04 发布日期:2011-11-16 出版日期:2011-11-01
  • 通讯作者: 魏宏喜
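  • About the authors: WEI Hong-xi (魏宏喜, born 1980), male, from Anshan, Liaoning; lecturer, Ph.D. candidate, CCF member; main research interests: Mongolian character recognition and document image retrieval. GAO Guang-lai (高光来, born 1964), male, from Jalaid, Inner Mongolia; professor, doctoral supervisor; main research interests: Mongolian information processing and data mining.
  • Funding:
    National Natural Science Foundation of China project; "Chunhui" Program project of the Ministry of Education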

Feature selection in word spotting technology for retrieving historical Mongolian document images

WEI Hong-xi, GAO Guang-lai

  1. School of Computer Science, Inner Mongolia University, Hohhot, Inner Mongolia 010021, China
  • Received: 2011-05-26 Revised: 2011-07-04 Online: 2011-11-16 Published: 2011-11-01
  • Contact: WEI Hong-xi

摘要: 设计了一个基于word spotting技术的蒙古文《甘珠尔经》图像检索的系统框架。在充分分析了蒙古文《甘珠尔经》中手写单词图像特点的基础上,提出了采用轮廓特征、投影特征和笔划穿越数目来表示单词图像。在由5500个单词图像构成的数据集上进行对比实验,确定了最佳的特征组合,平均准确率(MAP)能达到78.79%,R-Precision能达到73.01%。实验结果表明,所选择的特征是合理的、有效的。

关键词: 蒙古文古籍图像, 甘珠尔经, word spotting, 文档图像检索, 轮廓特征, 动态时间弯曲

Abstract: A system framework for retrieving images of the Mongolian Kanjur with word spotting technology was designed. Based on a thorough analysis of the characteristics of handwritten word images in the Mongolian Kanjur, profile features, projection features and the number of background-to-ink transitions (stroke crossings) were proposed to represent word images. Comparative experiments on a dataset of 5500 word images determined the best combination of features, which achieved a Mean Average Precision (MAP) of 78.79% and an R-Precision of 73.01%. The experimental results show that the selected features are reasonable and effective.
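The abstract names the three feature families but does not define them. As a rough illustration only, the sketch below computes them per pixel column of a binarized word image, following the definitions commonly used in word spotting work. The function name `column_features`, the nested-list input format, and the column-wise scan direction are all assumptions, not details taken from the paper; for vertically written Mongolian the image might be scanned row by row instead.

```python
def column_features(img):
    """Per-column features of a binary word image (illustrative sketch).

    img: list of rows, each a list of pixels; any truthy value counts as ink.
    Returns one tuple per column:
    (projection, upper_profile, lower_profile, background_to_ink_transitions).
    """
    height = len(img)
    width = len(img[0]) if height else 0
    features = []
    for x in range(width):
        column = [bool(img[y][x]) for y in range(height)]
        ink_rows = [y for y, is_ink in enumerate(column) if is_ink]
        # Projection feature: number of ink pixels in the column.
        projection = len(ink_rows)
        # Profile features: first and last ink row. Empty columns get the
        # sentinels `height` and -1 (an assumption made for this sketch).
        upper = ink_rows[0] if ink_rows else height
        lower = ink_rows[-1] if ink_rows else -1
        # Transitions: count ink pixels preceded by background (or by the
        # top border), i.e. the number of separate ink runs crossed.
        transitions = sum(
            1 for y in range(height)
            if column[y] and (y == 0 or not column[y - 1])
        )
        features.append((projection, upper, lower, transitions))
    return features
```

Each word image then becomes a sequence of four-value tuples, one per column, which is the kind of variable-length representation that sequence matching methods can compare.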

Key words: historical Mongolian document image, Kanjur, word spotting, Document Image Retrieval (DIR), profile feature, Dynamic Time Warping (DTW)
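The key words mention Dynamic Time Warping (DTW), and the abstract reports MAP and R-Precision. A minimal sketch of these three standard computations follows, assuming each word image is a sequence of equal-length numeric feature tuples and each query result is a ranked list of relevance flags. The function names, the Euclidean local cost, and the unconstrained warping path are assumptions made for illustration; the page does not say which DTW variant or normalization the authors used.

```python
import math


def dtw_distance(a, b):
    """Classic DTW cost between two sequences of feature tuples."""
    n, m = len(a), len(b)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: Euclidean distance between feature vectors.
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three admissible predecessor paths.
            cost[i][j] = d + min(cost[i - 1][j],
                                 cost[i][j - 1],
                                 cost[i - 1][j - 1])
    return cost[n][m]


def average_precision(ranked_relevance):
    """Average precision of one ranked list of booleans.

    Assumes the list ranks the whole dataset, so every relevant item
    appears somewhere in it.
    """
    hits, total = 0, 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0


def r_precision(ranked_relevance, num_relevant):
    """Precision over the top R results, where R = number of relevant items."""
    if num_relevant == 0:
        return 0.0
    top = ranked_relevance[:num_relevant]
    return sum(1 for relevant in top if relevant) / num_relevant
```

MAP is the mean of `average_precision` over all queries. In a retrieval loop, the candidates would be sorted by ascending `dtw_distance` to the query before the relevance flags are read off.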