Journal of Computer Applications (计算机应用) ›› 2010, Vol. 30 ›› Issue (2): 449-452.

• Pattern Recognition •

Text extraction based on clustering colors at sensible points and clustering text-lines for text-selection

Qiong LIU (刘琼)1, Hui-can ZHOU (周慧灿)1, Yao-nan WANG (王耀南)2

  1. College of Computer Science and Technology, Hunan University of Arts and Science
  2. College of Electrical and Information Engineering, Hunan University
  • Received: 2009-08-09 Revised: 2009-10-15 Online: 2010-02-10 Published: 2010-02-01
  • Contact: Hui-can ZHOU
  • Supported by:
    Hunan Provincial Science and Technology Department program project "Research on Text Localization and Extraction Methods in Natural Scenes"

Abstract: Existing text extraction algorithms adapt poorly to complex background variation and to variation in the shape of the characters themselves. To address this, a method based on two-level color clustering at sensitive points and on text-line clustering and selection is proposed. Because human vision is more sensitive to large changes in color, the method uses only the dominant colors at sensitive points as the basis for cluster analysis, which overcomes the strong dependence of existing threshold-based and clustering-based methods on background color variation. Text lines are then selected according to their spatial arrangement, namely that characters in the same text line align with one another, which reduces the influence of variation in character shape and size that affects general methods. Experimental results indicate that the method adapts well both to complex background changes and to text of varying shape and size.

Key words: text extraction, K-means clustering, edge density, text-line clustering
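
The two ideas named in the abstract, clustering colors only at sensitive points and grouping candidates into text lines, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not define "sensitive points", the two-level clustering scheme, or the line-selection rule, so everything below is an assumption. Here sensitive points are taken to be pixels with a large color gradient, a single level of plain K-means stands in for the two-level clustering, and boxes are grouped into a line when their vertical centers are close. The names `sensitive_points`, `kmeans`, `group_into_lines`, `grad_thresh`, and `tol` are hypothetical.

```python
import numpy as np


def sensitive_points(img, grad_thresh=60.0):
    """Mask of pixels where color changes sharply (assumed definition)."""
    f = img.astype(np.float64)
    gx = np.zeros_like(f)
    gy = np.zeros_like(f)
    gx[:, 1:] = f[:, 1:] - f[:, :-1]   # horizontal color difference
    gy[1:, :] = f[1:, :] - f[:-1, :]   # vertical color difference
    mag = np.sqrt((gx ** 2 + gy ** 2).sum(axis=2))  # magnitude over channels
    return mag > grad_thresh


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on an (N, D) array; returns (centers, labels)."""
    pts = points.astype(np.float64)
    rng = np.random.default_rng(seed)
    centers = pts[rng.choice(len(pts), size=k, replace=False)].copy()
    for _ in range(iters):
        dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):    # an empty cluster keeps its old center
                centers[j] = pts[labels == j].mean(axis=0)
    # Final assignment against the updated centers.
    dist = np.linalg.norm(pts[:, None, :] - centers[None, :, :], axis=2)
    return centers, dist.argmin(axis=1)


def group_into_lines(boxes, tol=0.5):
    """Group (x, y, w, h) tuples whose vertical centers are close (assumed rule)."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1] + b[3] / 2.0):
        cy = box[1] + box[3] / 2.0
        if lines:
            last = lines[-1]
            ref_cy = np.mean([b[1] + b[3] / 2.0 for b in last])
            ref_h = np.mean([b[3] for b in last])
            if abs(cy - ref_cy) <= tol * ref_h:
                last.append(box)
                continue
        lines.append([box])
    return [sorted(line) for line in lines]   # left-to-right within each line


if __name__ == "__main__":
    # Synthetic image: light background with one dark rectangle.
    img = np.full((40, 60, 3), 200, dtype=np.uint8)
    img[10:20, 10:30] = 30
    mask = sensitive_points(img)
    centers, _ = kmeans(img[mask], k=2)   # cluster colors at sensitive points only
    print(centers)
    print(group_into_lines([(0, 10, 8, 10), (30, 40, 8, 10), (12, 11, 8, 9)]))
```

Restricting the clustering input to `img[mask]` is the point of the sketch: large uniform background regions contribute no samples, so background color variation has less influence on the cluster centers than it would if every pixel were clustered.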