Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (06): 1442-1444. DOI: 10.3724/SP.J.1087.2005.01442

• Typical Applications •

A Directed-Graph-Based Bidirectional Matching Word Segmentation Algorithm and Its Implementation

陈耀东,王挺   

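  1. School of Computer Science, National University of Defense Technology
  • Publication date: 2005-06-01 Release date: 2011-04-06
  • Funding:

    National 863 Program project; National Natural Science Foundation of China (60403050)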

Using directed graph based BDMM algorithm for Chinese word segmentation

CHEN Yao-dong, WANG Ting   

  1. School of Computer Science, National University of Defense Technology, Changsha Hunan 410073, China
  • Online: 2005-06-01 Published: 2011-04-06

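Abstract (translated from the Chinese): Building on an analysis of existing Chinese word segmentation algorithms and their strengths and weaknesses, the paper proposes sentence-coverage rate and word-coverage rate as metrics for evaluating segmentation methods, and describes in detail the design and implementation of a bidirectional matching segmentation algorithm based on a network directed graph. The algorithm improves on the classic maximum matching algorithm by generating multiple candidate segmentation sequences from a directed graph that carries covering-ambiguity marks. Comparison experiments against the maximum matching algorithm and the omni-segmentation algorithm show that the directed-graph-based bidirectional matching algorithm achieves a high coverage rate at low complexity.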

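Keywords (translated from the Chinese): sentence-coverage rate, word-coverage rate, bidirectional maximum matching algorithm, omni-segmentation, network directed graph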

Abstract: Chinese word segmentation is one of the fundamental techniques of Chinese information processing. In this paper, the authors first study current segmentation algorithms and then modify the traditional Maximum Match (MM) algorithm. Taking both word-coverage rate and sentence-coverage rate into account, they implement a character-level directed graph with ambiguity marks for searching all possible segmentation sequences. The method is compared with the classic MM algorithms and the omni-segmentation algorithm, and the experimental results show that the directed-graph-based algorithm achieves a higher coverage rate with lower complexity.
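
The abstract does not spell out the graph construction, so the following is only a minimal sketch of the underlying idea: run maximum matching in both directions and keep both results as candidate segmentations when they disagree. It is plain bidirectional maximum matching, not the authors' directed-graph implementation, and the function names, the `max_len` parameter, and the toy lexicon are all assumptions made for illustration.

```python
def forward_max_match(sentence, lexicon, max_len=4):
    """Greedy left-to-right maximum matching (illustrative sketch)."""
    words, i = [], 0
    while i < len(sentence):
        # Try the longest piece first; a single character always matches.
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in lexicon:
                words.append(piece)
                i += size
                break
    return words


def backward_max_match(sentence, lexicon, max_len=4):
    """Greedy right-to-left maximum matching (illustrative sketch)."""
    words, j = [], len(sentence)
    while j > 0:
        for size in range(min(max_len, j), 0, -1):
            piece = sentence[j - size:j]
            if size == 1 or piece in lexicon:
                words.append(piece)
                j -= size
                break
    words.reverse()  # words were collected from the end of the sentence
    return words


def bidirectional_candidates(sentence, lexicon, max_len=4):
    """Return one candidate if both directions agree, otherwise both."""
    fwd = forward_max_match(sentence, lexicon, max_len)
    bwd = backward_max_match(sentence, lexicon, max_len)
    return [fwd] if fwd == bwd else [fwd, bwd]


if __name__ == "__main__":
    # Hypothetical toy lexicon; not taken from the paper.
    lexicon = {"研究", "研究生", "生命", "的", "起源"}
    for candidate in bidirectional_candidates("研究生命的起源", lexicon):
        print(" / ".join(candidate))
```

On this toy input the two directions disagree, so both segmentations are kept as candidates. A single-direction MM pass would return only one of them, which is the kind of coverage gap that generating multiple candidates is meant to close.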

Key words: sentence-coverage rate, word-coverage rate, Bi-directional Maximum Match, omni-segmentation, directed graph
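
The abstract names the two coverage metrics without defining them. The sketch below assumes one plausible reading: sentence-coverage rate is the share of sentences whose reference segmentation appears among the candidates, and word-coverage rate is the share of reference words (matched by character span) that appear in at least one candidate. These definitions, and the function names, are assumptions and may differ from the paper's.

```python
def word_spans(words):
    """Map a segmentation to a set of (start, end) character spans."""
    spans, start = set(), 0
    for word in words:
        spans.add((start, start + len(word)))
        start += len(word)
    return spans


def sentence_coverage(gold, candidates):
    """Assumed definition: fraction of sentences whose gold segmentation
    is one of the candidate segmentations."""
    hits = sum(1 for g, cands in zip(gold, candidates) if g in cands)
    return hits / len(gold)


def word_coverage(gold, candidates):
    """Assumed definition: fraction of gold words whose span occurs in
    at least one candidate segmentation of the same sentence."""
    total = covered = 0
    for g, cands in zip(gold, candidates):
        cand_spans = set().union(*(word_spans(c) for c in cands))
        gold_spans = word_spans(g)
        total += len(gold_spans)
        covered += len(gold_spans & cand_spans)
    return covered / total


if __name__ == "__main__":
    gold = [["研究", "生命", "的", "起源"]]
    one_candidate = [[["研究生", "命", "的", "起源"]]]
    two_candidates = [[["研究生", "命", "的", "起源"],
                       ["研究", "生命", "的", "起源"]]]
    print(sentence_coverage(gold, one_candidate), word_coverage(gold, one_candidate))
    print(sentence_coverage(gold, two_candidates), word_coverage(gold, two_candidates))
```

Under these assumed definitions, adding a second candidate can only raise both rates, which is why a multi-candidate method is evaluated by coverage rather than by single-best accuracy.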
