Journal of Computer Applications ›› 2011, Vol. 31 ›› Issue (05): 1335-1338. DOI: 10.3724/SP.J.1087.2011.01335

• Artificial intelligence

Improved CYK algorithm based on shallow parsing

LI Yong-liang1,2, HUANG Shu-guang1, LI Yong-cheng1, BAO Lei1   

  1. Department of Network, Electrical Engineering Institute, Hefei, Anhui 230031, China
  2. The No. 61541 Army of PLA, Beijing 100094, China
  • Received: 2010-10-20 Revised: 2010-12-14 Online: 2011-05-01 Published: 2011-05-01



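  • Corresponding author: LI Yong-liang
  • About the authors: LI Yong-liang (born 1981), male, from Shijiazhuang, Hebei, master's student, CCF member; main research interests: computer networks, information security, text mining. HUANG Shu-guang (born 1960), male, from Hefei, Anhui, professor and doctoral supervisor; main research interests: operating systems, information security. LI Yong-cheng (born 1986), male, from Weifang, Shandong, doctoral student; main research interests: data mining, information security. BAO Lei (born 1987), female, from Wuhu, Anhui, master's student; main research interests: pattern recognition, information security.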

Abstract: Modern Chinese syntax differs from English syntax and is markedly more complex in two respects: first, a complete set of grammar rules is hard to obtain; second, full-sentence parsing produces many ambiguous structures that are difficult to eliminate. A divide-and-conquer strategy, which splits the parsing task into smaller tasks at different levels and parses the sentence layer by layer, is a feasible and effective alternative to parsing the complete sentence in one pass. The basic idea is as follows. First, a multi-layer Markov model performs chunk-level parsing, segmenting the sentence into phrase chunks such as noun chunks and verb chunks. The CYK algorithm is then run on these chunks to analyze the dependency relations between them, ultimately yielding a syntactic analysis of the complete sentence. Shallow parsing simplifies the rule set of the CYK algorithm and reduces the difficulty of syntactic parsing to some extent.

Key words: shallow parsing, Hidden Markov Model (HMM), parse tree, dependency relation
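
The second stage described in the abstract, running CYK over chunks rather than over individual words, can be illustrated with a minimal sketch. It assumes the chunking stage has already produced a sequence of chunk labels; the labels (`NC` for noun chunk, `VC` for verb chunk), the toy grammar in Chomsky normal form, and the function name `cyk_recognize` are all hypothetical and are not taken from the paper. The sketch only recognizes whether a chunk sequence forms a sentence; it does not build the parse tree or the dependency relations.

```python
from itertools import product

# Hypothetical toy grammar in Chomsky normal form (not the paper's rule set).
# Binary rules: (left child, right child) -> set of possible parent symbols.
BINARY_RULES = {
    ("NP", "VP"): {"S"},
    ("VC", "NP"): {"VP"},
    ("NP", "NP"): {"NP"},
}
# Lexical rules: chunk label from the shallow parser -> nonterminals.
LEXICAL_RULES = {
    "NC": {"NP"},  # noun chunk
    "VC": {"VC"},  # verb chunk
}


def cyk_recognize(chunks, start="S"):
    """Return True if the chunk-label sequence derives `start` under the toy grammar."""
    n = len(chunks)
    if n == 0:
        return False
    # table[i][j] holds the nonterminals that span chunks[i..j] inclusive.
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, label in enumerate(chunks):
        table[i][i] = set(LEXICAL_RULES.get(label, ()))
    # Fill spans of increasing length, trying every split point k.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            for k in range(i, j):
                for left, right in product(table[i][k], table[k + 1][j]):
                    table[i][j] |= BINARY_RULES.get((left, right), set())
    return start in table[0][n - 1]


if __name__ == "__main__":
    print(cyk_recognize(["NC", "VC", "NC"]))
    print(cyk_recognize(["VC", "NC"]))
```

Because the table is indexed by chunks instead of words, its size depends on the number of chunks in the sentence, and the grammar only needs rules over chunk labels, which is one way to read the abstract's claim that shallow parsing simplifies the CYK rule set.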
