Journal of Computer Applications (计算机应用) ›› 2005, Vol. 25 ›› Issue (01): 4-6. DOI: 10.3724/SP.J.1087.2005.00004

• Artificial Intelligence •

Auto extracting for lexicalized tree adjoining grammar (自动提取词汇化树邻接文法)

XU Yun (许云), FAN Xiao-zhong (樊孝忠), ZHANG Feng (张锋)

  1. Department of Computer Science & Engineering, Beijing Institute of Technology
  • Online: 2005-01-01 Published: 2005-01-01
  • Funding:

    Supported by the Yunnan Province Information Technology Project Fund (2002IT03)

Abstract: An algorithm is presented for automatically extracting a Lexicalized Tree Adjoining Grammar (LTAG) from the Penn Chinese corpus. The algorithm first induces three types of lexicalized trees from the lexicalized treebank, then applies the approach of Head-driven Phrase Structure Grammar (HPSG) to extract structurally well-formed lexicalized trees from the corpus automatically. Finally, linguistic rules filter out invalid lexicalized trees. Compared with building an LTAG by hand, the approach requires less human effort and avoids the omissions and inconsistencies that manual creation of lexicalized trees can introduce.

Key words: lexicalized tree adjoining grammar, lexicalized tree, corpus, natural language processing
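
The extraction step summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the three tree types or the linguistic filtering rules, so neither is reproduced here. A toy head-percolation table (`HEAD_RULES`) stands in for the HPSG-based head identification, and the argument/adjunct table (`ARGUMENTS`), the `Node` class, and the example sentence are all assumptions made for illustration. The sketch follows head children down to a lexical anchor, leaves a substitution node (`↓`) where an argument was, and factors each adjunct out into an auxiliary tree with a foot node (`*`).

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None  # set only on preterminal (POS) nodes


# Hypothetical head table: for each phrase label, the child labels
# that may act as its head, in priority order.
HEAD_RULES = {"IP": ["VP"], "VP": ["VV"], "NP": ["NN", "NR", "PN"], "ADVP": ["AD"]}

# Hypothetical (parent, child) pairs treated as arguments; any other
# non-head child is treated as an adjunct.
ARGUMENTS = {("IP", "NP"), ("VP", "NP")}


def head_index(node: Node) -> int:
    """Return the position of the head child (last child as fallback)."""
    for wanted in HEAD_RULES.get(node.label, []):
        for i, child in enumerate(node.children):
            if child.label == wanted:
                return i
    return len(node.children) - 1


def build(node: Node, out: List[str]) -> str:
    """Return the elementary tree anchored by node's lexical head.

    Trees for factored-out arguments and adjuncts are appended to `out`.
    """
    if node.word is not None:
        return f"({node.label} {node.word})"  # the lexical anchor
    h = head_index(node)
    parts = []
    for i, child in enumerate(node.children):
        if i == h:
            parts.append(build(child, out))  # keep following the head
        elif (node.label, child.label) in ARGUMENTS:
            parts.append(child.label + "↓")  # substitution node
            out.append(build(child, out))    # argument gets its own tree
        else:
            sub = build(child, out)
            foot = node.label + "*"          # foot node of the auxiliary tree
            if i < h:
                out.append(f"({node.label} {sub} {foot})")
            else:
                out.append(f"({node.label} {foot} {sub})")
    return f"({node.label} {' '.join(parts)})"


def extract(root: Node) -> List[str]:
    """Decompose one parse tree into elementary trees, main tree first."""
    rest: List[str] = []
    main = build(root, rest)
    return [main] + rest


# Toy parse of "他 已经 离开 北京" (assumed bracketing, for illustration only).
sentence = Node("IP", [
    Node("NP", [Node("PN", word="他")]),
    Node("VP", [
        Node("ADVP", [Node("AD", word="已经")]),
        Node("VV", word="离开"),
        Node("NP", [Node("NR", word="北京")]),
    ]),
])

trees = extract(sentence)
for t in trees:
    print(t)
```

On this toy input the sketch yields four trees: one initial tree anchored by the verb with two substitution nodes, two initial trees for the noun phrases, and one auxiliary tree for the adverb. A real extractor would read the head and argument tables from the treebank's annotation guidelines and would then apply a filtering pass, which is omitted here.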

CLC number: