Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (01): 17-19. DOI: 10.3724/SP.J.2005.00017

• Artificial Intelligence •

Study on auto-proofreading method for POS tagging of Chinese corpus

ZHANG Hu, ZHENG Jia-heng, LIU Jiang

  1. College of Computer & Information Technology, Shanxi University
  • Online: 2011-04-22 Published: 2005-01-01
  • Supported by:

    National High-Tech R&D Program of China (863 Program) (2001AA4031)

Abstract: The automatic proofreading of part-of-speech (POS) tags in large-scale corpora is analyzed from the perspective of clustering and classification, and a new method for checking the correctness of POS tags and correcting them automatically is proposed. The method first clusters the example POS sequences and derives a threshold from the clusters. It then uses the threshold to judge whether each POS tag under test is correct. A tag judged incorrect is reassigned to the POS category whose centroid is nearest, which yields a corrected tag and thereby improves the POS tagging accuracy of the Chinese corpus.

Key words: clustering, POS tagging, auto-proofreading
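
The threshold-and-centroid procedure summarized in the abstract can be illustrated with a minimal sketch. The abstract does not specify the feature representation, the distance measure, or how the threshold is derived, so everything concrete here is an assumption: examples are hypothetical numeric feature vectors, distance is Euclidean, and the threshold for each POS category is the largest distance from any of its examples to the category centroid. The names `fit`, `proofread`, and `EXAMPLES` are illustrative only and do not come from the paper.

```python
from collections import defaultdict
from math import sqrt


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def distance(a, b):
    """Euclidean distance (an assumed measure; the abstract names none)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def fit(examples):
    """Group trusted examples by POS tag and return {tag: (centroid, threshold)}.

    Assumption: the threshold is the maximum distance from an example
    to the centroid of its own category.
    """
    groups = defaultdict(list)
    for vec, tag in examples:
        groups[tag].append(vec)
    model = {}
    for tag, vecs in groups.items():
        c = centroid(vecs)
        model[tag] = (c, max(distance(v, c) for v in vecs))
    return model


def proofread(model, vec, tag):
    """Return (is_correct, suggested_tag) for one tagged item.

    The tag is accepted when the item lies within the threshold of its
    category centroid. Otherwise the item is assigned to the category
    with the nearest centroid, which may still be the original tag.
    """
    if tag in model:
        c, threshold = model[tag]
        if distance(vec, c) <= threshold:
            return True, tag
    nearest = min(model, key=lambda t: distance(vec, model[t][0]))
    return False, nearest


# Hypothetical 2-D feature vectors for two POS categories ("n", "v").
EXAMPLES = [
    ([1.0, 0.0], "n"),
    ([0.8, 0.2], "n"),
    ([0.0, 1.0], "v"),
    ([0.2, 0.8], "v"),
]
model = fit(EXAMPLES)
```

Calling `proofread(model, vec, tag)` on each item of a tagged corpus would then flag suspicious tags and propose a replacement. A real system would need features derived from the POS context of each word, which this sketch leaves out.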

CLC Number: