Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (01): 17-19. DOI: 10.3724/SP.J.2005.00017
• Artificial intelligence •
ZHANG Hu, ZHENG Jia-heng, LIU Jiang
张虎,郑家恒,刘江
Supported by:
National High Technology Research and Development Program of China (863 Program) (2001AA4031)
Abstract: The problem of automatically proofreading part-of-speech (POS) tags in a large-scale corpus is analyzed, and a new method for checking the correctness of POS tagging and proofreading it automatically, based on clustering and classification, is proposed. The method first clusters the POS sequences of the examples and derives a threshold. It then classifies each test sequence against that threshold to judge whether its tagging is correct, and proposes a corrected POS for each tag judged wrong, thereby improving the accuracy of POS tagging on a large-scale corpus.
Key words: clustering, POS Tagging, auto-proofreading
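Abstract (translated from the Chinese): Approaching the task from the perspective of clustering and classification, this paper analyzes the automatic proofreading of POS tagging in large-scale corpora and proposes a new method for checking the correctness of corpus POS tags and proofreading them automatically. Using the ideas of clustering and classification, the method clusters the examples and computes a threshold, then uses the threshold to judge whether each POS tag is correct. A tag judged wrong is reassigned according to the principle of proximity to the centroid of each POS category, which yields a proofreading POS and in turn raises the POS tagging accuracy of the Chinese corpus.
Key words (translated from the Chinese): clustering, POS tagging, automatic proofreading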
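The abstract names three steps: cluster trusted examples, derive a threshold, then accept or re-assign a tag by proximity to each POS category's centroid. The sketch below is one possible reading of those steps, not the authors' implementation. Everything concrete in it is an assumption made for illustration: the one-hot encoding of the left and right neighbouring tags (`context_vector`), the Euclidean `distance`, the threshold rule (largest distance from a trusted example to its own centroid), and the function names `train` and `check`.

```python
import math
from collections import defaultdict

def context_vector(tags, i, tagset):
    """Assumed feature: one-hot left neighbour tag followed by one-hot right neighbour tag."""
    vec = [0.0] * (2 * len(tagset))
    if i > 0:
        vec[tagset.index(tags[i - 1])] = 1.0
    if i + 1 < len(tags):
        vec[len(tagset) + tagset.index(tags[i + 1])] = 1.0
    return vec

def distance(a, b):
    """Euclidean distance (an assumption; the abstract does not name a metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def train(examples, tagset):
    """examples: (tag_sequence, position) pairs whose tag at `position` is trusted.
    Returns {tag: (centroid, threshold)}."""
    groups = defaultdict(list)
    for tags, i in examples:
        groups[tags[i]].append(context_vector(tags, i, tagset))
    model = {}
    for tag, vecs in groups.items():
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Assumed threshold: the farthest trusted example from its own centroid.
        threshold = max(distance(v, centroid) for v in vecs)
        model[tag] = (centroid, threshold)
    return model

def check(tags, i, model, tagset):
    """Return the tag to keep at position i: the original tag if it lies within
    its category's threshold, otherwise the tag of the nearest centroid."""
    vec = context_vector(tags, i, tagset)
    tag = tags[i]
    if tag in model:
        centroid, threshold = model[tag]
        if distance(vec, centroid) <= threshold:
            return tag
    return min(model, key=lambda t: distance(vec, model[t][0]))

# Toy data with a hypothetical tagset; position 1 holds the ambiguous word.
tagset = ["d", "n", "v", "u"]
examples = [
    (["d", "v", "n"], 1),
    (["d", "v", "u"], 1),
    (["v", "n", "u"], 1),
    (["v", "n", "v"], 1),
]
model = train(examples, tagset)
print(check(["v", "n", "u"], 1, model, tagset))  # n (accepted as tagged)
print(check(["d", "n", "n"], 1, model, tagset))  # v (re-assigned to nearest centroid)
```

In this toy run the second sequence tags the word as `n` in a context that the trusted examples only associate with `v`, so its distance to the `n` centroid exceeds the threshold and the nearest centroid supplies the suggested tag.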
CLC Number: H085
ZHANG Hu, ZHENG Jia-heng, LIU Jiang. Study on auto-proofreading method for POS tagging of Chinese corpus[J]. Journal of Computer Applications, 2005, 25(01): 17-19.
张虎,郑家恒,刘江. 汉语语料库词性标注自动校对方法研究[J]. 计算机应用, 2005, 25(01): 17-19.
URL: http://www.joca.cn/EN/10.3724/SP.J.2005.00017
http://www.joca.cn/EN/Y2005/V25/I01/17