Study on auto-proofreading method for POS tagging of Chinese corpus

doi:10.3724/SP.J.2005.00017

Journal of Computer Applications, 2005, Vol. 25, Issue (01): 17-19. DOI: 10.3724/SP.J.2005.00017

• Artificial intelligence •

ZHANG Hu, ZHENG Jia-heng, LIU Jiang

College of Computer & Information Technology, Shanxi University

Online: 2011-04-22 Published: 2005-01-01

汉语语料库词性标注自动校对方法研究

张虎，郑家恒，刘江

山西大学计算机与信息技术学院

Supported by:
National 863 Program of China (2001AA4031)

Abstract

The problem of automatically proofreading part-of-speech (POS) tags in large-scale corpora is analyzed, and a new method for checking the correctness of POS tagging and proofreading it automatically, based on clustering and classification, is proposed. The method first clusters the POS sequences of the examples and derives a threshold value. It then classifies each test sequence against that threshold to judge whether its tagging is correct, and proposes a corrected POS for each wrongly tagged word. In this way it improves the accuracy of POS tagging on large-scale corpora.

Key words: clustering, POS tagging, auto-proofreading

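Abstract (translated from the Chinese): Approaching the problem from the perspective of clustering and classification, the paper analyzes the automatic proofreading of part-of-speech (POS) tagging in large-scale corpora and proposes a new method for checking the correctness of corpus POS tags and proofreading them automatically. Using the ideas of clustering and classification, the method clusters the examples and derives a threshold, then uses that threshold to judge whether a POS tag is right or wrong. A wrongly tagged word is assigned to the POS category whose centroid it lies closest to, which yields a proofread POS and thereby improves the accuracy of POS tagging in Chinese corpora.

Key words (translated from the Chinese): clustering, POS tagging, automatic proofreading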
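
The abstract gives only the outline of the procedure (cluster the examples, derive a threshold, judge each tag against it, and reassign wrong tags to the category with the nearest centroid). The Python sketch below illustrates that outline under several assumptions that are not taken from the paper: each tagged occurrence is represented by a hypothetical numeric feature vector, distance is Euclidean, and the threshold for a tag is the largest distance from any of its examples to that tag's centroid. The function names `train` and `proofread` are likewise illustrative.

```python
import math
from collections import defaultdict


def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def distance(a, b):
    """Euclidean distance (an assumed metric, not specified by the abstract)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(examples):
    """Group example vectors by POS tag; return {tag: (centroid, threshold)}.

    Assumption: the threshold is the maximum distance from an example
    to the centroid of its own tag.
    """
    by_tag = defaultdict(list)
    for vector, tag in examples:
        by_tag[tag].append(vector)
    model = {}
    for tag, vectors in by_tag.items():
        center = centroid(vectors)
        threshold = max(distance(v, center) for v in vectors)
        model[tag] = (center, threshold)
    return model


def proofread(model, vector, tag):
    """Return (is_correct, suggested_tag) for one tagged occurrence.

    The tag is accepted if the vector lies within the threshold of that
    tag's centroid; otherwise the tag with the nearest centroid is suggested
    (which may coincide with the original tag if no other centroid is closer).
    """
    if tag in model:
        center, threshold = model[tag]
        if distance(vector, center) <= threshold:
            return True, tag
    nearest = min(model, key=lambda t: distance(vector, model[t][0]))
    return False, nearest


# Toy data: two-dimensional feature vectors invented for illustration only.
EXAMPLES = [
    ([1.0, 0.0], "n"), ([0.8, 0.2], "n"),
    ([0.0, 1.0], "v"), ([0.2, 0.8], "v"),
]
model = train(EXAMPLES)
print(proofread(model, [0.9, 0.1], "n"))  # within the "n" threshold
print(proofread(model, [0.1, 0.9], "n"))  # far from "n", nearest centroid is "v"
```

How the feature vectors are built from POS sequences, and how the threshold is actually computed, are left open here because the abstract does not state them.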

CLC Number: H085

ZHANG Hu, ZHENG Jia-heng, LIU Jiang. Study on auto-proofreading method for POS tagging of Chinese corpus[J]. Journal of Computer Applications, 2005, 25(01): 17-19.

张虎，郑家恒，刘江. 汉语语料库词性标注自动校对方法研究[J]. 计算机应用, 2005, 25(01): 17-19.

