Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (01): 11-13.DOI: 10.3724/SP.J.1087.2005.00011
• Artificial Intelligence •
Text classification based on N-gram language model
ZHOU Xin-dong, WANG Ting
Supported by: National High-Tech Research and Development (863) Program of China (2001AA114110)
Abstract: Text classification has become a research focus in the field of natural language processing. After a review of traditional text classification models, a method using N-gram language models to classify Chinese text is presented. Rather than representing a document as a bag of words, this model regards it as a random observation sequence of words. Using the bigram model, a word-level text classifier was implemented. The performance of the N-gram model classifier was compared with that of two traditional models (the Vector Space Model and the Naive Bayes model). Experimental results show that the N-gram model classifier outperforms the others in both accuracy and stability.
Key words: text classification, N-gram language model, parameter smoothing
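The approach in the abstract can be sketched as follows: train one bigram language model per class, smooth its parameters, and assign a document to the class whose model gives the observed word sequence the highest likelihood. This is a minimal illustrative sketch, not the authors' implementation; the class name, add-one smoothing choice, and toy data are assumptions for demonstration.

```python
from collections import defaultdict
import math


class BigramClassifier:
    """Toy bigram language-model classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.bigram = {}   # class label -> {(w1, w2): count}
        self.unigram = {}  # class label -> {w1: count of w1 as a bigram prefix}
        self.vocab = set()

    def train(self, label, docs):
        """Count bigrams for one class from a list of tokenized documents."""
        bi = self.bigram.setdefault(label, defaultdict(int))
        uni = self.unigram.setdefault(label, defaultdict(int))
        for words in docs:
            seq = ["<s>"] + words  # sentence-start marker
            for w1, w2 in zip(seq, seq[1:]):
                bi[(w1, w2)] += 1
                uni[w1] += 1
                self.vocab.update((w1, w2))

    def log_prob(self, label, words):
        """Smoothed log-likelihood of the word sequence under one class model."""
        bi, uni = self.bigram[label], self.unigram[label]
        v = len(self.vocab)
        seq = ["<s>"] + words
        # Add-one smoothing: unseen bigrams get a small nonzero probability.
        return sum(math.log((bi.get((w1, w2), 0) + 1) / (uni.get(w1, 0) + v))
                   for w1, w2 in zip(seq, seq[1:]))

    def classify(self, words):
        """Pick the class whose bigram model best explains the sequence."""
        return max(self.bigram, key=lambda c: self.log_prob(c, words))
```

For example, after training on a handful of tokenized documents per class, `classify(["the", "team", "scored"])` compares the smoothed bigram likelihoods across classes and returns the best-scoring label. A real implementation would use a held-out corpus and a stronger smoothing scheme, as the "parameter smoothing" keyword suggests.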
CLC Number: TP391.1
ZHOU Xin-dong, WANG Ting. Text classification based on N-gram language model[J]. Journal of Computer Applications, 2005, 25(01): 11-13.
URL: http://www.joca.cn/EN/10.3724/SP.J.1087.2005.00011
http://www.joca.cn/EN/Y2005/V25/I01/11