Journal of Computer Applications ›› 2005, Vol. 25 ›› Issue (01): 11-13.DOI: 10.3724/SP.J.1087.2005.00011
• Artificial Intelligence •
Text classification based on N-gram language model
ZHOU Xin-dong, WANG Ting
Supported by: National High-Tech Research and Development (863) Program of China (2001AA114110)
Abstract: Text classification has become a research focus in the field of natural language processing. After a review of traditional text classification models, a method using N-gram language models to classify Chinese text is presented. Rather than representing a document as a bag of words, this model regards it as a random observation sequence of words. Using the bigram model, a word-level text classifier was implemented. The performance of the N-gram model classifier was compared with that of two traditional models (the Vector Space Model and the Naive Bayes model). Experimental results show that the N-gram model classifier outperforms the others in both accuracy and stability.
Key words: text classification, N-gram language model, parameter smoothing
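The approach in the abstract can be sketched as follows: train one bigram language model per class, smooth its parameters, and assign a document to the class whose model gives the observed word sequence the highest likelihood. This is a minimal illustrative sketch, not the authors' implementation; the class name, add-one smoothing choice, and toy data are assumptions for demonstration.

```python
from collections import defaultdict
import math


class BigramClassifier:
    """Toy bigram language-model classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.bigram = {}   # class label -> {(w1, w2): count}
        self.unigram = {}  # class label -> {w1: count of w1 as a bigram prefix}
        self.vocab = set()

    def train(self, label, docs):
        """Count bigrams for one class from a list of tokenized documents."""
        bi = self.bigram.setdefault(label, defaultdict(int))
        uni = self.unigram.setdefault(label, defaultdict(int))
        for words in docs:
            seq = ["<s>"] + words  # sentence-start marker
            for w1, w2 in zip(seq, seq[1:]):
                bi[(w1, w2)] += 1
                uni[w1] += 1
                self.vocab.update((w1, w2))

    def log_prob(self, label, words):
        """Smoothed log-likelihood of the word sequence under one class model."""
        bi, uni = self.bigram[label], self.unigram[label]
        v = len(self.vocab)
        seq = ["<s>"] + words
        # Add-one smoothing: unseen bigrams get a small nonzero probability.
        return sum(math.log((bi.get((w1, w2), 0) + 1) / (uni.get(w1, 0) + v))
                   for w1, w2 in zip(seq, seq[1:]))

    def classify(self, words):
        """Pick the class whose bigram model best explains the sequence."""
        return max(self.bigram, key=lambda c: self.log_prob(c, words))
```

For example, after training on a handful of tokenized documents per class, `classify(["the", "team", "scored"])` compares the smoothed bigram likelihoods across classes and returns the best-scoring label. A real implementation would use a held-out corpus and a stronger smoothing scheme, as the "parameter smoothing" keyword suggests.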
CLC Number: TP391.1
ZHOU Xin-dong, WANG Ting. Text classification based on N-gram language model[J]. Journal of Computer Applications, 2005, 25(01): 11-13.
URL: http://www.joca.cn/EN/10.3724/SP.J.1087.2005.00011
http://www.joca.cn/EN/Y2005/V25/I01/11