1. Modern Education Technology Center, Xinjiang University, Urumqi, Xinjiang 830046, China
2. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China
Abstract: Text representation is the most important phase in automatic text categorization. In Vector Space Model (VSM) based text representation, the choice of feature granularity has a direct impact on text categorization performance. A statistical Uyghur phrase extraction algorithm is proposed, and Uyghur text categorization experiments were conducted using the Support Vector Machine (SVM) algorithm with the extracted phrases as text features. The experimental results show that phrase-based Uyghur text categorization achieves higher classification precision and recall than word-based categorization.
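As a rough illustration of what statistical phrase features in a VSM can look like, the sketch below scores adjacent word pairs and keeps the strongly associated ones as phrases. The abstract does not say which statistic the authors use; pointwise mutual information (PMI), the function names `extract_phrases` and `phrase_features`, and the thresholds `min_count` and `min_pmi` are all assumptions made for this example, not the paper's algorithm.

```python
import math
from collections import Counter


def extract_phrases(docs, min_count=2, min_pmi=1.0):
    """Return adjacent word pairs whose PMI is at least min_pmi.

    docs is a list of token lists. PMI is an assumed association
    measure; the paper's actual statistic is not given in the abstract.
    """
    unigrams = Counter()
    bigrams = Counter()
    for tokens in docs:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    phrases = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs too rare to score reliably
        p_pair = count / n_bi
        p_w1 = unigrams[w1] / n_uni
        p_w2 = unigrams[w2] / n_uni
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        if pmi >= min_pmi:
            phrases[(w1, w2)] = pmi
    return phrases


def phrase_features(tokens, phrases):
    """Count each extracted phrase in one document (VSM term frequencies)."""
    return Counter(pair for pair in zip(tokens, tokens[1:]) if pair in phrases)
```

The resulting phrase counts would serve as the document vector passed to a classifier such as an SVM; tokenization and any Uyghur-specific stemming are outside the scope of this sketch.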
阿力木江·艾沙, 吐尔根·依布拉音, 库尔班·吾布力, 李哲. 基于短语的维吾尔文文本分类[J]. 计算机应用, 2012, 32(10): 2923-2926.
ALIMJAN Aysa, TURGUN Ibrahim, KURBAN Obul, LI Zhe. Phrase based Uyghur language text categorization[J]. Journal of Computer Applications, 2012, 32(10): 2923-2926.