Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (06): 1676-1678.
• Software process technology & Chinese information processing •
王花1,古丽拉·阿东别克2,吴守用3
Abstract: This paper introduces the basic ideas of the Support Vector Machine (SVM) and k-Nearest Neighbor (kNN) classification algorithms, together with two feature selection methods for the Kazak language. It then compares the SVM, kNN, and Bayes algorithms experimentally on Kazak text categorization. The results show that SVM categorizes Kazak text better than kNN and Bayes. Because of the morphemes and inflectional forms of Kazak words, segmenting affixes from words lowers both the precision and the recall of text categorization.
Key words: Kazak text categorization, SVM, feature selection, kNN
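As a rough illustration of the kind of comparison the abstract describes, the sketch below classifies tokenized documents with kNN over term-frequency vectors and then reports precision and recall for one category. It is not the authors' implementation: the cosine similarity measure, the value of k, the function names, and the made-up English tokens are all assumptions for illustration, and the paper's feature selection methods and SVM are not reproduced here.

```python
import math
from collections import Counter


def tf_vector(tokens):
    """Term-frequency vector for one tokenized document."""
    return Counter(tokens)


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(train, tokens, k=3):
    """Label `tokens` by majority vote among the k most similar training docs."""
    query = tf_vector(tokens)
    ranked = sorted(
        train,
        key=lambda item: cosine(query, tf_vector(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall(gold, predicted, label):
    """Precision and recall for a single category."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Hypothetical toy corpus: made-up English tokens stand in for Kazak words.
TRAIN = [
    (["match", "goal", "team"], "sport"),
    (["team", "coach", "goal"], "sport"),
    (["market", "price", "bank"], "economy"),
    (["bank", "loan", "price"], "economy"),
]
TEST = [
    (["goal", "team", "win"], "sport"),
    (["price", "bank", "rate"], "economy"),
]

gold = [label for _, label in TEST]
predicted = [knn_predict(TRAIN, tokens, k=3) for tokens, _ in TEST]
print(precision_recall(gold, predicted, "sport"))
```

To probe the affix effect the abstract reports, one could tokenize the same corpus twice, once with whole words and once with affixes split off, and compare the two precision and recall pairs.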
王花, 古丽拉·阿东别克, 吴守用. Kazak text categorization based on SVM [J]. Journal of Computer Applications, 2010, 30(06): 1676-1678.
URL: https://www.joca.cn/EN/Y2010/V30/I06/1676