Journal of Computer Applications, 2005, Vol. 25, Issue (09): 2028-2030. DOI: 10.3724/SP.J.1087.2005.02028

• Artificial Intelligence •

Study of automatic text categorization based on CBR

ZHANG Ting-hui, GENG Huan-tong, CAI Qing-sheng

  1. Department of Computer Science and Technology, University of Science and Technology of China
  • Published: 2005-09-01 Online: 2011-04-11
  • Funding: National Natural Science Foundation of China (70171052); Wantai Development Project (143-150401)

Study of automatic text categorization based on CBR

ZHANG Ting-hui, GENG Huan-tong, CAI Qing-sheng

  1. Department of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
  • Published: 2005-09-01 Online: 2011-04-11

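Abstract: KNN is one of the best-performing text categorization methods, but at classification time it must compute the similarity between the document to be classified and every training sample, so its time complexity is high. This paper proposes an automatic text categorization method based on CBR: the training sample library is first converted into a case library by clustering, and documents are then classified using the KNN approach. Experimental results show that the method achieves an average recall of 87.07% and an average precision of 89.17%. An analysis of the algorithm's time complexity further shows that it classifies considerably faster than the standard KNN method, which gives it good practical value.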
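The abstract describes the method only in outline. The sketch below is one plausible reading of it, not the authors' implementation. It assumes documents are already term-weight vectors, uses cosine similarity, and clusters each category's training documents with k-means; the abstract does not name the clustering algorithm, so k-means, the parameter `k_per_class`, and all function names here are illustrative assumptions. Each cluster centroid becomes one labeled case, and a new document is classified by KNN majority voting over the cases instead of over the raw training samples.

```python
import math
import random
from collections import Counter


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def mean(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(vectors, k, iters=10, seed=0):
    """Plain k-means using cosine similarity for assignment (an assumed choice).
    Returns min(k, len(vectors)) centroids."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, min(k, len(vectors)))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for v in vectors:
            best = max(range(len(centroids)), key=lambda i: cosine(v, centroids[i]))
            groups[best].append(v)
        # Keep the previous centroid if its cluster became empty.
        centroids = [mean(g) if g else c for g, c in zip(groups, centroids)]
    return centroids


def build_case_library(training, k_per_class):
    """training: list of (vector, label). Hypothetical helper: each category is
    clustered separately and every centroid becomes one labeled case."""
    by_label = {}
    for vec, label in training:
        by_label.setdefault(label, []).append(vec)
    cases = []
    for label, vecs in by_label.items():
        for centroid in kmeans(vecs, k_per_class):
            cases.append((centroid, label))
    return cases


def knn_classify(doc, library, k):
    """Majority vote among the k entries of `library` most similar to `doc`.
    Cost per document: one similarity per library entry."""
    ranked = sorted(library, key=lambda case: cosine(doc, case[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```

For example, with six toy training vectors in two categories and `k_per_class=1`, the case library shrinks to two centroids, and `knn_classify([2, 0, 0], library, k=1)` still returns `"sports"`. Under this reading, with N training samples and M cases (M ≤ N), each classification computes M similarities instead of N; the clustering cost is paid once, before classification begins.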

Keywords: case-based reasoning, automatic text categorization, K-nearest neighbor, clustering

Abstract: K-Nearest Neighbor (KNN) is one of the top-performing classifiers, but its time complexity is high because it must calculate the similarity between the document and every training sample. An automatic text categorization mechanism based on CBR is presented: the training sample library is converted into a case library by clustering, and documents are then classified by KNN. In experiments, the average recall and precision were 87.07% and 89.17%, respectively. In addition, an analysis of the time complexity shows that this mechanism classifies much faster than the KNN method.
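The abstract reports "average" recall and precision without stating how the averages are taken. As a minimal sketch, assuming macro-averaging (compute recall and precision per category, then take the unweighted mean over categories), the two figures could be computed as below. The function name and the convention that an undefined ratio counts as 0.0 are illustrative assumptions, not details from the paper.

```python
from collections import Counter


def macro_recall_precision(y_true, y_pred):
    """Per-category recall and precision, averaged with equal weight per category.
    Categories are the union of true and predicted labels; a ratio whose
    denominator is zero is counted as 0.0 (an assumed convention)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per category
    actual = Counter(y_true)      # documents truly in each category
    predicted = Counter(y_pred)   # documents assigned to each category
    recalls = [tp[c] / actual[c] if actual[c] else 0.0 for c in labels]
    precisions = [tp[c] / predicted[c] if predicted[c] else 0.0 for c in labels]
    return sum(recalls) / len(labels), sum(precisions) / len(labels)
```

For instance, with true labels `["a", "a", "b", "b"]` and predictions `["a", "b", "b", "b"]`, recall is 0.5 for `a` and 1.0 for `b`, and precision is 1.0 for `a` and 2/3 for `b`, so the macro averages are 0.75 and about 0.833. Micro-averaging, which pools all decisions before dividing, would generally give different numbers.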

Key words: case-based reasoning (CBR), automatic text categorization, K-nearest neighbor, clustering
