Journal of Computer Applications

Text classification with incomplete data based on Bernoulli mixture model

CAI Chong-Chao, WANG Shi-Tong

  • Received: 2006-11-28 Revised: 2007-01-21 Online: 2007-05-01 Published: 2007-05-01
  • Contact: CAI Chong-Chao

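A text classification method for incomplete data based on the Bernoulli mixture model

CAI Chong-Chao, WANG Shi-Tong

  1. School of Information Engineering, Jiangnan University, Wuxi, Jiangsu Province (master's student, class of 2005)
  • Corresponding author: CAI Chong-Chao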

Abstract: Constructing a text classifier from incomplete data is an important problem. An improved method based on the Bernoulli mixture model and the Expectation Maximization (EM) algorithm is introduced. First, initial values of the likelihood function parameters are obtained by learning from the labeled data. Then the parameters of the classifier's prior probability model are estimated with a weighted EM algorithm, which yields the improved classifier. The results show that the new method outperforms naive Bayes text classification in both recall and precision.

Key words: incomplete data, text classification, naive Bayes classification, Bernoulli mixture model, EM algorithm

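Abstract (from the Chinese version): Building on the Bernoulli mixture model and the expectation maximization (EM) algorithm, an improved method for incomplete data is presented. First, initial parameter estimates of the likelihood function are obtained from the labeled data using the Bernoulli mixture model and the naive Bayes algorithm. Then a weighted EM algorithm is used to estimate the parameters of the classifier's prior probability model, yielding the final classifier. Experimental results show that the method outperforms naive Bayes text classification in both precision and recall.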

Key words (from the Chinese version): incomplete data set, text classification, naive Bayes classification, Bernoulli mixture model, EM algorithm
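
The two-stage procedure outlined in the abstract (initial parameters from labeled data, then refinement with a weighted EM) might look roughly like the sketch below. This is not the authors' implementation. It assumes that "incomplete data" means documents whose class label is missing, that documents are binary word-presence vectors, and that the weight simply scales the contribution of unlabeled documents in the M-step. The function names, the `weight` and `alpha` parameters, and the toy data are all hypothetical.

```python
import math

def fit_bernoulli_nb(docs, resp, n_classes, alpha=1.0):
    """M-step: class priors and per-class word probabilities from soft counts.

    docs: binary feature vectors; resp: per-document class responsibilities
    (already scaled by any document weight). alpha is Laplace smoothing.
    """
    n_feat = len(docs[0])
    class_mass = [0.0] * n_classes
    word_mass = [[0.0] * n_feat for _ in range(n_classes)]
    for x, r in zip(docs, resp):
        for c in range(n_classes):
            class_mass[c] += r[c]
            for j, xj in enumerate(x):
                word_mass[c][j] += r[c] * xj
    total = sum(class_mass)
    priors = [(class_mass[c] + alpha) / (total + alpha * n_classes)
              for c in range(n_classes)]
    theta = [[(word_mass[c][j] + alpha) / (class_mass[c] + 2 * alpha)
              for j in range(n_feat)] for c in range(n_classes)]
    return priors, theta

def posterior(x, priors, theta):
    """E-step for one document: P(class | x) under the Bernoulli model."""
    logp = []
    for pc, th in zip(priors, theta):
        s = math.log(pc)
        for xj, t in zip(x, th):
            s += math.log(t) if xj else math.log(1.0 - t)
        logp.append(s)
    m = max(logp)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logp]
    z = sum(exps)
    return [e / z for e in exps]

def train_semi_supervised(labeled, labels, unlabeled, n_classes,
                          weight=0.5, n_iter=20):
    """Initialise from labeled data, then run weighted EM over all documents."""
    hard = [[1.0 if c == y else 0.0 for c in range(n_classes)] for y in labels]
    priors, theta = fit_bernoulli_nb(labeled, hard, n_classes)
    for _ in range(n_iter):
        # Assumed weighting scheme: unlabeled responsibilities scaled by `weight`.
        soft = [[weight * p for p in posterior(x, priors, theta)]
                for x in unlabeled]
        priors, theta = fit_bernoulli_nb(labeled + unlabeled, hard + soft,
                                         n_classes)
    return priors, theta

def predict(x, priors, theta):
    post = posterior(x, priors, theta)
    return max(range(len(post)), key=post.__getitem__)

# Toy data (made up): features 0-1 suggest class 0, features 2-3 suggest class 1.
labeled = [[1, 1, 0, 0], [0, 0, 1, 1]]
labels = [0, 1]
unlabeled = [[1, 1, 1, 0], [1, 0, 0, 0], [0, 0, 1, 1], [0, 1, 1, 1]]
priors, theta = train_semi_supervised(labeled, labels, unlabeled, n_classes=2)
```

With `weight=0` the loop reduces to plain supervised naive Bayes on the labeled documents, which is the baseline the abstract compares against. Larger values let the unlabeled documents influence the estimates more.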