Journal of Computer Applications (计算机应用)

• Typical applications

Anti-spam model based on semi-Naive Bayesian classification model

Bei Hui (惠孛), Yue Wu (吴跃)

  1. School of Computer Science and Engineering, University of Electronic Science and Technology of China
  • Received: 2008-09-24 Revised: 1900-01-01 Online: 2009-03-01 Published: 2009-03-01
  • Corresponding author: Bei Hui

Abstract: The Naive Bayes (NB) classification model is simple and efficient, so it performs well in spam filtering. However, its assumption that attributes are conditionally independent severs the relationships between attributes and cannot express their semantic dependence, which reduces classification accuracy. This paper relaxes that assumption and proposes a new spam classification model based on the semi-naive Bayesian classification model: the N-averaged 1-dependence mail filtering model. Each 1-dependence classifier assumes that all attributes depend on one attribute, and the average of the probabilities given by the N 1-dependence classifiers is used as the predicted probability of each class label. Experiments show that the model remains simple and efficient while reducing the error rate of spam classification.

Key words: Bayesian classification, semi-Naive Bayes, spam
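
The averaging step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes binary word-presence features, two class labels (0 = legitimate, 1 = spam), Laplace smoothing, and an estimator that averages P(y, x_i) · Π P(x_j | y, x_i) over every attribute i taken in turn as the single parent. The paper's exact estimator may differ, and the class name `AveragedOneDependence` and its parameters are hypothetical.

```python
from collections import defaultdict


class AveragedOneDependence:
    """Hypothetical sketch: average of N one-dependence estimators (binary features)."""

    def __init__(self, n_features, alpha=1.0):
        self.n = n_features
        self.alpha = alpha                  # Laplace smoothing constant (an assumption)
        self.total = 0                      # number of training messages
        self.pair = defaultdict(int)        # counts of (y, i, x_i)
        self.triple = defaultdict(int)      # counts of (y, i, x_i, j, x_j)

    def fit(self, X, y):
        for row, label in zip(X, y):
            self.total += 1
            for i in range(self.n):
                self.pair[(label, i, row[i])] += 1
                for j in range(self.n):
                    self.triple[(label, i, row[i], j, row[j])] += 1
        return self

    def _joint(self, row, label):
        # Average over parent attributes i of P(y, x_i) * prod_{j != i} P(x_j | y, x_i).
        acc = 0.0
        for i in range(self.n):
            c_pair = self.pair.get((label, i, row[i]), 0)
            # 2 classes x 2 feature values = 4 possible (y, x_i) cells.
            p = (c_pair + self.alpha) / (self.total + 4 * self.alpha)
            for j in range(self.n):
                if j != i:
                    c_tri = self.triple.get((label, i, row[i], j, row[j]), 0)
                    # Each binary attribute x_j has 2 possible values.
                    p *= (c_tri + self.alpha) / (c_pair + 2 * self.alpha)
            acc += p
        return acc / self.n

    def predict_proba(self, row):
        # Normalise the averaged joint scores into class probabilities.
        scores = {label: self._joint(row, label) for label in (0, 1)}
        z = sum(scores.values())
        return {label: s / z for label, s in scores.items()}


# Toy data: columns stand for the presence of three made-up words.
X = [[1, 1, 0], [1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 1, 1], [0, 0, 1]]
y = [1, 1, 1, 0, 0, 0]
model = AveragedOneDependence(n_features=3).fit(X, y)
print(model.predict_proba([1, 1, 0]))
```

Because every attribute serves once as the parent, no structure search is needed, which is consistent with the abstract's claim that the model stays simple; the pairwise count table, however, grows quadratically with the number of attributes, so a real filter would likely restrict the vocabulary.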