Journal of Computer Applications (计算机应用), 2010, Vol. 30, Issue 8: 2006-2009.

• Artificial Intelligence •

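Double-stage spam filtering method based on rough set

Deng Weibin (邓维斌)1, Hong Zhiyong (洪智勇)2

  1. School of Economics and Management, Chongqing University of Posts and Telecommunications
  2. Wuyi University, Jiangmen, Guangdong
  • Received: 2010-02-01 Revised: 2010-02-28 Online: 2010-07-30 Published: 2010-08-01
  • Corresponding author: Deng Weibin
  • Funding:
    Research on the theory and methods of autonomous knowledge acquisition (Key Project of the Natural Science Foundation of Chongqing); Research on autonomous Bayesian learning algorithms and their application in CRM (Natural Science Foundation of Chongqing University of Posts and Telecommunications)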

Abstract: How to effectively combine the header information and the body content of an email for spam filtering has drawn much attention from researchers. Because rough sets are well suited to handling uncertain information, a two-stage email filtering method based on rough sets is proposed. In the first stage, emails are classified as legitimate, spam, or suspicious according to their header information. In the second stage, the suspicious emails are classified as legitimate or spam according to their body content. Experiments on an English and a Chinese email data set show that the proposed method not only improves spam-filtering accuracy but also substantially reduces the false-positive rate, that is, the rate at which legitimate emails are misclassified as spam.

Key words: rough set, naive Bayes, feature selection, spam filtering
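
The two-stage flow described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and everything concrete in it is an assumption made for illustration: the header attributes (`known_sender`, `has_subject`, and so on), the toy training data, the label names `ham`/`spam`/`suspicious`, and the use of a Laplace-smoothed naive Bayes classifier in the second stage (the keywords mention naive Bayes, but the abstract does not say where it is applied). Stage one follows the basic rough-set idea: emails with identical header attributes form an equivalence class; a class whose training labels all agree is decided immediately, and a class with mixed or unseen labels falls in the boundary region and is marked suspicious.

```python
import math
from collections import Counter, defaultdict


def build_header_classes(samples):
    """Group training emails by header-attribute tuple (equivalence class)
    and record the set of labels observed in each class."""
    classes = defaultdict(set)
    for header, _body, label in samples:
        classes[header].add(label)
    return classes


def stage_one(header, classes):
    """Decide from headers alone when the equivalence class is consistent;
    mixed or unseen classes (boundary region) are 'suspicious'."""
    labels = classes.get(header)
    if labels is not None and len(labels) == 1:
        return next(iter(labels))
    return "suspicious"


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens with Laplace smoothing.
    Assumed here as the stage-two body classifier."""

    def __init__(self, samples):
        self.word_counts = {"ham": Counter(), "spam": Counter()}
        self.doc_counts = Counter()
        for _header, body, label in samples:
            self.doc_counts[label] += 1
            self.word_counts[label].update(body.lower().split())
        self.vocab = set(self.word_counts["ham"]) | set(self.word_counts["spam"])

    def classify(self, body):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in ("ham", "spam"):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Smoothed log prior plus smoothed log likelihood of each token.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            for word in body.lower().split():
                score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


def two_stage_filter(header, body, classes, nb):
    """Stage one on headers; only suspicious emails reach stage two."""
    verdict = stage_one(header, classes)
    if verdict == "suspicious":
        verdict = nb.classify(body)
    return verdict


# Hypothetical toy data: (header attributes, body text, label).
TRAIN = [
    (("known_sender", "has_subject"), "meeting agenda for monday", "ham"),
    (("known_sender", "has_subject"), "project report attached", "ham"),
    (("forged_sender", "no_subject"), "win cash prize now", "spam"),
    (("unknown_sender", "has_subject"), "cheap pills win prize", "spam"),
    (("unknown_sender", "has_subject"), "question about project meeting", "ham"),
]

classes = build_header_classes(TRAIN)
nb = NaiveBayes(TRAIN)
```

For example, `two_stage_filter(("unknown_sender", "has_subject"), "win cash prize", classes, nb)` cannot be decided from the headers, because that header class contains both labels in the toy data, so the body classifier makes the final call. Deferring only the ambiguous cases to the content stage is one plausible way a design like this could lower the false-positive rate, since legitimate mail with clean headers never reaches the content classifier.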