Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (8): 2006-2009.
• Artificial intelligence •
邓维斌1,洪智勇2
Abstract: How to combine an email's header information and body content for spam filtering has drawn considerable attention from researchers. Because rough sets are well suited to handling uncertain information, a two-stage spam filtering method based on rough sets is proposed. In the first stage, emails are classified into a non-spam set, a spam set, and a doubt set according to their header information. In the second stage, the emails in the doubt set are classified as non-spam or spam according to their body content. Experiments on one English and one Chinese email data set show that the method not only improves filtering accuracy but also significantly reduces the false-positive rate, that is, the rate at which legitimate emails are misclassified as spam.
Key words: rough set, naive Bayes, feature selection, spam filtering
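The two-stage flow described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the header attributes, how the rough-set regions are built, or the details of the body classifier. Here the header stage is assumed to use lower approximations over hypothetical header-attribute tuples (tuples seen with only one label decide immediately, mixed or unseen tuples go to the doubt set), and the body stage is assumed to be a multinomial naive Bayes classifier with add-one smoothing, as suggested by the key words. All names and the toy data (`TRAIN`, `header_regions`, `two_stage_filter`, and so on) are hypothetical.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy training set: (header attributes, body text, label).
TRAIN = [
    (("known_sender", "no_attachment"), "meeting agenda for monday", "ham"),
    (("known_sender", "no_attachment"), "project report attached soon", "ham"),
    (("forged_sender", "many_recipients"), "win free prize now", "spam"),
    (("forged_sender", "many_recipients"), "cheap pills free offer", "spam"),
    (("unknown_sender", "no_attachment"), "free prize offer click now", "spam"),
    (("unknown_sender", "no_attachment"), "agenda for the project meeting", "ham"),
]

def header_regions(train):
    """Stage 1 (assumed): map each header tuple to 'ham', 'spam', or 'doubt'.

    A tuple whose training emails all share one label lies in that label's
    lower approximation; a tuple with mixed labels is in the boundary region.
    """
    labels_seen = defaultdict(set)
    for header, _body, label in train:
        labels_seen[header].add(label)
    return {h: (next(iter(ls)) if len(ls) == 1 else "doubt")
            for h, ls in labels_seen.items()}

def train_naive_bayes(train):
    """Stage 2 (assumed): collect per-class token and document counts."""
    token_counts = {"ham": Counter(), "spam": Counter()}
    doc_counts = Counter()
    for _header, body, label in train:
        token_counts[label].update(body.lower().split())
        doc_counts[label] += 1
    vocab = set(token_counts["ham"]) | set(token_counts["spam"])
    return token_counts, doc_counts, vocab

def classify_body(body, model):
    """Score the body under each class; assumes both classes have training data."""
    token_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in ("ham", "spam"):
        total_tokens = sum(token_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for tok in body.lower().split():
            if tok in vocab:  # ignore tokens never seen in training
                score += math.log((token_counts[label][tok] + 1)
                                  / (total_tokens + len(vocab)))
        scores[label] = score
    # Ties go to 'ham' so that legitimate mail is not flagged on weak evidence.
    return "spam" if scores["spam"] > scores["ham"] else "ham"

def two_stage_filter(header, body, regions, model):
    """Decide from the header when possible; otherwise fall back to the body."""
    decision = regions.get(header, "doubt")  # unseen headers are doubtful
    return decision if decision != "doubt" else classify_body(body, model)

REGIONS = header_regions(TRAIN)
MODEL = train_naive_bayes(TRAIN)
```

In this sketch, a call such as `two_stage_filter(("unknown_sender", "no_attachment"), "free prize now", REGIONS, MODEL)` falls through to the body stage because that header tuple was seen with both labels in the toy data, while an email with a header tuple that was only ever seen on legitimate mail is accepted without inspecting the body.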
DENG Weibin, HONG Zhiyong. Two-stage email filtering method based on rough sets[J]. Journal of Computer Applications, 2010, 30(8): 2006-2009.
URL: https://www.joca.cn/EN/Y2010/V30/I8/2006