Journal of Computer Applications (计算机应用) ›› 2013, Vol. 33 ›› Issue (07): 1861-1865. DOI: 10.11772/j.issn.1001-9081.2013.07.1861

• Information Security •

Hybrid spam filtering method based on users' feedback

HUANG Guowei (黄国伟)1, XU Yuwei (许昱玮)2

  1. Computer College, Shenzhen Institute of Information Technology, Shenzhen Guangdong 518172, China
  2. College of Information Technical Science, Nankai University, Tianjin 300071, China
  • Received: 2013-01-22 Revised: 2013-02-21 Online: 2013-07-06 Published: 2013-07-01
  • Contact: HUANG Guowei
  • About the authors: HUANG Guowei (born 1981), male, from Guilin, Guangxi; lecturer, Ph.D.; main research interests: data mining and distributed computing. XU Yuwei (born 1985), male, from Huangshan, Anhui; lecturer, Ph.D.; main research interests: data mining and information security.
  • Supported by:

    Natural Science Foundation of Guangdong Province (S2011040006119); Science and Technology Planning Project of Guangdong Province (2011B040300034)

Abstract: Current spam filtering techniques typically classify E-mail using a single type of E-mail characteristic and adapt poorly to changes in those characteristics. To address these limitations, a hybrid spam filtering method based on users' feedback is proposed. Building on the Social Network (SN) relationships among users, a user feedback mechanism dynamically updates both the content-based and the identity-based knowledge used for E-mail classification. On this basis, a Bayesian model combines the content characteristics of an E-mail with the identity characteristics of its sender in the classification. Simulation results show that, compared with traditional filtering methods, the proposed method classifies E-mail better when E-mail characteristics change dynamically, with overall recall, precision, and accuracy all reaching 90% or above. The method therefore improves the adaptability of E-mail classification to changing E-mail characteristics while maintaining classification performance, making it a useful complement to existing spam filtering techniques.

Key words: spam, content-based spam filtering, identity-based spam filtering, E-mail classification, users' feedback, Bayesian model
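
The abstract gives no formulas, so the following is only an illustrative sketch of the general idea, not the authors' model. It assumes a plain naive Bayes classifier in which content words and the sender's identifier are treated as conditionally independent evidence, and in which each piece of user feedback simply increments the relevant counts. The class name `HybridBayesFilter`, its methods, the smoothing constant `alpha`, and the whitespace tokenizer are all hypothetical choices; the social-network component of the method is not modelled here.

```python
import math
from collections import Counter


class HybridBayesFilter:
    """Toy naive Bayes filter over two feature families: words and sender ID.

    Illustrative assumption only; the paper's actual model is not specified
    in the abstract.
    """

    LABELS = ("spam", "ham")

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.msgs = {c: 0 for c in self.LABELS}
        self.words = {c: Counter() for c in self.LABELS}    # content knowledge
        self.senders = {c: Counter() for c in self.LABELS}  # identity knowledge

    def feedback(self, sender, text, label):
        """Fold one user-labelled message into both kinds of knowledge."""
        self.msgs[label] += 1
        self.words[label].update(text.lower().split())
        self.senders[label][sender] += 1

    def _log_likelihood(self, counts, items, n_values):
        """Smoothed log P(items | class) for one feature family."""
        total = sum(counts.values())
        return sum(
            math.log((counts[i] + self.alpha) / (total + self.alpha * n_values))
            for i in items
        )

    def spam_log_odds(self, sender, text):
        """log P(spam | message) - log P(ham | message), up to a shared constant."""
        tokens = text.lower().split()
        # +1 reserves probability mass for a value never seen in feedback.
        n_words = len(set(self.words["spam"]) | set(self.words["ham"])) + 1
        n_ids = len(set(self.senders["spam"]) | set(self.senders["ham"])) + 1
        n_msgs = sum(self.msgs.values())
        score = {}
        for c in self.LABELS:
            prior = (self.msgs[c] + self.alpha) / (n_msgs + 2 * self.alpha)
            score[c] = (
                math.log(prior)
                + self._log_likelihood(self.words[c], tokens, n_words)
                + self._log_likelihood(self.senders[c], [sender], n_ids)
            )
        return score["spam"] - score["ham"]

    def is_spam(self, sender, text):
        return self.spam_log_odds(sender, text) > 0
```

In this sketch, adapting to changing E-mail characteristics amounts to calling `feedback` whenever a user corrects or confirms a label, so that later calls to `is_spam` reflect the updated word and sender counts.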

CLC Number: