Journal of Computer Applications ›› 2012, Vol. 32 ›› Issue (12): 3565-3568. DOI: 10.3724/SP.J.1087.2012.03565

• Typical applications •

Spam phone number filtering method based on SMS submission pattern

ZHU Wu-hui, WANG Mei-qing

  1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou, Fujian 350108, China
  • Received: 2012-06-13 Revised: 2012-07-24 Online: 2012-12-29 Published: 2012-12-01
  • Contact: ZHU Wu-hui
  • About the authors: ZHU Wu-hui (born 1988), male, from Ningbo, Zhejiang, is a master's student whose main research interest is image processing. WANG Mei-qing (born 1967), female, from Sanming, Fujian, is a professor and doctoral supervisor with a Ph.D.; her main research interests are image processing, video compression algorithms, and numerical algorithms.
  • Supported by:
    Natural Science Foundation of Fujian Province

Abstract: In an era flooded with spam text messages, clearing them wastes a great deal of time and effort, and mining the sending characteristics of spam messages is the key to solving this problem. Building on an analysis of existing text-message filtering mechanisms (algorithms), and following the idea of median filtering, the discrete interaction units of a message sender are merged into one continuous interaction unit, which leads to the concept of the Effective Interaction Period (EIP). A comprehensive spam filtering algorithm is then built on features such as the ratio of input to output and the EIP. In experiments on 20 million real message records, the algorithm achieved a recall ratio of 99.51% and a precision ratio of 49.90% on spam messages. The experimental results indicate that the algorithm improves the efficiency and speed of spam message detection and can be applied to real-time spam interception.

Key words: spam message, interaction unit, Effective Interaction Period (EIP), ratio of input to output, precision ratio, recall ratio
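
The abstract does not spell out how interaction units are merged or how the features are computed, so the following is only a minimal sketch under stated assumptions, not the authors' algorithm. It assumes a sender's activity is recorded as a 0/1 sequence over equal time slots, that a standard odd-width median filter is what merges nearby interaction units, that the EIP can be approximated by the longest active run after filtering, and that the ratio of input to output means messages received divided by messages sent. All function names and the sample data are hypothetical.

```python
from typing import List


def median_filter(signal: List[int], window: int = 3) -> List[int]:
    """1-D median filter with an odd window; edges are zero-padded (assumption)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    padded = [0] * half + list(signal) + [0] * half
    # The median of each window fills short gaps and drops isolated slots.
    return [sorted(padded[i:i + window])[half] for i in range(len(signal))]


def longest_active_run(signal: List[int]) -> int:
    """Length of the longest run of active slots; used here as a stand-in for the EIP."""
    best = current = 0
    for active in signal:
        current = current + 1 if active else 0
        best = max(best, current)
    return best


def in_out_ratio(received: int, sent: int) -> float:
    """Messages received divided by messages sent (assumed definition)."""
    return received / sent if sent else float("inf")


# Hypothetical activity of one sender: 1 = at least one message sent in the slot.
activity = [1, 0, 1, 1, 0, 1, 0, 0, 0, 0]
smoothed = median_filter(activity, window=3)
print(smoothed)                      # [0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(longest_active_run(activity))  # 2
print(longest_active_run(smoothed))  # 4
print(in_out_ratio(received=2, sent=100))
```

In this toy sequence the filter fills the one-slot gap between two bursts, so two discrete units become one continuous unit of length 4. A sender with a long active period and a very low ratio of input to output would be a candidate for flagging, with thresholds that the abstract does not state.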