Application of deep belief nets in spam filtering

Journal of Computer Applications, 2014, Vol. 34, Issue (4): 1122-1125. DOI: 10.11772/j.issn.1001-9081.2014.04.1122


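SUN Jingguang¹, JIANG Jinye², MENG Xiangfu¹, LI Xiujuan²

1. School of Electronic and Information Engineering, Liaoning Technical University, Huludao Liaoning 125105, China
2. Graduate School, Liaoning Technical University, Huludao Liaoning 125105, China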

Received: 2013-10-12  Revised: 2013-12-13  Online: 2014-04-01  Published: 2014-04-29
Corresponding author: JIANG Jinye
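About the authors: SUN Jingguang (1962-), female, born in Fuxin, Liaoning, professor, Ph.D. supervisor; research interests: data mining, graphics and image processing, face recognition.
JIANG Jinye (1989-), female, born in Anshan, Liaoning, M.S. candidate; research interest: data mining.
MENG Xiangfu (1981-), male, born in Chaoyang, Liaoning, associate professor, Ph.D.; research interests: flexible query processing and optimization for Web databases, information security.
LI Xiujuan (1988-), female, born in Fuxin, Liaoning, M.S. candidate; research interests: wireless communication, data processing.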
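Funding:
National Youth Science Fund; National Key Technology R&D Program (Research on Integrated Service Solutions and Service Model Design for the Mining Industry)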


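Abstract

Existing spam filters rely on shallow learning methods, while deep neural networks lack a clear weight-initialization procedure and, when applied to spam filtering, tend to converge to poor solutions with low generalization ability. To address this problem, a classification method based on the Deep Belief Net (DBN) was proposed. The DBN is pre-trained with a greedy layer-wise unsupervised algorithm, which initializes the network. Experiments were conducted on three widely used datasets, LingSpam, SpamAssassin and Enron1, comparing classification performance with that of the Support Vector Machine (SVM), the state-of-the-art method for spam filtering. The results show that DBN-based spam filtering is effective and achieves better accuracy and recall.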
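The abstract states that the DBN is pre-trained greedily, layer by layer, without labels. The sketch below illustrates that idea in the form described by Hinton, Osindero and Teh (2006): a stack of binary restricted Boltzmann machines (RBMs), each trained with one-step contrastive divergence (CD-1) on the hidden activations of the layer below. It is a minimal pure-Python sketch, not the authors' implementation: the layer sizes, learning rate, epoch count, the binary bag-of-words input and all names (`RBM`, `pretrain_dbn`, `dbn_features`) are assumptions for illustration. The supervised fine-tuning that a full DBN classifier would perform afterwards is omitted.

```python
import math
import random


def sigmoid(x):
    # Numerically stable logistic function
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


class RBM:
    """Binary restricted Boltzmann machine trained with CD-1 (illustrative)."""

    def __init__(self, n_visible, n_hidden, rng):
        self.n_visible = n_visible
        self.n_hidden = n_hidden
        self.rng = rng
        # Small random weights break the symmetry between hidden units
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b_vis = [0.0] * n_visible
        self.b_hid = [0.0] * n_hidden

    def hidden_probs(self, v):
        # p(h_j = 1 | v) for every hidden unit j
        return [sigmoid(self.b_hid[j] + sum(v[i] * self.W[i][j] for i in range(self.n_visible)))
                for j in range(self.n_hidden)]

    def visible_probs(self, h):
        # p(v_i = 1 | h) for every visible unit i
        return [sigmoid(self.b_vis[i] + sum(h[j] * self.W[i][j] for j in range(self.n_hidden)))
                for i in range(self.n_visible)]

    def cd1_update(self, v0, lr):
        # Positive phase: hidden probabilities driven by the data
        h0 = self.hidden_probs(v0)
        # Sample binary hidden states, then reconstruct the visible layer
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)
        # Negative phase: hidden probabilities driven by the reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(self.n_visible):
            for j in range(self.n_hidden):
                self.W[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b_vis[i] += lr * (v0[i] - v1[i])
        for j in range(self.n_hidden):
            self.b_hid[j] += lr * (h0[j] - h1[j])

    def reconstruction_error(self, v):
        # Squared error of a deterministic (mean-field) reconstruction
        v1 = self.visible_probs(self.hidden_probs(v))
        return sum((a - b) ** 2 for a, b in zip(v, v1))


def pretrain_dbn(data, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Greedy layer-wise pre-training: train one RBM, freeze it, and use its
    hidden activations as training data for the next RBM."""
    rng = random.Random(seed)
    layer_input = [[float(x) for x in row] for row in data]
    rbms = []
    for n_hidden in layer_sizes:
        rbm = RBM(len(layer_input[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in layer_input:
                rbm.cd1_update(v, lr)
        rbms.append(rbm)
        # Transform the data for the next layer with the frozen RBM
        layer_input = [rbm.hidden_probs(v) for v in layer_input]
    return rbms


def dbn_features(rbms, v):
    # Propagate an input vector upward through the pre-trained stack
    for rbm in rbms:
        v = rbm.hidden_probs(v)
    return v


if __name__ == "__main__":
    # Hypothetical binary bag-of-words vectors over a 6-word vocabulary
    docs = [
        [1, 1, 1, 0, 0, 0],
        [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 1],
        [0, 0, 1, 1, 1, 0],
    ]
    rbms = pretrain_dbn(docs, layer_sizes=[4, 2], epochs=200, lr=0.1)
    for d in docs:
        print([round(p, 2) for p in dbn_features(rbms, d)])
```

Each RBM is trained only on the output of the frozen layer beneath it, which is what makes the procedure greedy; the resulting weights serve as the starting point for supervised fine-tuning rather than as a finished classifier.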

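The comparison with SVM is reported in terms of accuracy and recall. In spam filtering, recall is the share of actual spam that the filter catches, while precision is the share of flagged messages that really are spam (a false positive means a legitimate message is misclassified as spam). The helper below is a hedged sketch assuming spam is labeled 1 (positive class) and legitimate mail 0; the function name and the toy labels are hypothetical and are not data from the paper.

```python
def spam_metrics(y_true, y_pred):
    """Accuracy, precision and recall, treating spam (label 1) as positive."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail kept
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical labels for ten messages (1 = spam, 0 = legitimate)
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
print(spam_metrics(y_true, y_pred))
# {'accuracy': 0.8, 'precision': 0.75, 'recall': 0.75}
```

Here 3 of 4 spam messages are caught (recall 0.75) and 3 of 4 flagged messages are spam (precision 0.75), while 8 of 10 messages are classified correctly overall.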

CLC number: TP393.098

SUN Jingguang, JIANG Jinye, MENG Xiangfu, LI Xiujuan. Application of deep belief nets in spam filtering[J]. Journal of Computer Applications, 2014, 34(4): 1122-1125.


