Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (4): 1081-1088. DOI: 10.11772/j.issn.1001-9081.2018091926


Malicious webpage integrated detection method based on Stacking ensemble algorithm

PIAOYANG Heran, REN Junling   

  1. School of Information Management, Beijing Information Science & Technology University, Beijing 100192, China
  • Received: 2018-09-17 Revised: 2018-10-31 Online: 2019-04-10 Published: 2019-04-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (71571021) and the Scientific Research Project of Beijing Municipal Education Commission (KM201811232019).


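  • Corresponding author: REN Junling
  • About the authors: PIAOYANG Heran (朴杨鹤然, born 1995), male, from Beijing; research interests: information security, network attack and defense. REN Junling (任俊玲, born 1979), female, from Pingyao, Shanxi; associate professor, Ph.D.; research interests: network security, intelligent information processing.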

Abstract: Mainstream malicious webpage detection techniques suffer from high resource consumption, long detection cycles, and poor classification performance. To address these problems, a Stacking-based ensemble detection method for malicious webpages is proposed, which applies the integration of heterogeneous classifiers to malicious webpage detection and recognition. The detection model is obtained by extracting webpage features, analyzing the relevant factors, and then performing classification and ensemble learning. In this model, the primary classifiers are built with the K-Nearest Neighbors (KNN), logistic regression, and decision tree algorithms, and the secondary meta-classifier is built with a Support Vector Machine (SVM). Compared with traditional malicious webpage detection methods, the proposed method improves recognition accuracy by 0.7% and reaches an accuracy of 98.12% while consuming few resources and running quickly. The experimental results show that the detection model constructed by the proposed method can recognize malicious webpages efficiently and accurately.

Key words: malicious webpage, machine learning, classifier ensemble, Stacking
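
The two-level structure described in the abstract (KNN, logistic regression, and decision tree as primary classifiers, with an SVM as the secondary meta-classifier) can be sketched with scikit-learn's `StackingClassifier`. This is only an illustrative sketch, not the authors' implementation: the synthetic data from `make_classification` stands in for the webpage features, and the helper name `build_stacking_model`, the hyperparameters, the feature scaling, and the 5-fold setting are all assumptions that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def build_stacking_model(cv_folds=5, seed=0):
    """Hypothetical helper: three heterogeneous primary classifiers plus an SVM meta-classifier."""
    primary = [
        # Scaling is an assumption; distance-based KNN usually benefits from it.
        ("knn", make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))),
        ("logreg", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("tree", DecisionTreeClassifier(max_depth=5, random_state=seed)),
    ]
    # The meta-classifier is trained on cross-validated (out-of-fold)
    # predictions of the primary classifiers, not on their training-set fits.
    return StackingClassifier(
        estimators=primary,
        final_estimator=SVC(kernel="rbf"),
        cv=cv_folds,
    )


# Synthetic stand-in for a labeled webpage feature matrix
# (label 1 = malicious, 0 = benign); NOT the paper's dataset.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_stacking_model()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy: {accuracy:.3f}")
```

The accuracy printed here reflects only the synthetic data and says nothing about the 98.12% reported in the abstract. The point of the sketch is the wiring: the SVM sees the primary classifiers' out-of-fold predictions rather than the raw features, which is what distinguishes Stacking from simple voting.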

