Journal of Computer Applications ›› 2017, Vol. 37 ›› Issue (5): 1263-1269. DOI: 10.11772/j.issn.1001-9081.2017.05.1263

Real-time data analysis system based on Spark Streaming and its application

HAN Dezhi1, CHEN Xuguang1, LEI Yuxin2, DAI Yongtao1, ZHANG Xiao1   

  1. College of Information Engineering, Shanghai Maritime University, Shanghai 201306, China;
    2. School of Information Engineering, Zhengzhou University, Zhengzhou Henan 450001, China
  • Received: 2016-07-15 Revised: 2016-11-26 Online: 2017-05-10 Published: 2017-05-16
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61672338, 61373028).

基于Spark Streaming的实时数据分析系统及其应用

韩德志1, 陈旭光1, 雷雨馨2, 戴永涛1, 张肖1   

  1. 上海海事大学 信息工程学院, 上海 201306;
    2. 郑州大学 信息工程学院, 郑州 450001
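  • Corresponding author: HAN Dezhi
  • About the authors: HAN Dezhi (born 1966), male, from Xinyang, Henan; Ph.D., professor, CCF member; research interests: cloud computing, cloud storage and its security, big data applications. CHEN Xuguang (born 1993), male, from Xinyang, Henan; M.S. candidate; research interests: cloud computing, real-time big data analysis. LEI Yuxin (born 1996), female, from Zhengzhou, Henan; research interests: data mining, network security. DAI Yongtao (born 1991), male, from Shaoyang, Hunan; M.S. candidate; research interests: cloud computing, distributed computing, data mining, network security. ZHANG Xiao (born 1994), female, from Bengbu, Anhui; M.S. candidate; research interests: cloud computing, real-time big data analysis.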
  • Funding:
    National Natural Science Foundation of China (61373028, 61672338).

Abstract: To enable rapid analysis of massive real-time data, a Distributed Real-time Data Analysis System (DRDAS) was designed to handle the collection, storage, and real-time analysis of massive concurrent data. Based on the operating principle of Spark Streaming, a dynamic-sampling parallel K-means algorithm was proposed to detect various Distributed Denial of Service (DDoS) attacks quickly and efficiently. The experimental results show that DRDAS has good scalability, fault tolerance, and real-time processing ability, and that, combined with the new parallel K-means algorithm, it can detect various DDoS attacks in real time and shorten attack detection time.

Key words: Spark Streaming framework, distributed stream processing, network data analysis, Distributed Denial of Service (DDoS) attack

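Abstract (Chinese version): To enable rapid analysis of real-time network data streams, a Distributed Real-time Data stream Analysis System (DRDAS) is designed. It effectively handles the collection, storage, and real-time analysis of concurrently accessed data streams, and provides an effective data analysis platform for network security detection in big data environments. Based on the operating principle of Spark Streaming, a dynamic-sampling parallel K-Means algorithm is designed; combined with DRDAS, it can detect various Distributed Denial of Service (DDoS) attacks in big data environments effectively and in real time. Experimental results show that DRDAS has good scalability, fault tolerance, and real-time processing capability, and that, combined with the dynamic-sampling parallel K-Means algorithm, it can detect various DDoS attacks in real time and shortens attack detection time.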

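Key words (Chinese version): Spark Streaming framework, distributed stream processing, network data analysis, Distributed Denial of Service (DDoS) attack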