Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (11): 3078-3081. DOI: 10.11772/j.issn.1001-9081.2014.11.3078

• Papers of the 2014 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2014) •

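Real-time clustering for massive data using Storm

WANG Mingkun, YUAN Shaoguang, ZHU Yongli, WANG Dewen

  1. School of Control and Computer Engineering, North China Electric Power University, Baoding Hebei 071003, China
  • Received: 2014-07-28 Revised: 2014-08-04 Online: 2014-11-01 Published: 2014-12-01
  • Contact: WANG Mingkun
  • About the authors: WANG Mingkun (born 1990), male, from Tai'an, Shandong, M.S. candidate; research interests: cloud computing, big data processing. YUAN Shaoguang (born 1989), male, from Luoyang, Henan, M.S. candidate; research interests: distributed computing, parallel computing. ZHU Yongli (born 1963), male, from Jizhou, Hebei, professor, doctoral supervisor, CCF senior member; research interests: networked monitoring, intelligent information processing. WANG Dewen (born 1973), male, from Keshan, Heilongjiang, associate professor, Ph.D.; research interests: intelligent information processing, computer networks.
  • Supported by:

    Research on key issues of an integrated smart grid condition monitoring platform based on IEC 61850 and cloud computing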

Abstract:

To address the generally poor real-time responsiveness of existing platforms when processing massive data, the Storm distributed real-time computing platform was introduced for cluster analysis of large-scale data, and a Density-Based Spatial Clustering of Applications with Noise (DBSCAN) algorithm based on the Storm framework was designed. The algorithm divides the whole process into stages such as data ingestion, clustering analysis and result output. Each stage is implemented in one of the framework's predefined components, and the components are connected by data streams to form a task entity that is submitted to the cluster for execution. Comparative analysis and performance monitoring verified that the proposed scheme has the advantages of low latency and high throughput, and that the cluster ran well with a balanced load. The experimental results show that the Storm platform processes massive data with good real-time performance and is suited to data mining tasks in the big data setting.
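
To illustrate the three stages named in the abstract, the following is a minimal single-process sketch in Python. It is not the authors' Storm implementation: the function names (`read_points`, `dbscan`, `summarise`), the parameters `eps` and `min_pts`, and the toy input are all assumptions made for this example. In a Storm deployment each stage would presumably sit in its own component, with tuples flowing between them as streams.

```python
from math import dist  # Python 3.8+

NOISE = -1        # label for points that belong to no cluster
UNVISITED = None  # label for points not yet examined


def read_points(rows):
    """Data-ingestion stage (hypothetical): yield each raw row as floats."""
    for row in rows:
        yield tuple(float(v) for v in row)


def region_query(points, i, eps):
    """Indices of all points within eps of points[i], including i itself."""
    return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]


def dbscan(points, eps, min_pts):
    """Clustering stage: return one label per point, NOISE for outliers."""
    labels = [UNVISITED] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        neighbours = region_query(points, i, eps)
        if len(neighbours) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbours if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            j_neighbours = region_query(points, j, eps)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is a core point: keep growing
    return labels


def summarise(labels):
    """Result-output stage (hypothetical): count members per cluster label."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return counts


# Toy input: two tight groups and one isolated point.
rows = [("0", "0"), ("0", "1"), ("1", "0"),
        ("10", "10"), ("10", "11"), ("11", "10"),
        ("50", "50")]
points = list(read_points(rows))
labels = dbscan(points, eps=1.5, min_pts=3)
print(summarise(labels))
```

With this toy input the two tight groups form clusters 0 and 1, and the isolated point is labelled as noise. The brute-force `region_query` is quadratic in the number of points, so the sketch only shows the logic of the stages and says nothing about how the work would be partitioned across a cluster.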

CLC number: