Research on Dynamic Resource Allocation Strategy in Spark Streaming


LIU Bei, TAN Xinming, CAO Wenbin

Wuhan University of Technology, Wuhan, Hubei, China

Received: 2016-11-25  Revised: 2016-12-21  Online: 2016-12-21
Corresponding author: LIU Bei

Abstract

With the development of technology, big data applications on SaaS platforms have become complex and diverse, and a single application may involve data and computations with different characteristics at the same time. In this situation a single computing model can hardly meet the needs of multiple applications, while mixing several computing frameworks inevitably brings problems such as data dumping between systems and difficult use and maintenance. The in-memory computing engine provided by Spark supports several classic computing models, including stream computing, batch computing and graph computing, so more and more enterprises adopt Spark as a hybrid big data computing platform. As Spark Streaming is applied more deeply in stream processing, however, it often faces challenges arising from the real-time, dynamically changing and massive nature of data streams, and from the different latency requirements of tasks when multiple users run multiple applications. The two existing dynamic resource allocation strategies cannot meet these stream processing challenges well. To address this problem, a dynamic resource allocation model for multiple applications is proposed. Based on feedback from historical data, the model dynamically adjusts computing resources to cope with changes in the data stream and the differing requirements of multiple users. Experimental results show that the method can effectively adjust resource quotas and reduce processing latency.

Key words: Spark, real-time data stream, multi-application, dynamic resource allocation
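The abstract states only that the proposed model adjusts computing resources based on feedback from historical data; it does not give the model itself. As a rough illustration of the general idea, the following Python sketch keeps a short history of batch costs per application, estimates how many executors each application needs to meet its own latency target, and splits a fixed executor pool among the applications when their total demand exceeds it. This is not the authors' model: the names `AppState` and `allocate`, the five-batch window, the assumption that batch time scales inversely with the executor count, and the proportional-sharing rule are all hypothetical choices made for this example. The sketch only computes target executor counts and does not call any Spark API.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class AppState:
    """Hypothetical per-application state (illustrative names, not from the paper)."""
    name: str
    target_latency: float  # seconds within which each batch should finish
    executors: int         # executors currently assigned to this application
    # Recent batch costs in executor-seconds; the window size of 5 is an arbitrary choice.
    history: deque = field(default_factory=lambda: deque(maxlen=5))

    def record(self, processing_time: float) -> None:
        # Store cost as executor-seconds so samples taken at different executor counts stay comparable.
        self.history.append(processing_time * self.executors)

    def demand(self) -> int:
        """Executors needed to meet target_latency, assuming batch time scales as 1/executors."""
        if not self.history:
            return self.executors
        avg_cost = sum(self.history) / len(self.history)
        return max(1, math.ceil(avg_cost / self.target_latency))


def allocate(apps: list[AppState], capacity: int) -> dict[str, int]:
    """Grant every demand if the pool allows; otherwise share the pool in proportion to demand."""
    demands = {a.name: a.demand() for a in apps}
    if sum(demands.values()) <= capacity:
        return demands
    if capacity < len(apps):
        raise ValueError("capacity must allow at least one executor per application")
    # Every application keeps one executor; the remainder follows each app's extra demand.
    spare = capacity - len(apps)
    extra = {n: d - 1 for n, d in demands.items()}
    total_extra = sum(extra.values())  # > 0 here, because sum(demands) > capacity >= len(apps)
    exact = {n: spare * e / total_extra for n, e in extra.items()}
    alloc = {n: 1 + math.floor(x) for n, x in exact.items()}
    # Largest-remainder rounding hands out the executors lost to flooring.
    leftover = capacity - sum(alloc.values())
    by_remainder = sorted(exact, key=lambda n: exact[n] - math.floor(exact[n]), reverse=True)
    for n in by_remainder[:leftover]:
        alloc[n] += 1
    return alloc


if __name__ == "__main__":
    a = AppState("A", target_latency=2.0, executors=4)
    b = AppState("B", target_latency=5.0, executors=2)
    c = AppState("C", target_latency=1.0, executors=3)
    for t in (3.0, 3.0):
        a.record(t)  # 12 executor-seconds per batch -> demand 6
    b.record(2.0)    # 4 executor-seconds -> demand 1
    c.record(1.0)    # 3 executor-seconds -> demand 3
    print(allocate([a, b, c], capacity=6))  # -> {'A': 3, 'B': 1, 'C': 2}
```

Recording each sample as executor-seconds keeps the history usable after an application's executor count changes, and largest-remainder rounding makes the grants add up exactly to the pool size whenever demand exceeds it.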

CLC number: TP311.5

LIU Bei, TAN Xinming, CAO Wenbin. Research on dynamic resource allocation strategy in Spark Streaming[J]. Journal of Computer Applications.
