
Research on Dynamic Resource Allocation Strategy in Spark Streaming

Liu Bei, Tan Xinming, Cao Wenbin

  1. Wuhan University of Technology, Wuhan, Hubei Province, China
  • Received: 2016-11-25  Revised: 2016-12-21  Online: 2016-12-21
  • Corresponding author: Liu Bei

Abstract: Big-data applications on SaaS platforms are complex and diverse and may involve data and computations with different characteristics. In this setting a single computing model can hardly meet the needs of multiple applications, while combining several computing models forces data to be dumped between systems and makes the platform hard to use and maintain. Spark provides an in-memory computing engine that covers several classic computing models, including stream computing, batch computing and graph computing, so more and more enterprises choose Spark as a hybrid big-data computing platform. As Spark Streaming is used more widely for stream processing, it faces challenges from the real-time, dynamically changing and massive nature of data streams, and from the different latency requirements of tasks submitted by multiple users and applications. The two existing dynamic resource allocation strategies do not handle these challenges well. To address this problem, this paper proposes a dynamic resource allocation model for multiple applications that adjusts computing resources based on historical data feedback, so as to cope with changing data streams and the differing requirements of multiple users. Experimental results show that the method adjusts resource quotas effectively and reduces processing latency.

Key words: Spark, real-time data stream, multi-application, dynamic resource allocation
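The abstract says only that computing resources are adjusted from historical data feedback; the paper's actual model is not given here. The following is a minimal sketch of what feedback-driven allocation across several streaming applications can look like, under stated assumptions: each application has a per-batch latency target, its recent batch processing delays are kept in a sliding window, and processing delay is assumed to shrink roughly in proportion to the number of executors. The names `StreamingApp`, `allocate`, `target_delay` and `min_executors` are hypothetical illustrations, not Spark APIs, and the proportional rule is a swapped-in heuristic rather than the authors' method.

```python
import math
from collections import deque
from dataclasses import dataclass, field


@dataclass
class StreamingApp:
    """Hypothetical per-application state (not a Spark API)."""
    name: str
    target_delay: float   # latency requirement for one batch, in seconds
    executors: int        # executors currently granted to this application
    min_executors: int = 1
    # Sliding window of the most recent batch processing delays (seconds).
    history: deque = field(default_factory=lambda: deque(maxlen=5))

    def record(self, processing_delay: float) -> None:
        """Feed back the processing delay observed for the latest batch."""
        self.history.append(processing_delay)

    def desired_executors(self) -> int:
        """Scale the current grant by (mean observed delay / target delay),
        assuming delay is roughly inversely proportional to executor count."""
        if self.history:
            mean_delay = sum(self.history) / len(self.history)
            wanted = math.ceil(self.executors * mean_delay / self.target_delay)
        else:
            wanted = self.executors  # no feedback yet: keep the current grant
        return max(self.min_executors, wanted)


def allocate(apps, total_executors):
    """Grant every application its desired executor count if the cluster can
    afford it; otherwise keep each minimum and share the remainder in
    proportion to the extra executors each application asked for."""
    demand = {a.name: a.desired_executors() for a in apps}
    if sum(demand.values()) <= total_executors:
        grant = dict(demand)
    else:
        grant = {a.name: a.min_executors for a in apps}
        spare = total_executors - sum(grant.values())
        if spare < 0:
            raise ValueError("minimum executors exceed cluster capacity")
        extra = {a.name: demand[a.name] - a.min_executors for a in apps}
        total_extra = sum(extra.values())
        for a in apps:
            # Integer division keeps the total within total_executors.
            grant[a.name] += spare * extra[a.name] // total_extra
    for a in apps:
        a.executors = grant[a.name]
    return grant
```

For example, an application holding 4 executors whose recent batches took 3 s against a 2 s target asks for ceil(4 × 3 / 2) = 6 executors, while one that finishes in 2.5 s against a 5 s target asks for only 2. Reserving each application's minimum before sharing the remainder keeps any single stream from being starved when the cluster is oversubscribed. Pushing the resulting grant to a live cluster is outside this sketch; Spark's built-in mechanism for executor scaling is configured through the `spark.dynamicAllocation.*` settings.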
