Journal of Computer Applications, 2017, Vol. 37, Issue (10): 2760-2766. DOI: 10.11772/j.issn.1001-9081.2017.10.2760

• Advanced Computing •

Dynamic data stream load balancing strategy based on load awareness

LI Ziyang1, YU Jiong1,2, BIAN Chen2, WANG Yuefei2, LU Liang2   

  1. School of Software, Xinjiang University, Urumqi Xinjiang 830008, China;
    2. School of Information Science and Engineering, Xinjiang University, Urumqi Xinjiang 830046, China
  • Received: 2017-04-25 Revised: 2017-06-19 Online: 2017-10-10 Published: 2017-10-16
  • Corresponding author: YU Jiong (1964-), male, born in Beijing, professor, doctoral supervisor, Ph.D., senior member of CCF. His research interests include network security, grid computing and distributed computing. E-mail: yujiong@xju.edu.cn
  • About the authors: LI Ziyang (1993-), male, born in Urumqi, Xinjiang, M.S. candidate, member of CCF. His research interests include cloud computing and distributed computing. YU Jiong (1964-), male, born in Beijing, professor, doctoral supervisor, Ph.D., senior member of CCF. His research interests include network security, grid computing and distributed computing. BIAN Chen (1981-), male, born in Nanjing, Jiangsu, associate professor, Ph.D., member of CCF. His research interests include network computing and distributed systems. WANG Yuefei (1991-), male, born in Urumqi, Xinjiang, Ph.D. candidate. His research interests include cloud computing, distributed computing and data mining. LU Liang (1990-), male, born in Urumqi, Xinjiang, Ph.D. candidate, member of CCF. His research interests include cloud computing, distributed computing and in-memory computing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61262088, 61462079, 61562086, 61363083) and the Scientific Research Program of the Higher Education Institutions of Xinjiang Uygur Autonomous Region (XJEDU2016S106).

Abstract: To address load imbalance among nodes and the incomplete evaluation of node performance in big data stream processing platforms, a dynamic load balancing strategy based on a load awareness algorithm was proposed and applied to the Apache Flink stream processing platform. First, the computational delay of each node was obtained by a depth-first search over the Directed Acyclic Graph (DAG) and used as the basis for evaluating node performance and formulating the load balancing strategy. Second, inter-node load migration for stream data was implemented on top of a data block management strategy, and both global and local load tuning were achieved through feedback. Finally, the feasibility of the algorithm was demonstrated by experimentally evaluating its time and space costs, and the influence of important parameters on its effectiveness was discussed. The experimental results show that the algorithm improves task execution efficiency by optimizing the load distribution of stream processing tasks, shortening task execution time by 6.51% on average compared with the existing load balancing strategy of Apache Flink.

Key words: data stream, load balancing, depth-first search, load awareness, Apache Flink
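The abstract names the ingredients of the strategy (a depth-first search over the DAG to obtain node delay, block-based load migration, feedback-driven tuning) without defining them. The sketch below illustrates those ideas under our own assumptions and is not the authors' algorithm. It assumes each DAG vertex has a measured local processing delay and defines a vertex's "computational delay" as that delay plus the longest downstream path delay, found by memoized depth-first search. It then assumes a hypothetical load model in which a node's load is the number of data blocks it holds times its measured per-block delay, and greedily migrates one block at a time from the most loaded to the least loaded node, recomputing loads after every move as a stand-in for the feedback loop. The names `critical_delay`, `rebalance`, `per_block_delay` and `threshold` are hypothetical.

```python
from typing import Dict, List, Tuple


def critical_delay(dag: Dict[str, List[str]],
                   local_delay: Dict[str, float]) -> Dict[str, float]:
    """Memoized depth-first search over a DAG (hypothetical delay model).

    For every vertex, returns its own measured delay plus the largest
    cumulative delay along any downstream path to a sink.
    """
    memo: Dict[str, float] = {}

    def dfs(v: str) -> float:
        if v not in memo:
            # Visit children first; a sink (no children) contributes 0.
            downstream = max((dfs(c) for c in dag.get(v, [])), default=0.0)
            memo[v] = local_delay[v] + downstream
        return memo[v]

    for v in local_delay:
        dfs(v)
    return memo


def rebalance(blocks: Dict[str, int],
              per_block_delay: Dict[str, float],
              threshold: float,
              max_moves: int = 1000) -> Tuple[List[Tuple[str, str]], Dict[str, int]]:
    """Greedy one-block-at-a-time migration with a feedback loop.

    Hypothetical load model: load = blocks held * measured delay per block.
    Loads are recomputed after every move (the feedback step); the loop stops
    when the spread is within `threshold` or a move would not lower the peak.
    """
    blocks = dict(blocks)  # do not mutate the caller's mapping
    moves: List[Tuple[str, str]] = []
    for _ in range(max_moves):
        load = {n: blocks[n] * per_block_delay[n] for n in blocks}
        src = max(load, key=load.get)  # most loaded node
        dst = min(load, key=load.get)  # least loaded node
        if load[src] - load[dst] <= threshold or blocks[src] == 0:
            break
        new_src = (blocks[src] - 1) * per_block_delay[src]
        new_dst = (blocks[dst] + 1) * per_block_delay[dst]
        if max(new_src, new_dst) >= load[src]:
            break  # moving a block would not reduce the larger load
        blocks[src] -= 1
        blocks[dst] += 1
        moves.append((src, dst))
    return moves, blocks


if __name__ == "__main__":
    dag = {"source": ["map1", "map2"], "map1": ["sink"], "map2": ["sink"], "sink": []}
    delays = {"source": 1.0, "map1": 5.0, "map2": 2.0, "sink": 1.0}
    print(critical_delay(dag, delays))
    print(rebalance({"A": 6, "B": 6}, {"A": 2.0, "B": 1.0}, threshold=0.5))
```

Memoization means each vertex and edge is visited once, so the search is linear in the size of the DAG. Moving a single block per iteration and re-measuring afterwards keeps each adjustment small, which is one simple way to realize feedback-driven tuning; in the second call above, the slower node `A` ends up holding fewer blocks than the faster node `B`. A real deployment would feed in delays measured by the stream processing platform rather than hand-written numbers.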
