Journal of Computer Applications, 2018, Vol. 38, Issue (12): 3481-3489. DOI: 10.11772/j.issn.1001-9081.2018040741

• Advanced computing •

Task scheduling strategy based on topology structure in Storm

LIU Su, YU Jiong, LU Liang, LI Ziyang

  1. College of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China
  • Received: 2018-04-11 Revised: 2018-06-18 Online: 2018-12-10 Published: 2018-12-15
  • Corresponding author: YU Jiong
  • About the authors: LIU Su (born 1994), female, from Jilin City, Jilin Province, M.S. candidate, CCF member; research interests: distributed computing and in-memory computing. YU Jiong (born 1964), male, from Beijing, professor, doctoral supervisor, Ph.D., CCF member; research interests: grid computing and distributed computing. LU Liang (born 1990), male, from Xiangtan, Hunan Province, Ph.D., CCF member; research interests: cloud computing, distributed computing and in-memory computing. LI Ziyang (born 1993), male, from Urumqi, Xinjiang, Ph.D. candidate, CCF member; research interests: cloud computing and distributed computing.
  • Supported by:
    National Natural Science Foundation of China (61462079, 61562078, 61562086); National Science and Technology Support Program (2015BAH02F01).

Abstract: The default round-robin scheduling strategy of the Storm stream computing platform incurs high communication cost and unbalanced load. To address these problems, a Task Scheduling Strategy based on Topology Structure (TS2) is proposed. First, worker nodes with sufficient available CPU resources are selected and exactly one process is allocated to each of them, which eliminates inter-process communication within a node and optimizes process deployment. Next, the topology structure is analyzed to find the component with the largest degree, and the threads of that component are assigned first. Finally, subject to the maximum number of threads each node can carry, associated tasks are deployed to the same node as far as possible, which reduces inter-node communication cost, improves cluster load balance and optimizes thread deployment. Experimental results show that, in terms of system latency, TS2 achieves an average improvement of 16.91% over the Storm default scheduling strategy and 5.69% over the offline scheduling strategy, effectively improving the real-time performance of the system. Compared with the Storm default scheduling strategy, TS2 also reduces inter-node communication cost by 15.75% on average and improves average throughput by 14.21%.

Key words: Storm, stream computing, task scheduling, topology structure, communication cost
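
The three steps named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the input format, the function name `ts2_sketch`, the uniform per-node `capacity`, and the tie-breaking rules are all assumptions made for illustration, and the CPU-based node selection is assumed to have happened before the call.

```python
from collections import defaultdict


def ts2_sketch(edges, parallelism, nodes, capacity):
    """Greedy, degree-first placement of threads onto nodes (illustrative only).

    edges:       list of (upstream, downstream) component pairs (assumed format)
    parallelism: dict mapping component name -> number of threads
    nodes:       node names, assumed already filtered for available CPU
    capacity:    maximum threads per node (assumed uniform across nodes)
    Returns a dict mapping node -> list of (component, thread_index).
    """
    # Treat the topology as an undirected graph to compute component degree.
    neighbors = defaultdict(set)
    for up, down in edges:
        neighbors[up].add(down)
        neighbors[down].add(up)

    # Highest-degree component first; ties broken by name for determinism.
    order = sorted(parallelism, key=lambda c: (-len(neighbors[c]), c))

    # One list per node stands for the single worker process on that node.
    placement = {n: [] for n in nodes}
    for comp in order:
        for i in range(parallelism[comp]):
            free = [n for n in nodes if len(placement[n]) < capacity]
            if not free:
                raise ValueError("not enough thread slots in the cluster")

            def score(n):
                # Count threads on node n that belong to adjacent components.
                linked = sum(1 for c, _ in placement[n] if c in neighbors[comp])
                # Prefer more linked threads, then lighter load, then node order.
                return (-linked, len(placement[n]), nodes.index(n))

            placement[min(free, key=score)].append((comp, i))
    return placement


# Hypothetical four-component word-count style topology on two nodes.
demo = ts2_sketch(
    edges=[("spout", "split"), ("split", "count"), ("count", "report")],
    parallelism={"spout": 1, "split": 2, "count": 2, "report": 1},
    nodes=["n1", "n2"],
    capacity=3,
)
for node, threads in demo.items():
    print(node, threads)
```

The score used here favors co-locating a thread with threads of adjacent components and falls back to the least-loaded node, which is one plausible reading of "deploy associated tasks to the same node as far as possible"; the paper's actual selection rule may differ.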

CLC number: