Journal of Computer Applications ›› 2014, Vol. 34 ›› Issue (8): 2260-2266. DOI: 10.11772/j.issn.1001-9081.2014.08.2260

• Advanced computing •

Task scheduling and resource selection algorithm with data-dependent constraints

LIAO Bin1, YU Jiong2, ZHANG Tao3, YANG Xingyao2

  1. College of Statistics and Information, Xinjiang University of Finance and Economics, Urumqi, Xinjiang 830012, China;
  2. School of Software, Xinjiang University, Urumqi, Xinjiang 830008, China;
  3. Medical Engineering College, Xinjiang Medical University, Urumqi, Xinjiang 830011, China
  • Received: 2014-02-12 Revised: 2014-04-15 Online: 2014-08-01 Published: 2014-08-10
  • Contact: LIAO Bin
  • About the authors: LIAO Bin (born 1986), male, from Neijiang, Sichuan, Ph.D. candidate; research interests: databases, cloud computing, green computing. YU Jiong (born 1964), male, from Beijing, professor, doctoral supervisor, Ph.D.; research interests: network security, grid and distributed computing. ZHANG Tao (born 1988), female, from Urumqi, Xinjiang, M.S.; research interests: distributed computing, grid computing. YANG Xingyao (born 1984), male, from Urumqi, Xinjiang, Ph.D. candidate; research interests: distributed computing, grid computing, recommender systems.
  • Supported by: National Natural Science Foundation of China; Natural Science Foundation of Xinjiang Uygur Autonomous Region

Abstract:

Computing tasks in big data environments, such as MapReduce jobs, usually depend on specific data. Existing resource selection strategies in distributed storage systems serve a request from the data block nearest to the requester and ignore the load on the server that hosts the block, including CPU, disk I/O and network load. Based on an analysis of the cluster structure, file division mechanism and data block storage mechanism of such systems, this paper defines a cluster-node matrix, CPU load matrix, disk I/O load matrix, network load matrix, file-division-block matrix, data block storage matrix and data-block storage-node status matrix, which together form a basic data model of the dependency between tasks and data. On this model, an optimal resource selection algorithm with data-dependent constraints (ORS2DC) is proposed: the task scheduling node maintains the base data, and MapReduce tasks and data block read tasks use different selection strategies because they depend on different resources. Experimental results show that the proposed algorithm selects higher-quality resources for tasks, improves task completion quality, and lightens the load on the NameNode, which lowers the probability of a single point of failure.
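
The selection idea summarized in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the ORS2DC algorithm itself: the node names, load values, weights and the weighted-sum score are all hypothetical. The abstract only states that CPU, disk I/O and network load are taken into account and that MapReduce tasks and data block read tasks use different strategies.

```python
# Illustrative sketch only: names, weights and the scoring rule are
# assumptions for this example and are not taken from the paper.

# Hypothetical per-node load in [0, 1] for CPU, disk I/O and network.
LOAD = {
    "node1": {"cpu": 0.90, "disk": 0.40, "net": 0.30},
    "node2": {"cpu": 0.20, "disk": 0.50, "net": 0.60},
    "node3": {"cpu": 0.35, "disk": 0.10, "net": 0.20},
}

# Hypothetical block storage map: block id -> nodes holding a replica.
BLOCK_NODES = {
    "blk_1": ["node1", "node2", "node3"],
    "blk_2": ["node1", "node2"],
}

# Assumed weights: a MapReduce task also computes on the chosen node, so CPU
# load counts; a plain block read is assumed to depend on disk and network only.
WEIGHTS = {
    "mapreduce": {"cpu": 0.4, "disk": 0.3, "net": 0.3},
    "read": {"cpu": 0.0, "disk": 0.5, "net": 0.5},
}


def score(node, task_type):
    """Weighted load of a node for the given task type (lower is better)."""
    weights = WEIGHTS[task_type]
    return sum(weights[key] * LOAD[node][key] for key in weights)


def select_node(block_id, task_type):
    """Pick the replica holder with the lowest weighted load."""
    return min(BLOCK_NODES[block_id], key=lambda node: score(node, task_type))


if __name__ == "__main__":
    for block in BLOCK_NODES:
        for task in WEIGHTS:
            print(block, task, select_node(block, task))
```

With these made-up numbers, the two task types choose different replicas of `blk_2`, because the node with the lighter CPU load is not the one with the lighter disk and network load. A nearest-replica policy would not see this difference.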

CLC Number: