Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (8): 2260-2065.

• Advanced Computing •

Optimizing HDFS download efficiency

Ning Cao1, Zhonghai Wu2, Hongzhi Liu3, Qixun Zhang3

  1. Peking University
  2. School of Software and Microelectronics, Peking University
  3.
  • Received: 2010-01-18 Revised: 2010-03-04 Online: 2010-07-30 Published: 2010-08-01
  • Corresponding author: Ning Cao

Improving downloading performance in Hadoop Distributed File System

  • Received:2010-01-18 Revised:2010-03-04 Online:2010-07-30 Published:2010-08-01
  • Contact: Ning Cao

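Abstract: This paper studies the low efficiency of internal data downloading in HDFS and the load imbalance that may arise, and proposes optimization methods from two angles: the overall download efficiency of a distributed file and the download efficiency of an individual data block. Experimental results show that both methods improve efficiency and that, provided the cluster has a large number of DataNodes, combining the two improves download efficiency further and balances the DataNode load better.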

Key words: cloud computing, Hadoop file system (HDFS), multi-threading, parallel downloading

Abstract: To address the low downloading efficiency and the imbalanced DataNode load in Hadoop Distributed File System (HDFS), two optimization methods were proposed: one improves the whole process of downloading a file, and the other optimizes the downloading of a single block with a parallel download algorithm that dynamically allocates load according to speed. Mathematical analysis and experiments show that both methods enhance efficiency. Moreover, combining the two methods makes downloading more efficient and more stable, and balances the DataNode load to some extent.

Key words: cloud computing, Hadoop Distributed File System (HDFS), multi-thread, parallel downloading
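
The abstract mentions a parallel download algorithm that allocates load by speed but gives no details of it. As an illustration only, and not the authors' algorithm, the following minimal sketch shows one way such an allocation could work: the byte range of a block is split among the DataNodes holding a replica in proportion to their measured throughput. The function name `allocate_ranges` and its parameters are hypothetical, and the sketch assumes that a speed estimate for each replica is already available.

```python
from typing import List, Tuple


def allocate_ranges(block_size: int, speeds: List[float]) -> List[Tuple[int, int]]:
    """Split [0, block_size) into contiguous half-open byte ranges, one per
    replica, sized in proportion to each replica's measured speed.

    Hypothetical helper for illustration; not taken from the paper or from
    the Hadoop API.
    """
    if block_size < 0:
        raise ValueError("block_size must be non-negative")
    if not speeds or any(s < 0 for s in speeds) or sum(speeds) <= 0:
        raise ValueError("need at least one replica with positive speed")

    total = sum(speeds)
    ranges = []
    start = 0
    cumulative = 0.0
    for i, speed in enumerate(speeds):
        cumulative += speed
        if i == len(speeds) - 1:
            # The last range always ends at block_size, so rounding
            # never drops trailing bytes.
            end = block_size
        else:
            end = int(block_size * cumulative / total)
        ranges.append((start, end))
        start = end
    return ranges


# Three replicas with relative speeds 3 : 1 : 2 sharing a 600-byte block.
print(allocate_ranges(600, [3, 1, 2]))  # [(0, 300), (300, 400), (400, 600)]
```

A dynamic variant could call the same function again on the bytes still outstanding whenever the speed estimates change, so that slower DataNodes are assigned progressively less work.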