Academic Conference on Information Storage Technology +42 Research and Implementation of Large File Storage Based on RAMCloud

Gang-Biao LIU, Yan-Yuan ZHANG, Yi LIN, Xin FAN, Xin-Jiang XING

  1. Northwestern Polytechnical University
  • Received: 2016-11-25 Revised: 2016-11-29 Online: 2016-11-29
  • Contact: Gang-Biao LIU

Abstract: RAMCloud is a new kind of distributed key-value storage system that keeps data entirely in DRAM. It aggregates the available memory of thousands of servers in a data center over a high-speed network, manages that memory as a single pool, and uses disks for persistent storage. RAMCloud provides low-latency access to large-scale datasets of small objects, but it does not directly support storing and retrieving large files (tens of MB or above). To address this problem, we propose a solution based on file splitting and file block merging. On this basis, we design and implement a large file management module and integrate it into RAMCloud. Performance tests show that the improved system supports large file storage and access effectively and achieves noticeably higher read and write speeds than HDFS. In future work, we will consider integrating this system into HDFS as a distributed cache management system, so as to improve the performance of HDFS.

Key words: large file storage, RAMCloud, distributed storage system, access latency
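The splitting and merging idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' module and does not use the RAMCloud client API: a plain Python dict stands in for the key-value store, and the block size, the key naming scheme (`name/block/i`, `name/meta`), and both function names are assumptions made only for illustration.

```python
import math

# Assumed block size (1 MiB); the value used in the paper is not stated here.
CHUNK_SIZE = 1 << 20


def split_and_store(store, name, data, chunk_size=CHUNK_SIZE):
    """Split `data` into fixed-size blocks and write each block under its own key.

    `store` is any dict-like key-value store (a hypothetical stand-in for a
    RAMCloud table). Returns the number of blocks written.
    """
    count = math.ceil(len(data) / chunk_size)
    for i in range(count):
        store[f"{name}/block/{i}"] = data[i * chunk_size:(i + 1) * chunk_size]
    # A small metadata object records what a reader needs to merge the blocks.
    store[f"{name}/meta"] = {"size": len(data), "blocks": count}
    return count


def load_and_merge(store, name):
    """Read the metadata object, fetch the blocks in order, and concatenate them."""
    meta = store[f"{name}/meta"]
    data = b"".join(store[f"{name}/block/{i}"] for i in range(meta["blocks"]))
    if len(data) != meta["size"]:
        raise ValueError("merged size does not match metadata")
    return data
```

Keeping every block as an independent small object is what lets a store tuned for small values hold a large file; the metadata object is one simple way to record the block count so that the reader can reassemble the file in order.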

CLC Number: