Journal of Computer Applications, 2016, Vol. 36, Issue (6): 1526-1532. DOI: 10.11772/j.issn.1001-9081.2016.06.1526

• Advanced Computing •

Parallel access strategy for big data objects based on RAMCloud

CHU Zheng1, YU Jiong1,2, LU Liang2, YING Changtian2, BIAN Chen2, WANG Yuefei1

  1. School of Software, Xinjiang University, Urumqi, Xinjiang 830008, China;
  2. School of Information Science and Engineering, Xinjiang University, Urumqi, Xinjiang 830046, China
  • Received: 2015-12-21  Revised: 2016-02-28  Online: 2016-06-10  Published: 2016-06-08
  • Corresponding author: YU Jiong
  • About the authors: CHU Zheng (born 1991), male, from Urumqi, Xinjiang, master's student, CCF member; research interests: cloud storage, in-memory computing, green computing. YU Jiong (born 1964), male, from Beijing, professor, Ph.D., doctoral supervisor, CCF member; research interests: network security, grid computing, distributed computing. LU Liang (born 1990), male, from Urumqi, Xinjiang, Ph.D. candidate, CCF member; research interests: in-memory computing, green computing. YING Changtian (born 1989), female, from Urumqi, Xinjiang, Ph.D. candidate; research interests: distributed file systems, in-memory computing. BIAN Chen (born 1981), male, from Urumqi, Xinjiang, Ph.D. candidate, CCF member; research interests: high-performance computing, cloud computing, in-memory computing. WANG Yuefei (born 1991), male, from Urumqi, Xinjiang, master's student; research interests: cloud computing, distributed computing.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61462079, 61262088, 61562086, 61363083).

Abstract: RAMCloud supports only small data objects of at most 1 MB, so an object larger than 1 MB cannot be stored in a RAMCloud cluster. To remove this storage limitation, a parallel access strategy for big data objects based on RAMCloud is proposed. For storage, the strategy first splits the big data object into several small data objects of no more than 1 MB each, then generates a data summary on the client, and finally uses a parallel storage algorithm to store the small data objects in the RAMCloud cluster. For reading, it first reads the data summary, then reads the small data objects from the RAMCloud cluster in parallel according to the summary, and merges them to rebuild the big data object. The experimental results show that, without changing the architecture of the RAMCloud cluster, the proposed strategy achieves a storage time of 16 to 18 μs and a reading time of 6 to 7 μs. On an InfiniBand network, the speedup of the proposed parallel algorithm grows almost linearly, so big data objects can be accessed at the microsecond level as quickly and efficiently as small data objects.
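The store and read paths described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: `FakeStore` is a hypothetical in-memory stand-in for a RAMCloud table, the key scheme (`name/0`, `name/1`, ..., `name/summary`) is invented for illustration, and the summary fields (total size, chunk count, SHA-256 checksum) are assumptions, since the abstract does not specify the summary format. Writing the summary after the chunks is likewise a choice made here, not something the abstract states.

```python
import hashlib
import json
from concurrent.futures import ThreadPoolExecutor

CHUNK_SIZE = 1024 * 1024  # 1 MB object limit stated in the abstract


class FakeStore:
    """In-memory dict standing in for a RAMCloud table (hypothetical API)."""

    def __init__(self):
        self._data = {}

    def write(self, key, value):
        # Enforce the same per-object limit the abstract attributes to RAMCloud.
        if len(value) > CHUNK_SIZE:
            raise ValueError("object exceeds the 1 MB limit")
        self._data[key] = bytes(value)

    def read(self, key):
        return self._data[key]


def put_large(store, name, payload, workers=4):
    """Split payload into chunks of at most 1 MB and store them in parallel."""
    chunks = [payload[i:i + CHUNK_SIZE] for i in range(0, len(payload), CHUNK_SIZE)]
    # Assumed summary layout: enough to locate and verify every chunk.
    summary = {
        "size": len(payload),
        "chunks": len(chunks),
        "sha256": hashlib.sha256(payload).hexdigest(),
    }

    def write_chunk(item):
        index, chunk = item
        store.write(f"{name}/{index}", chunk)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() waits for all writes and re-raises any worker exception.
        list(pool.map(write_chunk, enumerate(chunks)))
    # The summary is written last so a reader never finds it without its chunks.
    store.write(f"{name}/summary", json.dumps(summary).encode())
    return summary


def get_large(store, name, workers=4):
    """Read the summary, fetch all chunks in parallel, and merge them."""
    summary = json.loads(store.read(f"{name}/summary").decode())
    keys = [f"{name}/{i}" for i in range(summary["chunks"])]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(store.read, keys))  # map() preserves key order
    payload = b"".join(parts)
    if hashlib.sha256(payload).hexdigest() != summary["sha256"]:
        raise ValueError("checksum mismatch while merging chunks")
    return payload
```

With this sketch, a 2,560,000-byte payload passed to `put_large` is stored as three chunk objects plus one summary object, and `get_large` returns the original bytes. A thread pool is used only to show the parallel structure; it says nothing about the latencies reported in the abstract.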

Key words: cloud storage, RAMCloud, big data object, storing strategy, parallel algorithm

CLC Number: