Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (11): 3097-3101. DOI: 10.11772/j.issn.1001-9081.2015.11.3097

• Paper from the 2015 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2015) •

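Geographically distributed replication management based on HBase

LI Yong, WU Lihui, HUANG Ning, WU Weigang

  1. School of Data and Computer Science, Sun Yat-sen University, Guangzhou, Guangdong 510000, China
  • Received: 2015-06-17 Revised: 2015-07-17 Published: 2015-11-13
  • Corresponding author: WU Weigang (born 1976), male, from Tai'an, Shandong; associate professor, Ph.D., CCF member. Main research interests: replica management for geographically distributed data centers, self-organizing networks, mobile computing.
  • About the authors: LI Yong (born 1992), male, from Fengyang, Anhui; master's student. Main research interest: replica management for geographically distributed data centers. WU Lihui (born 1991), female, from Meixian, Guangdong; master's student. Main research interest: replica management for geographically distributed data centers. HUANG Ning (born 1992), male, from Zhanjiang, Guangdong; master's student. Main research interest: replica management for geographically distributed data centers.
  • Supported by:
    National Natural Science Foundation of China (61379157).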

Abstract: In distributed storage systems, data is usually backed up as redundant replicas in multiple data centers, and a robust mechanism is needed to keep those replicas consistent. After an in-depth study of replication theory for distributed systems, an algorithm for managing geographically distributed replicas was proposed. Microsoft Research introduced Service Level Agreements (SLAs), which divide users' consistency requirements into several levels, each associated with a latency the user can tolerate; the system ensures that users obtain a relatively high service level within the tolerable latency. The Tuba system extends Pileus by allowing the system to dynamically change the locations of primary and secondary replicas according to statistics sent by all users, so as to improve the average performance of the system. However, replication in Tuba is carried out on a single target unit only. Improving on the method in Tuba, a set of algorithms for changing the locations of primary and secondary replicas was proposed, and the mechanism was implemented in replication among HBase distributed systems. Experiments on the completed system verified that taking the correlation between two regions into account when changing the locations of primary and secondary replicas can improve the overall utility of the system.
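The SLA idea summarized in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm and it does not model the correlation between two regions. It only assumes that an SLA is an ordered list of levels (consistency, tolerable latency, utility), that strong reads must go to the primary while eventual reads may use the nearest replica, and that the primary is placed at the site with the highest average utility. All names (`SubSLA`, `delivered_utility`, `best_primary`) and the latency figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubSLA:
    """One service level: a consistency guarantee, a latency bound, a utility."""
    consistency: str       # "strong" or "eventual" (assumed two-level model)
    max_latency_ms: float  # latency the user is willing to tolerate
    utility: float         # value delivered if this level is met


def delivered_utility(sla, latency_to_primary_ms, latency_to_nearest_ms):
    """Return the utility of the first level in the ordered SLA that can be met."""
    for sub in sla:
        # Assumption: strong reads go to the primary, eventual reads to the nearest replica.
        if sub.consistency == "strong":
            latency = latency_to_primary_ms
        else:
            latency = latency_to_nearest_ms
        if latency <= sub.max_latency_ms:
            return sub.utility
    return 0.0  # no level can be satisfied


def best_primary(sla, client_latencies):
    """Pick the site that maximizes average utility when it hosts the primary.

    client_latencies maps client -> {site: latency_ms}; every site is assumed
    to hold a replica.
    """
    sites = sorted(next(iter(client_latencies.values())))

    def average_utility(site):
        total = sum(
            delivered_utility(sla, lat[site], min(lat.values()))
            for lat in client_latencies.values()
        )
        return total / len(client_latencies)

    return max(sites, key=average_utility)


# Hypothetical SLA and latencies, for illustration only.
sla = [SubSLA("strong", 100, 1.0), SubSLA("eventual", 200, 0.5)]
client_latencies = {
    "c1": {"A": 20, "B": 150},
    "c2": {"A": 30, "B": 160},
    "c3": {"A": 180, "B": 10},
}
chosen = best_primary(sla, client_latencies)
```

With these made-up numbers, two of the three clients are close to site A, so placing the primary there lets them meet the strong level, while the third client falls back to the eventual level at its nearest replica.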

Key words: distributed system, consistency, Service Level Agreement (SLA), replication, geographically distributed

CLC number: