Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (1): 1-7. DOI: 10.11772/j.issn.1001-9081.2016.01.0001


Implementation of distributed index in cluster environment

WENG Haixing1, GONG Xueqing1, ZHU Yanchao1, HU Hualiang2   

  1. Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, China;
  2. College of Economics and Management, Zhejiang Sci-Tech University, Hangzhou, Zhejiang 310018, China
  • Received: 2015-09-08 Revised: 2015-10-08 Online: 2016-01-10 Published: 2016-01-09
  • Supported by:
    This work is partially supported by the Natural Science Foundation of Zhejiang Province (LY12F02044) and the National Natural Science Foundation of China (U1401256).


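  • Corresponding author: HU Hualiang (胡华梁, born 1974), male, from Wuning, Jiangxi; associate professor, Ph.D.; main research interest: data management.
  • About the authors: WENG Haixing (翁海星, born 1993), male, from Duyun, Guizhou; master's student; main research interests: distributed databases and big data management. GONG Xueqing (宫学庆, born 1974), male, from Shanghai; professor, Ph.D.; main research interests: databases and social network analysis. ZHU Yanchao (朱燕超, born 1992), male, from Liyang, Jiangsu; doctoral student; main research interest: distributed databases.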

Abstract: To address the performance problems caused by accessing data through non-primary keys on a distributed storage system, this paper discusses the key technologies for implementing indexes on such systems. Based on a thorough analysis of the characteristics of distributed storage, the key points in designing and implementing a distributed index are presented. Drawing on these characteristics and on related indexing techniques, the organization of the index, index maintenance, data consistency, and related issues are discussed. A distributed indexing mechanism was then designed and implemented on the open-source version of OceanBase, a distributed database system, and its performance was tested with the benchmarking tool YCSB. The experimental results show that although the auxiliary index degrades system performance, the degradation can be kept within 5% at different data scales because the design takes system features and storage characteristics into account. In addition, storing redundant columns in the index can improve index performance by as much as 100%.

Key words: distributed storage, distributed index, auxiliary index, index maintenance, OceanBase

