计算机应用 ›› 2021, Vol. 41 ›› Issue (3): 630-635.DOI: 10.11772/j.issn.1001-9081.2020091543

所属专题: 第37届CCF中国数据库学术会议(NDBC 2020)

• 第37届CCF中国数据库学术会议(NDBC 2020) • 上一篇    下一篇

基于日志结构合并树的轻量级分布式索引实现方法

崔双双, 王宏志   

  1. 哈尔滨工业大学 计算学部, 哈尔滨 150001
  • 收稿日期:2020-09-07 修回日期:2020-12-10 出版日期:2021-03-10 发布日期:2020-12-31
  • 通讯作者: 王宏志
  • 作者简介:崔双双(1997-),女,黑龙江哈尔滨人,硕士研究生,CCF会员,主要研究方向:数据库、查询优化;王宏志(1978-),男,辽宁丹东人,教授,博士生导师,博士,主要研究方向:数据库、大数据、数据质量。
  • 基金资助:
    国家重点研发计划项目(2018YFB1004700);国家自然科学基金资助项目(U1866602,61602129,61772157)。

Implementation method of lightweight distributed index based on log structured merge-tree

CUI Shuangshuang, WANG Hongzhi   

  1. Faculty of Computer Science, Harbin Institute of Technology, Harbin Heilongjiang 150001, China
  • Received:2020-09-07 Revised:2020-12-10 Online:2021-03-10 Published:2020-12-31
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2018YFB1004700), the National Natural Science Foundation of China (U1866602, 61602129, 61772157).

摘要: 针对现有基于日志结构合并树(LSM-Tree)实现的分布式数据库仅支持高效的主键查询,无法让用户快速地应用在自己的集群中的问题,提出了基于LSM-Tree的轻量级分布式索引实现方法SIBL。首先,通过对主键属性列建立索引来提高非主键属性的查询效率;然后,提出了分布式索引构建算法以及基于等距取样的索引区间划分算法,从而保证了索引在系统中的均匀分布,并且优化了传统索引的查询算法,将索引文件看作特殊的数据文件分布式地存储在系统中,从而保证了系统的负载均衡和可扩展性;最后,将该方法与华为二级索引方案HIndex在HBase数据库上进行实验来比较二者的索引构建的时间和空间开销、索引的查询性能和系统的负载均衡等性能,验证得出所提出的方法使查询性能提升了50~200倍。

关键词: 日志结构合并树, 分布式索引, HBase, 查询优化

Abstract: To solve the problem that the existing distributed database based on Log Structured Merge-Tree (LSM-Tree) only supports efficient primary key query and cannot allow users to quickly apply it in their own clusters, a light-weight distributed index implementation method based on LSM-Tree, called SIBL (Secondary Index Based LSM-Tree), was proposed. Firstly, the query efficiency of the non-primary key attributes was improved by indexing the primary key attribute columns. Then, a distributed index construction algorithm and an index interval division algorithm based on equidistant sampling were proposed to ensure the even distribution of indexes in the system. And the query algorithm of the traditional index was optimized, and the index file was regarded as a special data file and stored in the system in a distributed manner, ensuring the load balance and scalability of the system. Finally, experiments of the proposed method with Huawei's secondary index scheme HIndex were carried out on the HBase database to compare performances such as time and space overhead of index construction, index query performance and system load balance, verifying that the proposed method improves query performance by 50 to 200 times.

Key words: Log Structured Merge-Tree (LSM-Tree), distributed index, HBase, query optimization

中图分类号: