Journal of Computer Applications ›› 2020, Vol. 40 ›› Issue (2): 410-415.DOI: 10.11772/j.issn.1001-9081.2019081908

• CCF NDBC 2019 • Previous Articles     Next Articles

Accessing optimization with multiple indexes in TiDB

Hai LAN1, Ke HAN1, Li SHEN2, Qiu CUI2, Yuwei PENG1()   

  1. 1.School of Computer Science,Wuhan University,Wuhan Hubei 610072,China
    2.PingCAP,Beijing 100096,China
  • Received:2019-10-18 Revised:2019-11-20 Accepted:2019-11-22 Online:2019-12-04 Published:2020-02-10
  • Contact: Yuwei PENG
  • About author:LAN Hai, born in 1993, M. S. candidate. His research interests include database system, distributed system.
    HAN Ke, born in 1995, M. S. candidate. His research interests include massive parallel processing database, genealogical data management.
    SHEN Li, born in 1985, M. S. His research interests include database, distributed system.
    CUI Qiu, born in 1986, M. S. His research interests include distributed database.
  • Supported by:
    the National Key Research and Development Program of China(2016YFB1000701)

TiDB的多索引访问优化

兰海1, 韩珂1, 申砾2, 崔秋2, 彭煜玮1()   

  1. 1.武汉大学 计算机学院,武汉 610072
    2.北京平凯星辰科技发展有限公司,北京 100096
  • 通讯作者: 彭煜玮
  • 作者简介:兰海(1993—),男,四川成都人,硕士研究生,CCF会员,主要研究方向:数据库系统、分布式系统
    韩珂(1995—),男,河南南阳人,硕士研究生,主要研究方向:大规模并行处理数据库、族谱数据管理
    申砾(1985—),男,河北石家庄人,硕士研究生,主要研究方向:数据库、分布式系统
    崔秋(1986—),男,天津人,硕士研究生,主要研究方向:分布式数据库;
  • 基金资助:
    国家重点研发计划项目(2016YFB1000701)

Abstract:

When the query condition involves multiple indexed attributes, TiDB cannot effectively use multiple indexes to generate a more efficient execution plan. After studying the existing solutions of databases, such as PostgreSQL and MySQL, a new type of data access path using multiple indexes simultaneously was proposed in TiDB to solve the problem, namely MultiIndexPath. Firstly, a possible access path named MultiIndexPath for a certain query was generated and its physical plan named MultiIIndexPlan was created, then the cost of this plan was calculated. Secondly, the general execution framework of MultiIndexPath was proposed after combining the architecture and implementation of TiDB. Finally, the Pipeline execution plan was proposed with the condition of conjunctive normal form. The whole work was implemented based on TiDB 3.0 and several experiments were conducted. Experimental results show that the performance of the proposed scheme is improved by at least one order of magnitude compared with the original TiDB when the condition is the disjunctive normal form. With the condition of conjunctive normal form, the performance of the scheme is also better than that of the original TiDB.

Key words: distributed database, TiDB, optimizer, index

摘要:

当查询条件涉及多个已建立索引的属性时,TiDB不能利用多个索引产生更优的执行计划。为了解决此问题,在研究现有数据库解决方案(如PostgreSQL和MySQL等)后,在TiDB中提出一种同时利用多个索引的新类型数据访问路径,称为MultiIndexPath。首先,设计算法生成一个查询可能的MultiIndexPath,并产生该路径的物理计划MultiIIndexPlan,然后计算物理计划的代价;其次,结合TiDB的架构与实现,提出MultiIndexPlan的通用执行框架;最后,当条件为合取范式时,提出Pipeline执行方案。整个工作基于TiDB 3.0实现并进行若干实验,结果表明:当条件为析取范式时,所提方案的性能比原TiDB至少有一个数量级提升;当条件为合取范式时,性能也优于原TiDB。

关键词: 分布式数据库, TiDB, 优化器, 索引

CLC Number: