Journal of Computer Applications, 2018, Vol. 38, Issue (2): 478-482. DOI: 10.11772/j.issn.1001-9081.2017081916


Query optimization based on Greenplum database

ZOU Chengming1,2, XIE Yi1,2, WU Pei1,2   

  1. Hubei Key Laboratory of Transportation Internet of Things (Wuhan University of Technology), Wuhan, Hubei 430070, China;
  2. School of Computer Science and Technology, Wuhan University of Technology, Wuhan, Hubei 430070, China
  • Received: 2017-07-31 Revised: 2017-09-08 Online: 2018-02-10 Published: 2018-02-10
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61503289) and the Science and Technology Support Program of Hubei Province (2015BAA120, 2015BCE068).

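  • Corresponding author: XIE Yi
  • About the authors: ZOU Chengming (born 1975), male, from Xuwen, Guangdong; professor, Ph.D., CCF member; main research interests: computer vision, embedded systems, software theory and methods. XIE Yi (born 1991), male, from Shaodong, Hunan; M.S. candidate; main research interests: software definition, data migration. WU Pei (born 1993), female, from Wuhan, Hubei; M.S. candidate; main research interests: graphics and image processing.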

Abstract: To address the decline in the query efficiency of distributed databases as data scale grows, the Greenplum distributed database was taken as the research object, and a cost-based scheme for generating the optimal query plan was proposed from the perspective of optimizing the query path. First, an effective cost model was designed to estimate query cost. Next, a parallel max-min ant colony algorithm was used to search for the join order with the minimum query cost, i.e., the optimal join order. Finally, the optimal query plan was obtained based on Greenplum's default optimal choices for the different operations in the query plan. Multiple experiments were carried out with the proposed scheme on a self-generated data set and on the Transaction Processing Performance Council Benchmark H (TPC-H) standard data set. The experimental results show that the proposed scheme can effectively find the optimal solution and obtain the optimal query plan, thereby improving the query efficiency of the Greenplum database.

Key words: distributed database, Greenplum database, optimal query plan, cost model, optimal join order
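
The search step described in the abstract (estimate the cost of a join order, then let a max-min ant colony look for the cheapest order) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the table sizes in `ROWS`, the selectivities in `SELECTIVITY`, the toy cost model in `plan_cost` (sum of intermediate result sizes of a left-deep plan), and all parameter values. The sketch is also sequential, whereas the paper uses a parallel variant.

```python
import itertools
import random

# Hypothetical toy inputs (not from the paper): table cardinalities and
# pairwise join selectivities for a four-table query.
ROWS = {"A": 1000, "B": 200, "C": 50, "D": 5000}
SELECTIVITY = {
    frozenset("AB"): 0.01,
    frozenset("BC"): 0.05,
    frozenset("CD"): 0.001,
    frozenset("AD"): 0.002,
}


def plan_cost(order):
    """Toy cost of a left-deep join order: sum of intermediate result sizes."""
    joined = [order[0]]
    size = float(ROWS[order[0]])
    total = 0.0
    for table in order[1:]:
        size *= ROWS[table]
        for prev in joined:
            # Pairs without a listed predicate behave as a cross product.
            size *= SELECTIVITY.get(frozenset((prev, table)), 1.0)
        joined.append(table)
        total += size
    return total


def brute_force_optimum():
    """Exhaustive baseline; only feasible for a handful of tables."""
    return min(itertools.permutations(sorted(ROWS)), key=plan_cost)


def mmas_join_order(n_ants=8, n_iters=30, rho=0.2,
                    tau_min=0.05, tau_max=5.0, seed=0):
    """Sequential max-min ant system over join orders (illustrative only)."""
    rng = random.Random(seed)
    tables = sorted(ROWS)
    # Pheromone on "table b follows table a"; every trail starts at tau_max.
    tau = {(a, b): tau_max for a in tables for b in tables if a != b}
    best_order, best_cost = None, float("inf")
    for _ in range(n_iters):
        for _ in range(n_ants):
            order = [rng.choice(tables)]
            while len(order) < len(tables):
                candidates = [t for t in tables if t not in order]
                weights = [tau[(order[-1], t)] for t in candidates]
                order.append(rng.choices(candidates, weights=weights)[0])
            cost = plan_cost(order)
            if cost < best_cost:
                best_order, best_cost = tuple(order), cost
        # Evaporate, let only the best-so-far order deposit, then clamp
        # every trail to [tau_min, tau_max] (the "max-min" part).
        best_edges = set(zip(best_order, best_order[1:]))
        for edge in tau:
            deposit = 1.0 if edge in best_edges else 0.0
            tau[edge] = min(tau_max, max(tau_min, (1 - rho) * tau[edge] + deposit))
    return best_order, best_cost
```

Calling `mmas_join_order()` returns a join order and its toy cost, which can be compared against `plan_cost(brute_force_optimum())` on small inputs. The pheromone bounds keep every candidate edge selectable, which is what distinguishes the max-min variant from a basic ant system that may lock onto an early suboptimal order.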

