Journal of Computer Applications, 2023, Vol. 43, Issue (7): 2017-2025. DOI: 10.11772/j.issn.1001-9081.2022071131

• The 39th CCF National Database Conference (NDBC 2022)

OmegaDB: concurrent computing framework of relational operators for heterogeneous architecture

Jinhui LAI¹, Zichen XU¹, Yicheng TU², Guolong TAN¹

  1. School of Mathematics and Computer Science, Nanchang University, Nanchang, Jiangxi 330031, China
  2. Jiaxing Yunbao Technology Company Limited, Jiaxing, Zhejiang 314006, China
  • Received: 2022-08-02 Revised: 2022-08-27 Accepted: 2022-09-05 Online: 2023-07-20 Published: 2023-07-10
  • Contact: Zichen XU
  • About the authors: LAI Jinhui, born in 1998, a native of Ji'an, Jiangxi, M.S. candidate. His research interests include databases.
    XU Zichen, born in 1985, a native of Nanchang, Jiangxi, Ph.D., professor, CCF member. His research interests include computer system architecture, energy-saving computing, high-performance computing, distributed storage, and databases.
    TU Yicheng, born in 1973, a native of Yiyang, Hunan, Ph.D., professor. His research interests include databases, big data management, and parallel and distributed systems.
    TAN Guolong, born in 1996, a native of Shangrao, Jiangxi, M.S. His research interests include databases and Flink.

Abstract:

Different queries in a database system may access overlapping data paths and share computation; processing the queries of a workload in batches is called the Multiple-Query-at-a-Time model. Several existing multi-query processing frameworks have proven effective, but all of them lack a general framework for building complete query processing and optimization methods. Building on a query-time operator-merging optimization framework constructed from equivalent transformations, a concurrent computing framework of relational operators for heterogeneous architectures, called OmegaDB, was proposed. By studying a GPU-oriented flow-batch computing model for relational operators and constructing relational data query pipelines, the framework implements a flow-batch computing method that aggregates multiple queries on the CPU-GPU heterogeneous architecture. Through a prototype implementation and experiments, theoretical analysis and experimental results verified the advantages of OmegaDB over traditional Relational Database Management Systems (RDBMSs) and demonstrated its potential for exploiting new hardware. Based on a theoretical study of query optimization frameworks for the Multiple-Query-at-a-Time model built on traditional relational algebra rules, several optimization methods were proposed and future research directions were discussed. With TPC-H business intelligence computing as the benchmark, the results show that OmegaDB achieves up to 24 times end-to-end speedup over the modern advanced commercial database system SQL SERVER while consuming less disk I/O and CPU time.

Key words: highly concurrent relational database, Multiple-Query-at-a-Time, relational algebra, flow-batch computing, hardware acceleration
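To make the Multiple-Query-at-a-Time idea in the abstract concrete, here is a minimal Python sketch of its simplest form of shared computation: a shared scan, in which one pass over a table feeds every query in a batch. It is a generic illustration, not OmegaDB's implementation, and it runs on the CPU only. The query shape `SelectSumQuery`, the functions `one_query_at_a_time` and `multiple_queries_at_a_time`, and the synthetic TPC-H-style column values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Row = Dict[str, float]


@dataclass
class SelectSumQuery:
    """Hypothetical query shape: SELECT SUM(column) FROM table WHERE predicate."""
    name: str
    predicate: Callable[[Row], bool]
    column: str


def one_query_at_a_time(table: List[Row], queries: List[SelectSumQuery]) -> Tuple[Dict[str, float], int]:
    """Baseline: every query performs its own full scan of the table."""
    results: Dict[str, float] = {}
    rows_read = 0
    for q in queries:
        total = 0.0
        for row in table:
            rows_read += 1
            if q.predicate(row):
                total += row[q.column]
        results[q.name] = total
    return results, rows_read


def multiple_queries_at_a_time(table: List[Row], queries: List[SelectSumQuery]) -> Tuple[Dict[str, float], int]:
    """Batched: a single shared scan feeds each row to every query in the batch."""
    results = {q.name: 0.0 for q in queries}
    rows_read = 0
    for row in table:
        rows_read += 1  # each row is read once, regardless of batch size
        for q in queries:
            if q.predicate(row):
                results[q.name] += row[q.column]
    return results, rows_read


# Hypothetical TPC-H-style rows (synthetic values, not benchmark data).
TABLE: List[Row] = [
    {
        "l_quantity": float(i % 50 + 1),
        "l_extendedprice": float(100 * (i % 7 + 1)),
        "l_discount": (i % 11) / 100,
    }
    for i in range(1000)
]

# A hypothetical batch of three queries over the same table.
QUERIES = [
    SelectSumQuery("small_orders", lambda r: r["l_quantity"] < 24, "l_extendedprice"),
    SelectSumQuery("discounted", lambda r: r["l_discount"] >= 0.05, "l_extendedprice"),
    SelectSumQuery("total_qty", lambda r: True, "l_quantity"),
]

if __name__ == "__main__":
    base, base_reads = one_query_at_a_time(TABLE, QUERIES)
    shared, shared_reads = multiple_queries_at_a_time(TABLE, QUERIES)
    assert base == shared  # same answers either way
    # prints "rows read: 3000 one-at-a-time vs 1000 shared"
    print(f"rows read: {base_reads} one-at-a-time vs {shared_reads} shared")
```

Both strategies return identical answers, but the batched version reads each row once instead of once per query, so the scan cost stops growing with the number of queries in the batch. The operator merging through equivalent transformations and the CPU-GPU flow-batch execution described in the abstract go well beyond this sketch.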
