Journal of Computer Applications, 2023, Vol. 43, Issue (7): 2034-2039. DOI: 10.11772/j.issn.1001-9081.2022071128

• The 39th CCF National Database Conference (NDBC 2022)

Approximate query processing approach based on deep autoregressive model

Libin CEN¹, Jingdong LI¹, Chunbo LIN², Xiaoling WANG¹

  1. School of Computer Science and Technology, East China Normal University, Shanghai 200062, China
    2. Gauss Database Labs, Huawei Technologies Company Limited, Shanghai 201206, China
  • Received: 2022-07-12 Revised: 2022-08-25 Accepted: 2022-08-29 Online: 2023-07-20 Published: 2023-07-10
  • Contact: Xiaoling WANG
  • About the authors: CEN Libin, born in 1999, M. S. candidate. His research interests include query optimization and approximate query processing.
    LI Jingdong, born in 1996, Ph. D. candidate. His research interests include graph data mining and database theory.
    LIN Chunbo, born in 1995, M. S. His research interests include database theory and approximate query processing.
    WANG Xiaoling, born in 1975, Ph. D., professor. Her research interests include distributed graph data processing, knowledge graphs, sequential recommendation, and sequence data analysis.
  • Supported by:
    Key Project of the National Natural Science Foundation of China (62136002); Key Project of the Science and Technology Commission of Shanghai Municipality (20DZ1100300)

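  • About the authors (Chinese version): CEN Libin (born 1999), male, from Wuzhou, Guangxi, M. S. candidate, CCF student member. His research interests include query optimization and approximate query processing;
    LI Jingdong (born 1996), male, from Jingdezhen, Jiangxi, Ph. D. candidate, CCF student member. His research interests include graph data mining and database theory;
    LIN Chunbo (born 1995), male, from Shaoxing, Zhejiang, M. S., CCF member. His research interests include database theory and approximate query processing;
    WANG Xiaoling (born 1975), female, from Yantai, Shandong, professor, Ph. D., CCF member. Her research interests include distributed graph data processing, knowledge graphs, sequential recommendation, and sequence data analysis.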

Abstract:

Approximate Query Processing (AQP) for aggregate functions has recently become a research hotspot in the database field. Existing AQP techniques suffer from long query response times, high storage overhead, and a lack of support for multi-predicate queries. To address these problems, DeepAQP (Deep Approximate Query Processing), an AQP approach based on a deep autoregressive model, was proposed. DeepAQP uses a deep autoregressive model to learn the joint probability distribution of multiple columns in a table, and uses it to estimate the predicate selectivity of a given query and the probability distribution of the query's target column. This improves the handling of approximate aggregate queries with multiple predicates over a single table. Experiments were conducted on the TPC-H and TPC-DS datasets. The results show that, compared with VerdictDB, a sampling-based method, DeepAQP reduces query response time by 2 to 3 orders of magnitude and storage space by 3 orders of magnitude; compared with DBEst++, a machine learning-based method, DeepAQP reduces query response time by 1 order of magnitude and significantly reduces model training time. In addition, DeepAQP can handle multi-predicate queries, which DBEst++ does not support. These results show that DeepAQP achieves good accuracy and speed at the same time while significantly reducing training and storage overhead.
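The abstract states that aggregate answers come from two estimates, predicate selectivity and the target-column distribution, both read off a learned joint distribution. The sketch below illustrates that idea under loudly stated assumptions; it is not DeepAQP's implementation. The neural network is replaced by a hypothetical `ToyAutoregressiveModel` that stores P(x_i | x_<i) as empirical frequency tables, predicates are hypothetical per-column boolean tests, and COUNT, AVG and SUM are assumed to be derived as N × selectivity, E[target | predicates] and COUNT × AVG. Exhaustive enumeration of the domain only works for tiny discrete columns.

```python
from collections import Counter, defaultdict
from itertools import product


class ToyAutoregressiveModel:
    """Hypothetical stand-in for a deep autoregressive model.

    A real model would use a neural network to output P(x_i | x_<i); here the
    conditionals are empirical frequency tables so the example stays tiny.
    """

    def __init__(self, rows, domains):
        self.domains = domains  # one list of discrete values per column, in model order
        self.counts = [defaultdict(Counter) for _ in domains]
        for row in rows:
            for i, value in enumerate(row):
                # counts[i][prefix][value]: rows whose first i columns equal prefix
                self.counts[i][tuple(row[:i])][value] += 1

    def cond_prob(self, i, prefix, value):
        """P(x_i = value | x_<i = prefix); 0.0 for an unseen prefix."""
        counter = self.counts[i].get(tuple(prefix))
        if not counter:
            return 0.0
        return counter[value] / sum(counter.values())

    def joint_prob(self, values):
        """Chain rule: P(x_1, ..., x_n) = prod_i P(x_i | x_<i)."""
        p = 1.0
        for i, value in enumerate(values):
            p *= self.cond_prob(i, values[:i], value)
            if p == 0.0:
                break
        return p


def estimate(model, predicates, target):
    """Return (selectivity, {v: P(target = v | predicates)}) by enumeration.

    predicates maps a column index to a boolean test on that column's value;
    columns without a predicate are unconstrained.
    """
    allowed = [
        [v for v in domain if predicates.get(i, lambda _v: True)(v)]
        for i, domain in enumerate(model.domains)
    ]
    selectivity = 0.0
    target_mass = Counter()
    for values in product(*allowed):
        p = model.joint_prob(values)
        selectivity += p
        target_mass[values[target]] += p
    if selectivity == 0.0:
        return 0.0, {}
    return selectivity, {v: m / selectivity for v, m in target_mass.items()}


def approx_aggregates(model, n_rows, predicates, target):
    """Derive COUNT, AVG and SUM of a numeric target column from the estimates."""
    selectivity, dist = estimate(model, predicates, target)
    if not dist:
        return {"COUNT": 0.0, "AVG": float("nan"), "SUM": 0.0}
    count = n_rows * selectivity
    avg = sum(v * p for v, p in dist.items())
    return {"COUNT": count, "AVG": avg, "SUM": count * avg}


# Hypothetical table with columns (region, year, price).
rows = [
    ("east", 2021, 10),
    ("east", 2022, 20),
    ("west", 2021, 30),
    ("west", 2022, 40),
    ("east", 2022, 30),
]
domains = [["east", "west"], [2021, 2022], [10, 20, 30, 40]]
model = ToyAutoregressiveModel(rows, domains)

# SELECT COUNT(*), AVG(price), SUM(price) WHERE region = 'east' AND year >= 2022
result = approx_aggregates(
    model, len(rows), {0: lambda v: v == "east", 1: lambda v: v >= 2022}, target=2
)
print({k: round(v, 6) for k, v in result.items()})
# {'COUNT': 2.0, 'AVG': 25.0, 'SUM': 50.0}
```

Because the toy conditionals are exact frequencies, the estimates match the true answers here; a learned model would only approximate them, and a realistic system would need a cheaper inference procedure than enumerating every value combination.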

Key words: Approximate Query Processing (AQP), autoregressive model, multi-predicate query, deep learning, aggregate function

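Abstract (Chinese version):

Approximate Query Processing (AQP) for aggregate functions has been a research hotspot in the database field in recent years. To address the problems of existing approximate query techniques, namely long query response times, high storage overhead, and lack of support for multi-predicate queries, an AQP approach based on a deep autoregressive model, DeepAQP (Deep Approximate Query Processing), was proposed. DeepAQP uses a deep autoregressive model to learn and model the joint probability distribution of multiple columns in a table, and uses it to estimate the predicate selectivity and the target-column probability distribution of a given query, enabling effective processing of approximate aggregate queries with multiple predicates over a single table. Experiments were conducted on the TPC-H and TPC-DS datasets. The results show that, compared with the sampling-based method VerdictDB, DeepAQP reduces both query response time and storage overhead by 2 to 3 orders of magnitude; compared with DBEst++, a method based on traditional machine learning models, DeepAQP reduces query response time by 1 order of magnitude, significantly reduces model training time, and can handle multi-predicate queries that DBEst++ does not support. Thus, DeepAQP balances query accuracy and speed while significantly reducing the training and storage overhead of the algorithm.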
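The abstract names the two quantities DeepAQP estimates, predicate selectivity and the target-column distribution. Assuming the aggregates are derived from them in the standard way (the paper's exact estimators are not given in this abstract), the relationships can be written as follows, where N is the table's row count, θ the conjunction of predicates, R(θ) the set of value combinations satisfying θ, and T the numeric target column:

```latex
P(x_1,\dots,x_n) = \prod_{i=1}^{n} P(x_i \mid x_1,\dots,x_{i-1})

\operatorname{sel}(\theta) = \sum_{(x_1,\dots,x_n) \in R(\theta)} P(x_1,\dots,x_n)

P(T = v \mid \theta) = \frac{1}{\operatorname{sel}(\theta)} \sum_{\substack{(x_1,\dots,x_n) \in R(\theta) \\ x_T = v}} P(x_1,\dots,x_n)

\widehat{\mathrm{COUNT}} = N \cdot \operatorname{sel}(\theta), \qquad
\widehat{\mathrm{AVG}}(T) = \sum_{v} v \, P(T = v \mid \theta), \qquad
\widehat{\mathrm{SUM}}(T) = \widehat{\mathrm{COUNT}} \cdot \widehat{\mathrm{AVG}}(T)
```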
