
Approximate query processing approach based on deep autoregressive model

Cen Libin (岑黎彬)1, Li Jingdong (李靖东)1, Lin Chunbo (林淳波)2, Wang Xiaoling (王晓玲)1

  1. East China Normal University
  2. Huawei Technologies Co., Ltd.
  • Received: 2022-08-01 Revised: 2022-08-25 Online: 2022-09-23
  • Corresponding author: Wang Xiaoling (王晓玲)
  • Supported by:
    National Natural Science Foundation of China; Key Project of the Science and Technology Commission of Shanghai Municipality

Abstract: Approximate query processing (AQP) of aggregate functions has been a research hotspot in the database field in recent years. Existing approximate query techniques suffer from long query response times, high storage overhead, and a lack of support for multi-predicate queries. To address these problems, an approximate query processing method based on a deep autoregressive model (DeepAQP) was proposed. DeepAQP used a deep autoregressive model to learn the joint probability distribution of multiple columns in a table, and estimated both the selectivity and the probability distribution of the target column under the constraints of the query predicates. It could therefore effectively handle approximate aggregate queries with multiple predicates on a single table. Experiments were conducted on the TPC-H and TPC-DS datasets. Compared with VerdictDB, a sampling-based method, DeepAQP reduced the query response time by 2 to 3 orders of magnitude and the storage space by 3 orders of magnitude. Compared with DBEst++, a method based on traditional machine learning models, DeepAQP also reduced the query response time by 1 order of magnitude and significantly reduced the model training time, and it could handle multi-predicate queries, which DBEst++ does not support. Experimental results show that DeepAQP achieves a better trade-off between query accuracy and speed while significantly reducing training and storage overhead.

Key words: approximate query processing, autoregressive model, database, deep learning, aggregate functions
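
The pipeline summarized in the abstract (model the joint distribution column by column, estimate the selectivity and the target-column distribution under the predicates, then derive the aggregate) can be illustrated with a minimal sketch. This is not the DeepAQP implementation: the deep autoregressive network is replaced here by exact frequency tables, the toy table and all names (`ROWS`, `TabularAutoregressiveModel`, `approximate_aggregates`) are hypothetical, and the predicate-constrained distribution is computed by exhaustive enumeration, which is only feasible for tiny domains.

```python
from collections import Counter, defaultdict

# Hypothetical toy table with columns (region, category, price).
ROWS = [
    ("EU", "A", 10), ("EU", "A", 20), ("EU", "B", 30), ("EU", "B", 30),
    ("US", "A", 10), ("US", "A", 20), ("US", "B", 30), ("US", "B", 40),
]


class TabularAutoregressiveModel:
    """Stand-in for a learned model: P(x_i | x_1..x_{i-1}) as exact frequencies."""

    def __init__(self, rows):
        self.n_rows = len(rows)
        self.n_cols = len(rows[0])
        # prefix_counts[i][prefix] = rows whose first i columns equal `prefix`.
        self.prefix_counts = [
            Counter(row[:i] for row in rows) for i in range(self.n_cols + 1)
        ]

    def conditional(self, prefix):
        """Return {value: P(next column = value | prefix)}."""
        i = len(prefix)
        denom = self.prefix_counts[i][prefix]
        return {
            key[-1]: count / denom
            for key, count in self.prefix_counts[i + 1].items()
            if key[:i] == prefix
        }


def constrained_target_distribution(model, predicates, target):
    """Return {v: P(all predicates hold and target column == v)} via the chain rule.

    `predicates` maps a column index to a function value -> bool.
    """
    dist = defaultdict(float)

    def walk(prefix, prob):
        col = len(prefix)
        if col == model.n_cols:
            dist[prefix[target]] += prob
            return
        allowed = predicates.get(col)
        for value, p in model.conditional(prefix).items():
            if allowed is None or allowed(value):
                walk(prefix + (value,), prob * p)

    walk((), 1.0)
    return dict(dist)


def approximate_aggregates(model, predicates, target):
    """Derive COUNT, SUM and AVG of the target column from the model alone."""
    dist = constrained_target_distribution(model, predicates, target)
    selectivity = sum(dist.values())            # P(predicates hold)
    count = selectivity * model.n_rows
    total = model.n_rows * sum(v * p for v, p in dist.items())
    avg = total / count if count else None
    return {"COUNT": count, "SUM": total, "AVG": avg}


model = TabularAutoregressiveModel(ROWS)
# SELECT COUNT(price), SUM(price), AVG(price) WHERE region = 'EU' AND category = 'B'
eu_b = approximate_aggregates(
    model, {0: lambda v: v == "EU", 1: lambda v: v == "B"}, target=2
)
print(eu_b)
```

On this toy table the two matching rows both have price 30, so the sketch reproduces the exact answer (COUNT 2, SUM 60, AVG 30). With a learned model the conditionals would only be approximations, and the results would be estimates rather than exact values.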

CLC number: