Approximate query processing approach based on deep autoregressive model

CEN Libin (岑黎彬)¹, LI Jingdong (李靖东)¹, LIN Chunbo (林淳波)², WANG Xiaoling (王晓玲)¹

1. East China Normal University
2. Huawei Technologies Co., Ltd.

Received: 2022-08-01 Revised: 2022-08-25 Online: 2022-09-23
Corresponding author: WANG Xiaoling
Supported by:
National Natural Science Foundation of China; Key Project of the Science and Technology Commission of Shanghai Municipality

Abstract

Approximate Query Processing (AQP) of aggregate functions has recently been a research hotspot in the database field. Existing AQP techniques suffer from long query response time, high storage overhead, and a lack of support for multi-predicate queries. To address these problems, a Deep autoregressive model-based Approximate Query Processing method (DeepAQP) is proposed. DeepAQP uses a deep autoregressive model to learn the joint probability distribution of multiple columns in a table, and estimates both the selectivity and the probability distribution of the target column under the constraints of the query predicates. It can therefore effectively handle approximate aggregate queries with multiple predicates over a single table. Experiments were conducted on the TPC-H and TPC-DS datasets. Compared with VerdictDB, a sampling-based method, DeepAQP reduced query response time by 2 to 3 orders of magnitude and storage space by 3 orders of magnitude. Compared with DBEst++, a method based on traditional machine learning models, DeepAQP reduced query response time by 1 order of magnitude and significantly reduced model training time; it also handles multi-predicate queries, which DBEst++ does not support. The experimental results show that DeepAQP balances query accuracy and speed while significantly reducing training and storage overhead.

Key words: approximate query processing, autoregressive model, database, deep learning, aggregate functions
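
The pipeline summarised in the abstract (model the joint distribution autoregressively, estimate the selectivity and the target-column distribution under the predicates, then derive the aggregates) can be illustrated with a toy sketch. This is not the authors' implementation: exact conditional frequency tables stand in for the deep autoregressive network, the predicate region is enumerated exhaustively (feasible only for tiny domains), and the names `ROWS`, `TabularAutoregressiveModel`, and `estimate` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import product

# Hypothetical toy table with columns (region, status, price).
ROWS = [
    ("EU", "open", 10), ("EU", "open", 20), ("EU", "closed", 30),
    ("US", "open", 10), ("US", "closed", 20), ("US", "closed", 40),
    ("US", "open", 30), ("EU", "closed", 10),
]


class TabularAutoregressiveModel:
    """Factorises P(x1..xn) as the product of P(x_i | x_1..x_{i-1}).

    Assumption: count tables replace the neural network, so the
    conditionals are exact empirical frequencies, not learned estimates.
    """

    def __init__(self, rows):
        self.n_rows = len(rows)
        self.n_cols = len(rows[0])
        self.domains = [sorted({r[i] for r in rows}) for i in range(self.n_cols)]
        # cond[i][prefix] counts the values of column i seen after that prefix.
        self.cond = [defaultdict(Counter) for _ in range(self.n_cols)]
        for r in rows:
            for i in range(self.n_cols):
                self.cond[i][r[:i]][r[i]] += 1

    def conditional(self, i, prefix, value):
        counts = self.cond[i].get(prefix)
        if not counts:
            return 0.0
        return counts[value] / sum(counts.values())

    def joint(self, row):
        p = 1.0
        for i in range(self.n_cols):
            p *= self.conditional(i, tuple(row[:i]), row[i])
            if p == 0.0:
                break
        return p


def estimate(model, predicates, target):
    """Estimate COUNT, AVG and SUM of column `target` under `predicates`.

    predicates maps a column index to a function value -> bool.
    """
    selectivity = 0.0  # probability mass of the predicate region
    weighted = 0.0     # sum of p(row) * target value over that region
    for row in product(*model.domains):
        if all(pred(row[c]) for c, pred in predicates.items()):
            p = model.joint(row)
            selectivity += p
            weighted += p * row[target]
    count = model.n_rows * selectivity
    avg = weighted / selectivity if selectivity > 0 else float("nan")
    return {"selectivity": selectivity, "COUNT": count, "AVG": avg, "SUM": count * avg}


if __name__ == "__main__":
    model = TabularAutoregressiveModel(ROWS)
    # SELECT COUNT(price), AVG(price), SUM(price) WHERE status = 'closed' AND price >= 20
    print(estimate(model, {1: lambda v: v == "closed", 2: lambda v: v >= 20}, target=2))
```

Because the conditionals here are exact frequencies, the toy estimates coincide with the true answers. A learned model would only approximate them, and a practical system would need something cheaper than full enumeration over the column domains.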

CLC number: TP391

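CEN Libin, LI Jingdong, LIN Chunbo, WANG Xiaoling. Approximate query processing approach based on deep autoregressive model[J]. Journal of Computer Applications.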
