《计算机应用》唯一官方网站

• •    下一篇

用于时态聚合范围查询的分布式时态索引

孟繁珺1,韩斌1,黄树成1,梅向东2   

  1. 1. 江苏科技大学
    2. 江苏赞奇科技有限公司
  • 收稿日期:2023-06-27 修回日期:2023-08-26 发布日期:2023-09-11 出版日期:2023-09-11
  • 通讯作者: 孟繁珺
  • 基金资助:
    国家自然科学基金

Distributed temporal index for temporal aggregation range query

  • Received:2023-06-27 Revised:2023-08-26 Online:2023-09-11 Published:2023-09-11
  • Supported by:
    the National Natural Science Foundation of China

摘要: 摘 要: 在大数据与云计算时代,时态大数据的查询分析面临许多重要挑战。针对其中时态聚合范围查询的查询性能不佳和不能有效利用索引等问题,提出一种用于时态聚合范围查询的分布式时态索引。首先,采用随机或轮询策略对时态数据分区;其次,采用基于时间位数组前缀的分区内索引构造算法建立索引,同时记录包括时间跨度在内的分区统计信息;再次,利用谓词下推筛选出时间跨度与查询时间区间重叠的数据分区,扫描时间线进行预聚合;最后,将各分区得到的聚合值按时间归并并聚合。实验结果表明,索引的分区内构造算法处理时间密度2400条每单位时间和0.001条每单位时间的数据的执行时间相近。索引的聚合查询算法相比于ParTime算法,在查询时间线前75%的数据时,每一步用时都至少减少22%,执行选择型聚合函数,每一步用时都至少减少11%。因此,索引在多数时态聚合范围查询任务中具有更快的速度,其分区内构造算法能解决数据稀疏且执行效率高。

关键词: 时态索引, 时态数据, 分布式, 时态聚合, 计数排序

Abstract: Abstract: In the era of big data and cloud computing, querying and analyzing temporal big data faces many important challenges. Focused on the issues such as poor query performance and ineffective utilization of indexes for temporal aggregation range query among them, a distributed temporal index for temporal aggregation range query was proposed. Firstly, random or Round-Robin strategy was used to partition the temporal data. Secondly, intra-partition index construction algorithm based on timestamp’s bit array prefix was used to build intra-partition index, and partition statistics including time span were recorded. Thirdly, data partitions whose time span overlaps with the query time interval were selected by predicate pushdown operation, and were pre-aggregated by index scan. Finally, all pre-aggregated values obtained from each partition were merged and aggregated by time. The simulation results show that the execution time of intra-partition index construction algorithm of the index is similar for processing data with densities of 2400 entries per unit of time and 0.001 entries per unit of time. Compared to ParTime, temporal aggregation range query algorithm of the index takes at least 22% less time for each step when querying the data in the first 75% of timeline and at least 11% less time for each step when executing selective aggregation. Therefore, the index is fast in most temporal aggregate range query tasks and its intra-partition index construction algorithm is capable to solve data sparsity and performs efficiently.

Key words: temporal index, temporal data, distributed, temporal aggregation, counting sort

中图分类号: