Journal of Computer Applications, 2024, Vol. 44, Issue (6): 1848-1854. DOI: 10.11772/j.issn.1001-9081.2023060830

Special Issue: Data Science and Technology

• Data science and technology •

Distributed temporal index for temporal aggregation range query

Fanjun MENG1, Bin HAN1, Shucheng HUANG1, Xiangdong MEI2

  1. School of Computer, Jiangsu University of Science and Technology, Zhenjiang Jiangsu 212100, China
  2. Xsuperzone Technology Company Limited, Changzhou Jiangsu 213022, China
  • Received: 2023-06-28 Revised: 2023-08-26 Accepted: 2023-08-30 Online: 2023-09-11 Published: 2024-06-10
  • Contact: Bin HAN
  • About author:MENG Fanjun, born in 1994, M. S. candidate. His research interests include temporal big data.
    HUANG Shucheng, born in 1969, Ph. D., professor. His research interests include target tracking, multimedia data analysis.
    MEI Xiangdong, born in 1968, M. S., senior engineer. His research interests include graphics and image processing.
  • Supported by:
    Innovative Research Open Funds on Overall Ship Performance of Taihu Laboratory of Deepsea Technological Science(25422217)

用于时态聚合范围查询的分布式时态索引

孟繁珺1, 韩斌1, 黄树成1, 梅向东2

  1. 江苏科技大学 计算机学院,江苏 镇江 212100
  2. 江苏赞奇科技股份有限公司,江苏 常州 213022
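  • Corresponding author: 韩斌
  • About the authors: 孟繁珺 (born 1994), male, from Cangzhou, Hebei, M.S. candidate. Main research interest: temporal big data.
    黄树成 (born 1969), male, from Lianyungang, Jiangsu, professor, Ph.D. Main research interests: target tracking, multimedia data analysis.
    梅向东 (born 1968), male, from Huanggang, Hubei, senior engineer, M.S. Main research interests: graphics and image processing.
  • Supported by:
    Innovative Research Open Funds on Overall Ship Performance of Taihu Laboratory of Deepsea Technological Science (25422217)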

Abstract:

In the era of big data and cloud computing, querying and analyzing temporal big data faces many important challenges. To address the poor query performance and ineffective use of indexes in temporal aggregation range queries, a Distributed Temporal Index (DTI) for temporal aggregation range query was proposed. Firstly, a random or round-robin strategy was used to partition the temporal data. Secondly, an intra-partition index construction algorithm based on the bit array prefix of timestamps was used to build the intra-partition index, and partition statistics, including the time span, were recorded. Thirdly, the data partitions whose time span overlapped the query time interval were selected by predicate pushdown and pre-aggregated by index scan. Finally, the pre-aggregated values obtained from all partitions were merged and aggregated by time. The experimental results show that the execution time of the intra-partition index construction algorithm on data with a density of 2 400 entries per unit of time is similar to that on data with a density of 0.001 entries per unit of time. Compared with ParTime, the temporal aggregation range query algorithm with the index takes at least 22% less time at each step when querying data in the first 75% of the timeline, and at least 11% less time at each step when executing selective aggregation. Therefore, the algorithm with the index is faster in most temporal aggregation range query tasks, and its intra-partition index construction algorithm is capable of solving the data sparsity problem with high efficiency.
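
The four steps summarized above can be illustrated with a minimal single-process sketch. It is not the authors' DTI implementation: the bit-array-prefix index is replaced here by a plain sort on start time, the aggregate is fixed to SUM over positive values, and all names (`Partition`, `partition_round_robin`, `pre_aggregate`, `temporal_sum`) are hypothetical. Records are assumed to be `(start, end, value)` tuples valid on the half-open interval `[start, end)`.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Partition:
    # Records are (start, end, value) tuples, valid on [start, end).
    records: list = field(default_factory=list)
    t_min: float = float("inf")    # partition statistic: earliest start
    t_max: float = float("-inf")   # partition statistic: latest end

    def add(self, rec):
        self.records.append(rec)
        self.t_min = min(self.t_min, rec[0])
        self.t_max = max(self.t_max, rec[1])

    def build_index(self):
        # Stand-in for the intra-partition index: sort by start time.
        self.records.sort(key=lambda r: r[0])


def partition_round_robin(records, n):
    """Step 1 and 2: distribute records round-robin, then index each partition."""
    parts = [Partition() for _ in range(n)]
    for i, rec in enumerate(records):
        parts[i % n].add(rec)
    for p in parts:
        p.build_index()
    return parts


def pre_aggregate(part, q_start, q_end):
    """Step 3 (per partition): SUM deltas clipped to the query interval."""
    deltas = defaultdict(int)
    for start, end, value in part.records:
        if start >= q_end:
            break          # sorted by start: no later record can overlap
        if end <= q_start:
            continue       # record ends before the query interval begins
        deltas[max(start, q_start)] += value
        deltas[min(end, q_end)] -= value
    return deltas


def temporal_sum(parts, q_start, q_end):
    """Steps 3 and 4: prune partitions by time span, then merge by time."""
    merged = defaultdict(int)
    scanned = 0
    for p in parts:
        if p.t_max <= q_start or p.t_min >= q_end:
            continue       # time span does not overlap the query: skip
        scanned += 1
        for t, d in pre_aggregate(p, q_start, q_end).items():
            merged[t] += d
    result, running, prev = [], 0, None
    for t in sorted(merged):
        # A running sum of 0 is treated as "no valid record" (positive values assumed).
        if prev is not None and running != 0:
            result.append((prev, t, running))
        running += merged[t]
        prev = t
    return result, scanned


records = [(0, 10, 5), (5, 15, 3), (20, 30, 7), (25, 40, 1)]
parts = partition_round_robin(records, 2)
print(temporal_sum(parts, 0, 12))
print(temporal_sum(parts, 35, 50))
```

In this sketch the second query touches only one of the two partitions, because the other partition's recorded time span ends before the query interval starts; that is the role the partition statistics play in the pruning step.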

Key words: temporal index, temporal data, distributed, temporal aggregation, counting sort

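Abstract (translated from Chinese):

In the era of big data and cloud computing, the querying and analysis of temporal big data face many important challenges. To address the poor performance of temporal aggregation range queries and their inability to use indexes effectively, a Distributed Temporal Index (DTI) for temporal aggregation range queries is proposed. First, the temporal data are partitioned with a random or round-robin strategy. Second, an intra-partition index construction algorithm based on time bit-array prefixes builds the index, and partition statistics, including the time span, are recorded. Third, predicate pushdown selects the data partitions whose time span overlaps the query time interval, and the index is scanned to perform pre-aggregation. Finally, the pre-aggregated values from all partitions are merged by time and aggregated. Experimental results show that the intra-partition construction algorithm has similar execution times on data with a time density of 2 400 entries per unit of time and on data with 0.001 entries per unit of time. Compared with the ParTime algorithm, the index's aggregation query algorithm takes at least 22% less time at every step when querying data in the first 75% of the timeline, and at least 11% less time at every step when executing selective aggregate functions. Therefore, the index is faster in most temporal aggregation range query tasks, and its intra-partition construction algorithm solves the data sparsity problem with high execution efficiency.

Key words: temporal index, temporal data, distributed, temporal aggregation, counting sort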

CLC Number: