Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1848-1854.DOI: 10.11772/j.issn.1001-9081.2023060830
Special Issue: 数据科学与技术
• Data science and technology • Previous Articles Next Articles
Fanjun MENG1, Bin HAN1(), Shucheng HUANG1, Xiangdong MEI2
Received:
2023-06-28
Revised:
2023-08-26
Accepted:
2023-08-30
Online:
2023-09-11
Published:
2024-06-10
Contact:
Bin HAN
About author:
MENG Fanjun, born in 1994, M. S. candidate. His research interests include temporal big data.Supported by:
通讯作者:
韩斌
作者简介:
孟繁珺(1994—),男,河北沧州人,硕士研究生,主要研究方向:时态大数据基金资助:
CLC Number:
Fanjun MENG, Bin HAN, Shucheng HUANG, Xiangdong MEI. Distributed temporal index for temporal aggregation range query[J]. Journal of Computer Applications, 2024, 44(6): 1848-1854.
孟繁珺, 韩斌, 黄树成, 梅向东. 用于时态聚合范围查询的分布式时态索引[J]. 《计算机应用》唯一官方网站, 2024, 44(6): 1848-1854.
Add to citation manager EndNote|Ris|BibTeX
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023060830
数据集 | 数据记录 大小/GB | 数据记录 条数/106 | 最大时间跨度 |
---|---|---|---|
TPC-BiH(SF=1) | 0.83 | 6 | 2.5×103 |
TPC-BiH(SF=10) | 8.47 | 60 | 2.5×103 |
YTTR | 3.89 | 39 | 31.6×109 |
Tab.1 Datasets used in experiment
数据集 | 数据记录 大小/GB | 数据记录 条数/106 | 最大时间跨度 |
---|---|---|---|
TPC-BiH(SF=1) | 0.83 | 6 | 2.5×103 |
TPC-BiH(SF=10) | 8.47 | 60 | 2.5×103 |
YTTR | 3.89 | 39 | 31.6×109 |
查询序号 | 查询时间段 长度/% | 查询时间段位置 | 时态聚合 函数及数量 |
---|---|---|---|
A1 | 1.0 | [0.25,0.75) | SUM×1 |
A2 | 1.0 | [0.25,0.75) | MAX×1 |
B1 | 0.1 | [0.00,0.25) | SUM×1 |
B2 | 0.1 | [0.75,1.00) | SUM×1 |
B3 | 1.0 | [0.00,0.25) | SUM×1 |
B4 | 1.0 | [0.75,1.00) | SUM×1 |
B5 | 100.0 | [0.00,1.00) | SUM×1 |
C1 | 0.1 | [0.25,0.75) | MAX×8 |
Tab.2 Temporal aggregation range queries
查询序号 | 查询时间段 长度/% | 查询时间段位置 | 时态聚合 函数及数量 |
---|---|---|---|
A1 | 1.0 | [0.25,0.75) | SUM×1 |
A2 | 1.0 | [0.25,0.75) | MAX×1 |
B1 | 0.1 | [0.00,0.25) | SUM×1 |
B2 | 0.1 | [0.75,1.00) | SUM×1 |
B3 | 1.0 | [0.00,0.25) | SUM×1 |
B4 | 1.0 | [0.75,1.00) | SUM×1 |
B5 | 100.0 | [0.00,1.00) | SUM×1 |
C1 | 0.1 | [0.25,0.75) | MAX×8 |
数据集名称 | 分区内构造平均用时/ms | 辅助结构平均大小/KB | 数据加载平均用时/s | |||||
---|---|---|---|---|---|---|---|---|
PCS | B Tree | Counting Sort | PCS | B Tree | Counting Sort | DTI | ParTime | |
TPC-BiH (SF=1) | 260.00 | 2 207.70 | 240.50 | 8 379.38 | 136 432.32 | 19.98 | 49.04 | 27.54 |
YTTR | 240.90 | 452.70 | NaN | 5 428.24 | 77 912.47 | N/A | 308.82 | 94.96 |
Tab. 3 Performance of intra-partition index construction and data loading
数据集名称 | 分区内构造平均用时/ms | 辅助结构平均大小/KB | 数据加载平均用时/s | |||||
---|---|---|---|---|---|---|---|---|
PCS | B Tree | Counting Sort | PCS | B Tree | Counting Sort | DTI | ParTime | |
TPC-BiH (SF=1) | 260.00 | 2 207.70 | 240.50 | 8 379.38 | 136 432.32 | 19.98 | 49.04 | 27.54 |
YTTR | 240.90 | 452.70 | NaN | 5 428.24 | 77 912.47 | N/A | 308.82 | 94.96 |
数据集 | 查询序号 | DTI执行时间/ms | ParTime执行时间/ms | ||||
---|---|---|---|---|---|---|---|
部分聚合 | 归并 | 总用时 | 生成DM | 归并 | 总用时 | ||
TPC-BiH (SF=1) | A1 | 461.67 | 23.50 | 22 058.67 | 1 076.33 | 30.00 | 22 616.00 |
A2 | 666.00 | 26.33 | 21 949.33 | 943.00 | 108.00 | 21 413.67 | |
B1 | 56.00 | 5.00 | 21 755.33 | 797.33 | 10.33 | 21 388.00 | |
B2 | 783.00 | 5.00 | 21 572.00 | 762.33 | 24.33 | 20 918.00 | |
B3 | 112.00 | 23.00 | 20 843.00 | 781.33 | 21.67 | 20 883.00 | |
B4 | 760.67 | 25.00 | 22 235.67 | 753.00 | 27.33 | 21 118.67 | |
B5 | 2 392.33 | 275.00 | 23 661.00 | 2 793.33 | 138.67 | 22 921.00 | |
C1 | 974.00 | 16.33 | 21 886.33 | 1 624.00 | 226.67 | 22 210.67 | |
TPC-BiH (SF=10) | A1 | 1 284.00 | 58.33 | 34 009.67 | 1 799.67 | 91.00 | 37 005.33 |
A2 | 1 411.33 | 59.00 | 34 790.00 | 2 015.67 | 1 210.67 | 38 914.33 | |
B1 | 366.00 | 20.33 | 31 684.67 | 1 419.33 | 28.00 | 33 826.67 | |
B2 | 1 373.33 | 21.67 | 34 103.67 | 1 594.67 | 69.00 | 34 663.67 | |
B3 | 176.00 | 53.00 | 31 216.33 | 1 511.67 | 45.33 | 34 235.67 | |
B4 | 1 495.67 | 52.67 | 34 452.33 | 1 582.33 | 86.67 | 33 870.00 | |
B5 | 7 080.33 | 1 331.67 | 48 504.00 | 8 191.00 | 611.00 | 52 531.33 | |
C1 | 1 666.67 | 38.33 | 36 184.33 | 4 040.00 | 4 097.00 | 49 294.67 | |
YTTR | A1 | 560.00 | 625.00 | 31 440.67 | 1 015.67 | 983.00 | 32 373.00 |
A2 | 1 054.33 | 719.33 | 29 639.33 | 927.33 | 1 174.33 | 31 132.33 | |
B1 | 259.67 | 103.00 | 29 247.33 | 400.00 | 167.67 | 30 738.00 | |
B2 | 215.67 | 90.00 | 28 859.67 | 319.33 | 160.33 | 29 336.00 | |
B3 | 416.67 | 394.33 | 28 937.00 | 700.00 | 762.67 | 30 966.67 | |
B4 | 430.33 | 365.33 | 29 272.67 | 516.67 | 778.67 | 29 740.67 | |
B5 | 14 921.00 | 65 289.33 | 162 593.00 | 21 724.33 | 87 942.00 | 233 179.67 | |
C1 | 768.00 | 522.67 | 30 788.00 | 1 394.67 | 592.00 | 31 746.67 |
Tab. 4 Execution time of temporal aggregation range query
数据集 | 查询序号 | DTI执行时间/ms | ParTime执行时间/ms | ||||
---|---|---|---|---|---|---|---|
部分聚合 | 归并 | 总用时 | 生成DM | 归并 | 总用时 | ||
TPC-BiH (SF=1) | A1 | 461.67 | 23.50 | 22 058.67 | 1 076.33 | 30.00 | 22 616.00 |
A2 | 666.00 | 26.33 | 21 949.33 | 943.00 | 108.00 | 21 413.67 | |
B1 | 56.00 | 5.00 | 21 755.33 | 797.33 | 10.33 | 21 388.00 | |
B2 | 783.00 | 5.00 | 21 572.00 | 762.33 | 24.33 | 20 918.00 | |
B3 | 112.00 | 23.00 | 20 843.00 | 781.33 | 21.67 | 20 883.00 | |
B4 | 760.67 | 25.00 | 22 235.67 | 753.00 | 27.33 | 21 118.67 | |
B5 | 2 392.33 | 275.00 | 23 661.00 | 2 793.33 | 138.67 | 22 921.00 | |
C1 | 974.00 | 16.33 | 21 886.33 | 1 624.00 | 226.67 | 22 210.67 | |
TPC-BiH (SF=10) | A1 | 1 284.00 | 58.33 | 34 009.67 | 1 799.67 | 91.00 | 37 005.33 |
A2 | 1 411.33 | 59.00 | 34 790.00 | 2 015.67 | 1 210.67 | 38 914.33 | |
B1 | 366.00 | 20.33 | 31 684.67 | 1 419.33 | 28.00 | 33 826.67 | |
B2 | 1 373.33 | 21.67 | 34 103.67 | 1 594.67 | 69.00 | 34 663.67 | |
B3 | 176.00 | 53.00 | 31 216.33 | 1 511.67 | 45.33 | 34 235.67 | |
B4 | 1 495.67 | 52.67 | 34 452.33 | 1 582.33 | 86.67 | 33 870.00 | |
B5 | 7 080.33 | 1 331.67 | 48 504.00 | 8 191.00 | 611.00 | 52 531.33 | |
C1 | 1 666.67 | 38.33 | 36 184.33 | 4 040.00 | 4 097.00 | 49 294.67 | |
YTTR | A1 | 560.00 | 625.00 | 31 440.67 | 1 015.67 | 983.00 | 32 373.00 |
A2 | 1 054.33 | 719.33 | 29 639.33 | 927.33 | 1 174.33 | 31 132.33 | |
B1 | 259.67 | 103.00 | 29 247.33 | 400.00 | 167.67 | 30 738.00 | |
B2 | 215.67 | 90.00 | 28 859.67 | 319.33 | 160.33 | 29 336.00 | |
B3 | 416.67 | 394.33 | 28 937.00 | 700.00 | 762.67 | 30 966.67 | |
B4 | 430.33 | 365.33 | 29 272.67 | 516.67 | 778.67 | 29 740.67 | |
B5 | 14 921.00 | 65 289.33 | 162 593.00 | 21 724.33 | 87 942.00 | 233 179.67 | |
C1 | 768.00 | 522.67 | 30 788.00 | 1 394.67 | 592.00 | 31 746.67 |
1 | HUANG Y, LI X, ZHANG G-Q. ELII: a novel inverted index for fast temporal query, with application to a large Covid-19 EHR dataset[J]. Journal of Biomedical Informatics, 2021, 117: 103744. |
2 | DONG X, WONG R, LYU W, et al. An integrated LSTM-HeteroRGNN model for interpretable opioid overdose risk prediction[J]. Artificial Intelligence in Medicine, 2023, 135: 102439. |
3 | HU X, SINTOS S, GAO J, et al. Computing complex temporal join queries efficiently[C]// Proceedings of the 2022 International Conference on Management of Data. New York: ACM, 2022: 2076-2090. |
4 | BOUROS P, MAMOULIS N, TSITSIGKOS D, et al. In-memory interval joins[J]. The VLDB Journal, 2021, 30(4): 667-691. |
5 | DIGNÖS A, BÖHLEN M H, GAMPER J, et al. Leveraging range joins for the computation of overlap joins[J]. The VLDB Journal, 2022, 31(1): 75-99. |
6 | LUO J-Z, SHI S-F, YANG G, et al. O2iJoin: an efficient index-based algorithm for overlap interval join [J]. Journal of Computer Science and Technology, 2018, 33(5): 1023-1038. |
7 | 张伟, 王志杰. 分布式环境下时态大数据的连接操作研究[J]. 计算机工程, 2019, 45(3): 20-25,31. |
ZHANG W, WANG Z J. Research on join operation of temporal big data in distributed environment [J]. Computer Engineering, 2019,45(3):20-25,31. | |
8 | WANG L, CAI R, FU T Z J, et al. Waterwheel: realtime indexing and temporal range query processing over massive data streams[C]// Proceedings of the 2018 IEEE 34th International Conference on Data Engineering. Piscataway: IEEE, 2018: 269-280. |
9 | BEHREND A, DIGNÖS A, GAMPER J, et al. Period index: a learned 2D hash index for range and duration queries[C]// Proceedings of the 16th International Symposium on Spatial and Temporal Databases. New York: ACM, 2019: 100-109. |
10 | CECCARELLO M, DIGNÖS A, GAMPER J, et al. Indexing temporal relations for range-duration queries [EB/OL]. (2022-06-15) [2023-04-03]. . |
11 | 杨佐希, 汤娜, 汤庸, 等. 基于时序分区的时态索引与查询[J]. 软件学报, 2020, 31(11): 3519-3539. |
YANG Z X, TANG N, TANG Y, et al. Temporal index and query based on timing partition [J]. Journal of Software, 2020,31(11):3519-3539. | |
12 | SUZANNE A, RASCHIA G, MARTINEZ J, et al. Slicing techniques for temporal aggregation in spanning event streams [J]. Information and Computation, 2021, 281: 104807. |
13 | GRANDI F, MANDREOLI F, MARTOGLIA R, et al. Unleashing the power of querying streaming data in a temporal database world: a relational algebra approach[J]. Information Systems, 2022, 103: 101872. |
14 | SUZANNE A, RASCHIA G, MARTINEZ J, et al. Window-slicing techniques extended to spanning-event streams[C]// Proceedings of the 27th International Symposium on Temporal Representation and Reasoning. Dagstuhl, Germany: Dagstuhl Publishing, 2020: 10:1-10:14. |
15 | GAO Q, LEE M L, LING T W. Temporal keyword search with aggregates and group-by[C]// Proceedings of the 40th International Conference on Conceptual Modeling. Cham: Springer, 2021: 160-175. |
16 | YAO B, ZHANG W, WANG Z-J, et al. Distributed in-memory analytics for big temporal data[C]// Proceedings of the 23rd International Conference on Database Systems for Advanced Applications. Cham: Springer, 2018: 549-565. |
17 | KLINE N, SNODGRASS R T. Computing temporal aggregates[C]// Proceedings of the 11th International Conference on Data Engineering. Washington, DC: IEEE Computer Society, 1995: 222-231. |
18 | MAHLKNECHT G, DIGNÖS A, KOZMINA N. Modeling and querying facts with period timestamps in data warehouses[J]. International Journal of Applied Mathematics and Computer Science, 2019, 29(1): 31-49. |
19 | KAUFMANN M, MANJILI A A, VAGENAS P, et al. Timeline index: a unified data structure for processing queries on temporal data in SAP HANA[C]// Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data. New York: ACM, 2013: 1173-1184. |
20 | PILMAN M, KAUFMANN M, KÖHL F, et al. ParTime: parallel temporal aggregation[C]// Proceedings of the 2016 International Conference on Management of Data. New York: ACM, 2016: 999-1010. |
21 | LU W, ZHAO Z, WANG X, et al. A lightweight and efficient temporal database management system in TDSQL[J]. Proceedings of the VLDB Endowment, 2019, 12(12): 2035-2046. |
22 | QIAO J, HUANG X, WANG J, et al. Dual-PISA: an index for aggregation operations on time series data[J]. Information Systems, 2020, 87: 101427. |
23 | ZHENG X, LIU H-K, WEI L-N, et al. Timo: in-memory temporal query processing for big temporal data [C]// Proceedings of the 2019 Seventh International Conference on Advanced Cloud and Big Data . Piscataway: IEEE, 2019: 121-126. |
24 | CHRISTODOULOU G, BOUROS P, MAMOULIS N. HINT: a hierarchical index for intervals in main memory[C]// Proceedings of the 2022 International Conference on Management of Data. New York: ACM, 2022: 1257-1270. |
25 | 陈瑛, 吴明珠, 卢莉, 等. 时态拟序数据索引TQD-tree更新技术[J]. 华南师范大学学报(自然科学版), 2019, 51(2): 123-127. |
CHEN Y, WU M Z, LU L, et al. Updating technique of temporal quasi-order data index[J]. Journal of South China Normal University (Natural Science Edition), 2019,51(2):123-127. | |
26 | KAUFMANN M, FISCHER P M, MAY N, et al. TPC-BiH: a benchmark for bitemporal databases[C]// Performance Characterization and Benchmarking. TPCTC 2013. Cham: Springer, 2013: 16-31. |
[1] | Zhizheng ZHANG, Xiaojian ZHANG, Junqing WANG, Guanghui FENG. Federated spatial data publication method with differential privacy and secure aggregation [J]. Journal of Computer Applications, 2024, 44(9): 2777-2784. |
[2] | Chuanlin PANG, Rui TANG, Ruizhi ZHANG, Chuan LIU, Jia LIU, Shibo YUE. Distributed power allocation algorithm based on graph convolutional network for D2D communication systems [J]. Journal of Computer Applications, 2024, 44(9): 2855-2862. |
[3] | Huanhuan LI, Tianqiang HUANG, Xuemei DING, Haifeng LUO, Liqing HUANG. Public traffic demand prediction based on multi-scale spatial-temporal graph convolutional network [J]. Journal of Computer Applications, 2024, 44(7): 2065-2072. |
[4] | Xu LI, Yulin HE, Laizhong CUI, Zhexue HUANG, Fournier‑Viger PHILIPPE. Distributed observation point classifier for big data with random sample partition [J]. Journal of Computer Applications, 2024, 44(6): 1727-1733. |
[5] | Fengfeng WEI, Weineng CHEN. Distributed data-driven evolutionary computation for multi-constrained optimization [J]. Journal of Computer Applications, 2024, 44(5): 1393-1400. |
[6] | Ruixuan NI, Miao CAI, Baoliu YE. DFS-Cache: memory-efficient and persistent client cache for distributed file systems [J]. Journal of Computer Applications, 2024, 44(4): 1172-1180. |
[7] | Siyi ZHOU, Tianrui LI. Visual analysis of multivariate spatio-temporal data for origin-destination flow [J]. Journal of Computer Applications, 2024, 44(2): 452-459. |
[8] | Zucuan ZHANG, Xuebin CHEN, Rui GAO, Yuanhuai ZOU. Federated learning client selection method based on label classification [J]. Journal of Computer Applications, 2024, 44(12): 3759-3765. |
[9] | Li ZENG, Jingru YANG, Gang HUANG, Xiang JING, Chaoran LUO. Survey on hypergraph application methods: issues, advances, and challenges [J]. Journal of Computer Applications, 2024, 44(11): 3315-3326. |
[10] | Yu WANG, Zhihui GUAN, Yuanpeng LI. Distributed UAV cluster pursuit decision-making based on trajectory prediction and MADDPG [J]. Journal of Computer Applications, 2024, 44(11): 3623-3628. |
[11] | Guoxiang CHEN, Ziqiang YU, Haoyu ZHAO. Distributed k-nearest neighbor query algorithm for moving objects in dynamic road network [J]. Journal of Computer Applications, 2024, 44(11): 3403-3410. |
[12] | Chunyong YIN, Yongcheng ZHOU. Automatically adjusted clustered federated learning for double-ended clustering [J]. Journal of Computer Applications, 2024, 44(10): 3011-3020. |
[13] | Chenyang GE, Qinrang LIU, Xue PEI, Shuai WEI, Zhengbin ZHU. Efficient collaborative defense scheme against distributed denial of service attacks in software defined network [J]. Journal of Computer Applications, 2023, 43(8): 2477-2485. |
[14] | Chunze CAO, Delong MA, Ye YUAN. GTC: geo-distributed triangle counting algorithm in graph streams [J]. Journal of Computer Applications, 2023, 43(7): 2040-2048. |
[15] | Mengjie LAN, Jianping CAI, Lan SUN. Self-regularization optimization methods for Non-IID data in federated learning [J]. Journal of Computer Applications, 2023, 43(7): 2073-2081. |
Viewed | ||||||
Full text |
|
|||||
Abstract |
|
|||||