Journal of Computer Applications, 2023, Vol. 43, Issue (3): 759-766. DOI: 10.11772/j.issn.1001-9081.2022020211
Special Issue: Data science and technology
Li YANG, Jianting CHEN, Yang XIANG
Received: 2022-02-24
Revised: 2022-05-31
Accepted: 2022-06-02
Online: 2022-08-16
Published: 2023-03-10
Contact: Yang XIANG
About author: YANG Li, born in 1998, a native of Zhangye, Gansu, M.S. candidate. His research interests include big data and distributed systems.
Li YANG, Jianting CHEN, Yang XIANG. Performance optimization strategy of distributed storage for industrial time series big data based on HBase[J]. Journal of Computer Applications, 2023, 43(3): 759-766.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2022020211
Dataset | Data scale | Data size/GB |
---|---|---|
TS1 | 4 551 131 | 0.53 |
TS2 | 9 102 282 | 1.05 |
TS3 | 27 307 256 | 3.15 |
TS4 | 45 511 729 | 5.25 |
TS5 | 63 716 420 | 7.35 |
TS6 | 91 023 458 | 10.49 |
TS7 | 136 535 187 | 15.75 |
TS8 | 182 046 916 | 20.44 |
TS9 | 240 000 000 | 26.95 |
Tab. 1 Time series datasets
Dataset | Original system | PUB-HBase | HBalancer | Proposed method |
---|---|---|---|---|
TS1 | 1 320 336 | 1 093 551 | 1 079 617 | 970 510 |
TS2 | 1 228 332 | 1 148 735 | 1 080 976 | 968 935 |
TS3 | 959 272 | 964 214 | 962 724 | 828 265 |
TS4 | 784 302 | 708 937 | 672 987 | 622 490 |
TS5 | 580 265 | 514 919 | 486 567 | 424 084 |
TS6 | 120 129 | 100 345 | 92 936 | 82 163 |
TS7 | 121 554 | 92 912 | 93 268 | 75 974 |
TS8 | 104 722 | 77 992 | 73 147 | 63 139 |
TS9 | 91 217 | 68 624 | 64 869 | 55 609 |
Tab. 2 Load tilts of different methods under different data volumes
Training set size/GB | Training time/s | Model accuracy/% |
---|---|---|
0.54 | 369.19 | 76.36 |
0.90 | 612.80 | 79.83 |
1.26 | 856.43 | 82.16 |
1.80 | 1 219.90 | 83.38 |
2.70 | 1 829.10 | 84.56 |
4.50 | 3 048.49 | 85.42 |
5.40 | 3 658.19 | 85.57 |
7.20 | 4 877.59 | 85.62 |
Tab. 3 Training times and prediction accuracies under different training data volumes
Dataset | Original system | PUB-HBase | Proposed method |
---|---|---|---|
TS1 | 212 | 194 | 180 |
TS2 | 408 | 360 | 339 |
TS3 | 739 | 646 | 602 |
TS4 | 1 322 | 1 038 | 954 |
TS5 | 6 536 | 5 062 | 4 685 |
TS6 | 8 549 | 7 082 | 6 114 |
TS7 | 12 059 | 11 012 | 8 184 |
TS8 | 21 715 | 17 944 | 13 948 |
TS9 | 26 234 | 19 345 | 14 166 |
Tab. 4 Comparison of comprehensive data query time