Relay computation and dynamic diversion of computing-intensive large flow data

doi:10.11772/j.issn.1001-9081.2020111725

Journal of Computer Applications, 2021, Vol. 41, Issue (9): 2646-2651.

Topic: Advanced computing

LIAO Jia¹, CHEN Yang¹, BAO Qiulan¹, LIAO Xuehua², ZHU Zhousen¹

1. College of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, Sichuan 610101, China;
2. College of Computer Science, Sichuan Normal University, Chengdu, Sichuan 610101, China

Received: 2020-11-05 Revised: 2021-02-08 Online: 2021-05-08 Published: 2021-09-10
Corresponding author: ZHU Zhousen
About the authors: LIAO Jia (born 1996), female, from Ya'an, Sichuan, master's student; main research interest: data computation and analysis. CHEN Yang (born 1997), female, from Dazhou, Sichuan, master's student; main research interest: big data. BAO Qiulan (born 1996), female, from Suining, Sichuan, master's student; main research interest: data computation and analysis. LIAO Xuehua (born 1976), female, from Deyang, Sichuan, associate professor, M.S., CCF member; main research interests: big data storage, in-memory technology, data computation and analysis. ZHU Zhousen (born 1966), male, from Xi'an, Shaanxi, professor, M.S.; main research interests: data computation and analysis, big data, system frameworks.
Supported by:
This work is partially supported by the National Social Science Foundation of China (20BMZ092) and the Industry and Study Cooperative Education Project of the Ministry of Education (201802002036, 201901075008).

Abstract

Abstract: To address the slow computation of large flow data and the heavy computational load on servers, a relay computation and dynamic diversion model for computing-intensive large flow data was proposed. Firstly, in a distributed environment, in-memory data storage technology was used to determine the computation amount and complexity level of each computation task, and the nodes were sorted by their resource capacity. Then, the tasks were dynamically allocated to different nodes for parallel computing, and a relay processing mode was adopted to decompose the computation tasks, so as to effectively meet the performance and accuracy requirements of high-flow complex computing tasks. Comparative analysis shows that, for data volumes at the ten-thousand level and above, multiple nodes achieve shorter running time and faster computation than a single node. Moreover, when applied in practice, the model not only reduces the running time in high-concurrency scenarios but also saves more computing resources.

Key words: data diversion, relay computation, computation node, data synchronization, in-memory data storage
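The abstract gives only a high-level outline of the model, so the following is a minimal sketch of the general idea rather than the authors' implementation. It assumes each task carries a numeric cost estimate and each node a single capacity score. The names `Node`, `rank_nodes`, `divert` and `relay`, and the "lowest load-to-capacity ratio" assignment rule, are hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # hypothetical resource score, not defined in the paper
    load: float = 0.0  # total cost assigned so far
    tasks: list = field(default_factory=list)


def rank_nodes(nodes):
    """Sort nodes from most to least capable."""
    return sorted(nodes, key=lambda n: n.capacity, reverse=True)


def divert(tasks, nodes):
    """Assign each (task_id, cost) pair to the node whose load/capacity
    ratio would be lowest after taking the task (an assumed heuristic).

    Heaviest tasks are placed first; on a tie the more capable node wins,
    because min() returns the first minimum and the nodes are ranked.
    """
    ranked = rank_nodes(nodes)
    for task_id, cost in sorted(tasks, key=lambda t: t[1], reverse=True):
        target = min(ranked, key=lambda n: (n.load + cost) / n.capacity)
        target.load += cost
        target.tasks.append(task_id)
    return ranked


def relay(value, stages):
    """Run one task as a chain of stages, handing each result to the next."""
    for stage in stages:
        value = stage(value)
    return value


nodes = divert(
    tasks=[("t1", 8.0), ("t2", 4.0), ("t3", 2.0), ("t4", 2.0)],
    nodes=[Node("small", 1.0), Node("big", 3.0)],
)
assignment = {n.name: n.tasks for n in nodes}
result = relay(3, [lambda x: x + 1, lambda x: x * 10])
```

In this toy run the two heavier tasks go to the more capable node and the two lighter ones to the weaker node, so both end up with the same load-to-capacity ratio. In a real deployment each relay stage could run on a different node, which this single-process sketch does not model.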

CLC number: TP391

LIAO Jia, CHEN Yang, BAO Qiulan, LIAO Xuehua, ZHU Zhousen. Relay computation and dynamic diversion of computing-intensive large flow data[J]. Journal of Computer Applications, 2021, 41(9): 2646-2651.


