Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (9): 2646-2651. DOI: 10.11772/j.issn.1001-9081.2020111725

Special topic: Advanced computing

Relay computation and dynamic diversion of computing-intensive large flow data

LIAO Jia1, CHEN Yang1, BAO Qiulan1, LIAO Xuehua2, ZHU Zhousen1

  1. College of Physics and Electronic Engineering, Sichuan Normal University, Chengdu, Sichuan 610101, China;
  2. College of Computer Science, Sichuan Normal University, Chengdu, Sichuan 610101, China
  • Received: 2020-11-05 Revised: 2021-02-08 Online: 2021-09-10 Published: 2021-05-08
  • Corresponding author: ZHU Zhousen
  • About the authors: LIAO Jia (born 1996), female, from Ya'an, Sichuan, M.S. candidate; research interests: data computation and analysis. CHEN Yang (born 1997), female, from Dazhou, Sichuan, M.S. candidate; research interests: big data. BAO Qiulan (born 1996), female, from Suining, Sichuan, M.S. candidate; research interests: data computation and analysis. LIAO Xuehua (born 1976), female, from Deyang, Sichuan, associate professor, M.S., CCF member; research interests: big data storage, in-memory technology, data computation and analysis. ZHU Zhousen (born 1966), male, from Xi'an, Shaanxi, professor, M.S.; research interests: data computation and analysis, big data, system frameworks.
  • Supported by:
    This work is partially supported by the National Social Science Foundation of China (20BMZ092) and the Industry and Study Cooperative Education Project of the Ministry of Education (201802002036, 201901075008).

Abstract: To address the slow computation of large-flow data and the heavy computing load on the server side, a relay computation and dynamic diversion model for computing-intensive large-flow data was proposed. First, in a distributed environment, in-memory data storage technology was used to determine the computation amount and complexity level of each computation task, and the nodes were ranked by their resource capacity. Then, the tasks were dynamically allocated to different nodes for parallel computing, and a relay processing mode was used to decompose the computation tasks, so as to meet the performance and accuracy requirements of high-flow, complex computing tasks. Analysis and comparison show that, for data volumes at the ten-thousand level and above, multiple nodes take less running time and compute faster than a single node. When applied in practice, the model not only reduces the running time in high-concurrency scenarios but also saves computing resources.

Keywords: data diversion, relay computation, computation node, data synchronization, in-memory data storage
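
The abstract gives only the outline of the model, so the following is a minimal sketch of the two ideas it names, not the authors' implementation. It assumes each task carries a numeric cost estimate and each node a relative capacity score. The names `Node`, `Task`, `divert` and `relay` are hypothetical, and the greedy rule (heaviest task first, to the node that would finish it soonest) is an illustrative stand-in for whatever allocation policy the paper actually uses.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    capacity: float  # assumed relative throughput (work units per second)
    load: float = 0.0  # total work units assigned so far
    assigned: list = field(default_factory=list)


@dataclass
class Task:
    name: str
    cost: float  # assumed estimate of the task's computation amount


def divert(tasks, nodes):
    """Greedy diversion: heaviest task first, to the node that finishes it soonest."""
    # Rank nodes by capacity so that ties go to the more capable node.
    ranked = sorted(nodes, key=lambda n: n.capacity, reverse=True)
    for task in sorted(tasks, key=lambda t: t.cost, reverse=True):
        # Projected finish time if this node also took the task.
        best = min(ranked, key=lambda n: (n.load + task.cost) / n.capacity)
        best.assigned.append(task.name)
        best.load += task.cost
    return {n.name: n.assigned for n in ranked}


def relay(chunks, step, state):
    """Relay processing: each stage picks up the state left by the previous one."""
    for chunk in chunks:
        state = step(state, chunk)
    return state
```

With a node of capacity 4 and a node of capacity 1, `divert` keeps the heavy tasks on the stronger node and sends a light task to the weaker one only once that becomes the faster option. `relay` models the hand-off of a partial result between successive pieces of a decomposed task; in a real deployment each stage could run on a different node.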

CLC number: