Journal of Computer Applications ›› 2018, Vol. 38 ›› Issue (12): 3496-3499. DOI: 10.11772/j.issn.1001-9081.2018040771


MPI big data processing for high performance applications

WANG Peng, ZHOU Yan   

  1. School of Computer Science and Technology, Southwest Minzu University, Chengdu, Sichuan 610225, China
  • Received: 2018-04-16 Revised: 2018-06-11 Online: 2018-12-10 Published: 2018-12-15
  • Contact: ZHOU Yan
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (60702075), the Fundamental Research Funds for the Central Universities of Southwest University for Nationalities (2017NZYQN27), the Guangdong Province Science and Technology Agency 2016 Provincial Special Fund for Science and Technology Development Project (2016B090918062), and the 2016 Major Project of Collaborative Innovation in Production, Teaching and Research of Guangzhou (201604010115).

面向高性能应用的MPI大数据处理

王鹏, 周岩   

  1. 西南民族大学 计算机科学与技术学院, 成都 610225
  • 通讯作者: 周岩
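  • About the authors: WANG Peng (born 1975), male, from Leshan, Sichuan; professor, Ph.D., CCF member; research interests: cloud computing, parallel computing, quantum computing. ZHOU Yan (born 1976), male, from Xi'an, Shaanxi; master's student; research interests: cloud computing, parallel computing, quantum computing.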
  • 基金资助:
    国家自然科学基金资助项目(60702075);西南民族大学中央高校基本科研业务费专项(2017NZYQN27);广东省科学技术厅2016年省科技发展专项资金项目(2016B090918062);广州市2016年产学研协同创新重大专项(201604010115)。

Abstract: To improve on the centralized data management model that the Message Passing Interface (MPI) uses in high performance computing, and to strengthen its ability to process big data, an MPI Data Storage Plug-in (MPI-DSP) for big data processing was designed and developed, drawing on ideas from parallel and distributed systems. Firstly, interface functions were created to achieve the design goal of "computation-to-storage migration" with minimal impact on the MPI system. File allocation was separated from computation so that MPI is no longer held back by the network transmission bottleneck when reading large data files. Then, the design goals, operating mechanism and implementation strategy were analyzed and described, and the design concept was verified by describing how the interface function MPI_Open is used in an MPI environment. A Wordcount experiment compared the time performance of MPI with the MPI-DSP component against that of the original MPI in data file processing. The results preliminarily validated the feasibility of the "computation-to-storage migration" mode, which gives MPI big data processing capability in high performance application scenarios. In addition, the applicable environment and limitations of MPI-DSP were analyzed, and its scope of application was defined.
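
The data flow described in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' MPI-DSP code and does not reproduce the MPI_Open interface, whose signature is not given here. The node layout `LOCAL_BLOCKS` and the function names are hypothetical, and the merge step only stands in for an MPI reduction. The sketch contrasts shipping every block to one node with counting each block on the node that stores it.

```python
from collections import Counter

# Hypothetical layout (assumption): node id -> block of the data file stored there.
LOCAL_BLOCKS = {
    0: "big data needs parallel computing",
    1: "parallel computing with MPI",
    2: "MPI moves computation to data",
}


def count_words(text):
    """Word count for one block (the per-process work in a Wordcount job)."""
    return Counter(text.split())


def centralized_wordcount(blocks):
    """Centralized model: every block is shipped to one node, then counted."""
    shipped_bytes = sum(len(b.encode()) for b in blocks.values())
    total = count_words(" ".join(blocks.values()))
    return total, shipped_bytes


def local_wordcount(blocks):
    """Computation-to-storage model: count where the block lives, ship only counts."""
    total = Counter()
    shipped_pairs = 0
    for block in blocks.values():
        partial = count_words(block)   # runs on the node that stores the block
        shipped_pairs += len(partial)  # only (word, count) pairs leave the node
        total.update(partial)          # stands in for a reduction step
    return total, shipped_pairs


central, raw_bytes = centralized_wordcount(LOCAL_BLOCKS)
local, partial_pairs = local_wordcount(LOCAL_BLOCKS)
assert central == local  # both models produce the same word counts
```

Both functions return the same counts. What differs is what crosses the network: raw file bytes in the centralized model, per-node count tables in the other.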

Key words: Message Passing Interface (MPI), parallel computing, big data, High Performance Computing (HPC), Data Storage Plug-in (DSP)

摘要: 针对消息传递接口(MPI)在高性能计算领域的应用场景,为了优化MPI现有数据集中管理模式,增强其对大数据的处理能力,借鉴并行与分布式系统思想,开发设计一套适用于大数据处理的基于MPI的数据存储组件(MPI-DSP)。首先,创建接口函数,以对MPI系统影响最小的方式实现"计算向存储迁移"的设计目标,将文件分配与计算进行分离,使MPI突破大数据文件读取时的网络传输瓶颈。然后,分析阐述设计目标、运行机制、实现策略,通过描述接口函数MPI_Open在MPI环境下的应用,验证设计理念。通过Wordcount实验对比使用MPI-DSP组件与原MPI在数据文件处理方面的时间性能,初步验证了MPI"计算向存储迁移"模式的可行性,使其具备在高性能应用场景下的大数据处理能力。同时分析了MPI-DSP的适用环境和局限性,界定了其应用范围。

关键词: 消息传递接口, 并行计算, 大数据, 高性能计算, 数据存储插件
