Journal of Computer Applications, 2011, Vol. 31, Issue 06: 1453-1457. DOI: 10.3724/SP.J.1087.2011.01453

• Network and Communications •

Optimization of Message Passing Interface collective communication algorithms on the KD60 cluster

ZHENG Qi-long1,2, WANG Rui1,2, ZHOU Huan1,2

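  1. Anhui Provincial Key Laboratory of High Performance Computing, Hefei 230026
  2. School of Computer Science and Technology, University of Science and Technology of China, Hefei 230027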
  • Received: 2010-12-09 Revised: 2011-01-12 Online: 2011-06-20 Published: 2011-06-01
  • Corresponding author: WANG Rui
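  • About the authors: ZHENG Qi-long (born 1969), male, from Chengdu, Sichuan, associate professor; research interests: parallel computing, high-productivity software tools and environments;
    WANG Rui (born 1987), male, from Chaohu, Anhui, master's student; research interests: parallel and distributed computing, high-performance cluster communication;
    ZHOU Huan (born 1989), female, from Chaohu, Anhui, master's student; research interests: high-performance cluster communication.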
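  • Supported by:
    National Major Project on Core Electronic Devices, High-end Generic Chips and Basic Software ("He-Gao-Ji"); Natural Science Foundation of Anhui Province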

Algorithm optimization of MPI collective communications in KD60

ZHENG Qi-long1,2, WANG Rui1,2, ZHOU Huan1,2

  1. Anhui Provincial Key Laboratory of High Performance Computing, Hefei, Anhui 230026, China
  2. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China
  • Received: 2010-12-09 Revised: 2011-01-12 Online: 2011-06-20 Published: 2011-06-01
  • Contact: WANG Rui

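Abstract: Large-scale clusters have entered the multicore era, and multicore architectures place new demands on parallel computing. The Message Passing Interface (MPI) is the most widely used parallel programming model, and collective communication is an important part of MPI, so efficient collective communication algorithms matter greatly for parallel computing efficiency. The KD60 platform is a domestically built teraflops-class multicore cluster based on Loongson 3, the first domestically produced multicore chip. This paper first analyzes the architectural features of the KD60 multicore cluster and the hierarchical nature of communication under multicore architectures. It then analyzes how the original collective communication algorithms are implemented and where they fall short. Finally, taking broadcast as an example, it builds on the original algorithm with an improved algorithm based on the chip multiprocessor (CMP) architecture that changes the original communication pattern, and adds architecture-specific optimizations that exploit the characteristics of the KD60 experimental platform. Experimental results show that the improved algorithm makes good use of the multicore structure and improves the performance of the collective broadcast algorithm.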

Keywords: Message Passing Interface, multicore cluster, collective communication optimization, KD60

Abstract: Large-scale clusters have entered the multicore era, and multicore architectures place new demands on parallel computation. The Message Passing Interface (MPI) is the most commonly used parallel programming model, and collective communication is an important part of the MPI standard. Efficient collective communication algorithms play a vital role in improving the performance of parallel computation. This paper first analyzes the architectural features of KD60 and the hierarchical characteristics of communication under multicore architectures, then describes the implementation of the collective communication algorithms in MPICH2 and points out their deficiencies. Finally, taking broadcast as an example, it applies an improved algorithm based on the chip multiprocessor (CMP) architecture, which changes the communication pattern of the original algorithm, and further optimizes the algorithm according to the architectural characteristics of KD60. The experimental results show that the improved algorithm improves the performance of MPI broadcast.

Key words: Message Passing Interface (MPI), multicore cluster, collective communications optimization, KD60
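
The contrast the abstract draws between a topology-unaware broadcast and a hierarchy-aware one can be illustrated with a small simulation. The abstract does not spell out the paper's algorithm, so the sketch below is only a generic two-level, leader-based scheme: one inter-node phase among per-node leaders, followed by an intra-node phase. The function names (`binomial_bcast`, `hierarchical_bcast`, `inter_node_messages`) and the 4-node by 4-core layout are hypothetical choices for illustration, not the KD60 configuration or the MPICH2 implementation.

```python
from collections import defaultdict


def binomial_bcast(ranks):
    """Rounds of (sender, receiver) pairs for a binomial-tree broadcast
    rooted at ranks[0]. Illustrative only; not MPICH2's exact ordering."""
    rounds = []
    mask = 1
    while mask < len(ranks):
        # Positions below `mask` already hold the data and forward it
        # to the position `mask` steps further along the list.
        rounds.append([(ranks[i], ranks[i + mask])
                       for i in range(mask) if i + mask < len(ranks)])
        mask <<= 1
    return rounds


def hierarchical_bcast(node_of, root=0):
    """Two-level broadcast: root -> one leader per node -> local ranks.
    node_of[r] is the (assumed) node hosting rank r."""
    members = defaultdict(list)
    for rank, node in enumerate(node_of):
        members[node].append(rank)

    def leader(node):
        # The root leads its own node; other nodes use their lowest rank.
        return root if node == node_of[root] else members[node][0]

    # Phase 1: inter-node broadcast among the leaders only.
    others = [leader(n) for n in members if n != node_of[root]]
    rounds = binomial_bcast([root] + others)

    # Phase 2: every leader broadcasts inside its node; nodes run in parallel.
    local = []
    for node, ranks in members.items():
        lead = leader(node)
        local.append(binomial_bcast([lead] + [r for r in ranks if r != lead]))
    for k in range(max(len(sched) for sched in local)):
        rounds.append([m for sched in local if k < len(sched) for m in sched[k]])
    return rounds


def inter_node_messages(rounds, node_of):
    """Count messages whose sender and receiver sit on different nodes."""
    return sum(node_of[s] != node_of[d] for rnd in rounds for s, d in rnd)


# Assumed layout: 16 ranks placed block-wise on 4 nodes with 4 cores each.
node_of = [r // 4 for r in range(16)]
flat = binomial_bcast(list(range(16)))
hier = hierarchical_bcast(node_of)
print("flat inter-node messages:", inter_node_messages(flat, node_of))
print("hierarchical inter-node messages:", inter_node_messages(hier, node_of))
```

Under these assumptions both schedules finish in four rounds and send 15 messages, but the flat tree sends 12 of them across nodes while the two-level schedule sends only 3. The flat tree's count depends on rank placement and tree ordering, whereas the leader-based schedule always sends exactly one inter-node message per non-root node.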