Journal of Computer Applications, 2019, Vol. 39, Issue (6): 1577-1582. DOI: 10.11772/j.issn.1001-9081.2018122605

• 2018 National Annual Conference on High Performance Computing (HPC China 2018)

Research and analysis of supercomputer network boot technology

GONG Daoyong, SONG Changming, LIU Sha, QI Fengbin   

  1. Jiangnan Institute of Computing Technology, Wuxi Jiangsu 214083, China
  • Received: 2018-12-12 Revised: 2019-02-28 Online: 2019-06-17 Published: 2019-06-10
  • Supported by:
    This work is partially supported by the National Key Research and Development Program of China (2017YFB0202004).

超级计算机网络引导技术研究与分析

龚道永, 宋长明, 刘沙, 漆锋滨   

  1. 江南计算技术研究所, 江苏 无锡 214083
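  • Corresponding author: GONG Daoyong
  • About the authors: GONG Daoyong (born 1973), male, from Yingshang, Anhui, senior engineer, M.S.; main research interests: parallel operating systems, system fault tolerance, power management. SONG Changming (born 1981), male, from Qiqihar, Heilongjiang, assistant research fellow, M.S.; main research interests: parallel operating systems, system fault tolerance. LIU Sha (born 1984), male, from Yancheng, Jiangsu, engineer, M.S.; main research interests: parallel operating systems, artificial intelligence. QI Fengbin (born 1966), male, from Nanchang, Jiangxi, research fellow, Ph.D., CCF Fellow; main research interests: computer architecture, compilers.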
  • 基金资助:
    国家重点研发计划项目(2017YFB0202004)。

Abstract: To address the high time overhead of network booting in supercomputer systems, it is proposed that the network boot distribution algorithm is one of the main factors affecting network boot performance and hence the main direction for optimizing it. Firstly, the main factors affecting large-scale network boot performance were analyzed. Secondly, taking a typical supercomputer system as an example, the network boot data flow topologies of the Supernode Cyclic Distribution Algorithm (SCDA) and the Board Cyclic Distribution Algorithm (BCDA) were analyzed. Finally, the pressure that the two algorithms place on each network path segment and the network performance they can obtain were quantitatively analyzed, showing that the bandwidth performance of BCDA is 1 to 20 times that of SCDA. Theoretical analysis and model deduction show that a finer-grained mapping algorithm between compute nodes and boot servers allows as many boot servers as possible to be used when only part of the system's resources is booted, which reduces premature competition for local network resources and improves network boot performance.
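The effect of mapping granularity described in the abstract can be illustrated with a minimal sketch. The abstract does not define SCDA and BCDA in detail, so the code below assumes that "cyclic distribution" means a round-robin assignment of boot servers by supernode index (SCDA) or by board index (BCDA). The function names and the sizes used (8 nodes per board, 32 boards per supernode, 16 boot servers) are hypothetical and are not taken from the paper.

```python
# Illustrative sketch only: round-robin mapping of compute nodes to boot
# servers at two granularities. All sizes below are hypothetical values,
# not figures from the paper.
NODES_PER_BOARD = 8          # assumed
BOARDS_PER_SUPERNODE = 32    # assumed
NODES_PER_SUPERNODE = NODES_PER_BOARD * BOARDS_PER_SUPERNODE
NUM_BOOT_SERVERS = 16        # assumed


def scda_server(node_id: int) -> int:
    """Assumed SCDA: every node in a supernode maps to the same boot server."""
    return (node_id // NODES_PER_SUPERNODE) % NUM_BOOT_SERVERS


def bcda_server(node_id: int) -> int:
    """Assumed BCDA: every node on a board maps to the same boot server."""
    return (node_id // NODES_PER_BOARD) % NUM_BOOT_SERVERS


def servers_used(node_ids, mapping) -> int:
    """Count the distinct boot servers serving the given set of nodes."""
    return len({mapping(n) for n in node_ids})


if __name__ == "__main__":
    # Boot only the first supernode, i.e. a small part of the machine.
    one_supernode = range(NODES_PER_SUPERNODE)
    print("SCDA servers used:", servers_used(one_supernode, scda_server))
    print("BCDA servers used:", servers_used(one_supernode, bcda_server))
```

Under these assumed sizes, booting a single supernode keeps one boot server busy with the supernode-level mapping, while the board-level mapping spreads the same nodes over all 16 servers. This is the qualitative behavior the abstract attributes to finer-grained mapping; the sketch does not model the network path bandwidth that the paper analyzes.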

Key words: supercomputer, board, supernode, network boot overhead, full path minimum bandwidth, network boot distribution algorithm

摘要: 针对超级计算机系统中网络引导时间开销大的问题,提出网络引导分布算法是影响网络引导性能的主要因素之一,是优化网络引导性能的主要方向的观点。首先,分析了影响大规模网络引导性能的主要因素;其次,结合一种典型超级计算机系统,分析了超节点循环分布算法(SCDA)和插件循环分布算法(BCDA)的网络引导数据流拓扑结构;最后,量化分析了这两种算法对各个网络路径段的压力和可获得的网络性能,发现BCDA性能是SCDA性能的1~20倍。通过理论分析和模型推导发现,在计算节点和引导服务器之间使用更细粒度的映射算法可以在引导部分资源时使用尽量多的引导服务器,减少对局部网络资源的过早竞争,提升网络引导性能。

关键词: 超级计算机, 插件板, 超节点, 网络引导开销, 全路径最小带宽, 网络引导分布算法
