Journal of Computer Applications ›› 2023, Vol. 43 ›› Issue (11): 3517-3526. DOI: 10.11772/j.issn.1001-9081.2022101548

Special topic: Advanced computing

Programming model for domestic high-performance many-core processor

Hu CHEN1,2, Pengling ZHOU1

  1. School of Software Engineering, South China University of Technology, Guangzhou Guangdong 510006, China
  2. Guangdong Provincial Key Laboratory of High Performance Computing, Guangzhou Guangdong 510033, China
  • Received: 2022-10-14 Revised: 2023-04-22 Accepted: 2023-04-24 Online: 2023-05-24 Published: 2023-11-10
  • Contact: Pengling ZHOU
  • About author: CHEN Hu, born in 1974 in Nanjing, Jiangsu, Ph. D., associate professor. His research interests include high-performance computing, information security.
    ZHOU Pengling, born in 1999 in Ezhou, Hubei, M. S. candidate. His research interests include high-performance computing, information security. 1197615077@qq.com
  • Supported by:
    Key Project of National Natural Science Foundation of China (U1836207); Open Development Project of Guangdong Provincial Key Laboratory of High Performance Computing

Abstract:

Programming domestic high-performance many-core processors requires developing software directly against the lowest-level interfaces, which makes programming and debugging very difficult. Moreover, the high-performance programming models on the individual platforms are rudimentary and computing software is not portable between them, which leads to repetitive development. To address these problems, a general programming model and its supporting library were implemented: on the one hand, thread-level parallelism on domestic high-performance many-core processors was exploited through a message queue mechanism; on the other hand, data-level parallelism on the slave cores was exploited through a Single Instruction Multiple Data (SIMD) programming model. Firstly, the architecture of the domestic high-performance many-core processor was abstracted. Then, the message queue mechanism of the model was designed, and a set of heterogeneous parallel programming interfaces was provided to programmers, including a system parameter interface, a slave-core thread control interface, a message queue interface, and a SIMD abstraction interface. Finally, on this basis, a new development model and methodology for high-performance computing software were formed, making it convenient for users to develop parallel computing software for domestic high-performance many-core processors. Transfer performance tests show that, on domestic many-core processors, the transfer bandwidth of the proposed model generally reaches 90% of the peak Direct Memory Access (DMA) bandwidth when a small number of cores are started, and generally reaches 70% of the peak DMA bandwidth when a large number of cores are started. In matrix multiplication experiments, the proposed model reaches 90% of the performance obtained when the matrices are transferred with system primitives and then multiplied; in a password guessing system, code written with the proposed model performs essentially on par with code developed directly against the lowest-level interfaces. The proposed general programming model and supporting framework make High Performance Computing (HPC) software development easier and more portable, and can help promote the development of domestic, independently developed HPC software.
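To make the message-queue idea in the abstract concrete, the following is a minimal conceptual sketch only, not the interface the paper provides. It assumes ordinary Python threads as stand-ins for slave-core threads and `queue.Queue` as a stand-in for the message queues; the names `worker`, `run`, and `STOP` are hypothetical, and nothing here models DMA transfers or SW26010 hardware.

```python
import queue
import threading

STOP = None  # hypothetical sentinel message that tells a worker to exit


def worker(task_q, result_q):
    """Stand-in for a slave-core thread: receive a message, compute, reply."""
    while True:
        msg = task_q.get()
        if msg is STOP:
            break
        block_id, data = msg
        # Placeholder kernel: sum of squares over the received block.
        result_q.put((block_id, sum(x * x for x in data)))


def run(blocks, num_workers=4):
    """Stand-in for the master core: send blocks out, gather partial results."""
    task_q, result_q = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(task_q, result_q))
        for _ in range(num_workers)
    ]
    for t in threads:
        t.start()
    # One message per data block, tagged with its index.
    for i, block in enumerate(blocks):
        task_q.put((i, block))
    # One stop message per worker so that every thread terminates.
    for _ in threads:
        task_q.put(STOP)
    # Exactly one result arrives per block; store each by its index.
    results = [0] * len(blocks)
    for _ in blocks:
        i, value = result_q.get()
        results[i] = value
    for t in threads:
        t.join()
    return results
```

The master only ever enqueues and dequeues messages, which is the property that lets application code avoid touching lower-level transfer primitives directly; in this sketch the results are tagged with a block index because workers may finish in any order.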

Key words: domestic many-core processor, Single Instruction Multiple Data (SIMD), parallel programming model, SW26010, message queue model

CLC Number: