Journal of Computer Applications ›› 2013, Vol. 33 ›› Issue (03): 825-829. DOI: 10.3724/SP.J.1087.2013.00825

• Network and distributed techno

GPU-based preconditioned conjugate gradient method for solving sparse linear systems

ZHANG Jianfei, SHEN Defei*   

  1. College of Mechanics and Materials, Hohai University, Nanjing Jiangsu 210098, China
  • Received: 2012-09-03 Revised: 2012-10-29 Online: 2013-03-01 Published: 2013-03-01
  • Contact: De-Fei SHEN

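  • About the authors: ZHANG Jianfei (born 1977), male, from Haimen, Jiangsu, lecturer, Ph.D.; main research interests: high-performance computing, applied numerical analysis, computational mechanics, engineering simulation. SHEN Defei (born 1988), female, from Jianhu, Jiangsu, master's candidate; main research interests: computational mechanics, high-performance computing.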
  • Supported by: the National Natural Science Foundation of China (51109072).

Abstract: A GPU-accelerated preconditioned conjugate gradient method for solving sparse linear systems was studied, with the sparse matrix stored in the Compressed Sparse Row (CSR) format. The programs were written with the Compute Unified Device Architecture (CUDA) and tested on an NVIDIA GT430 GPU. Based on the characteristics of the preconditioned conjugate gradient method, strategies were investigated to optimize the sparse matrix-vector multiplication and the data transfer between CPU and GPU. Compared with the cusparseDcsrmv function from the CUSPARSE library, the self-developed sparse matrix-vector multiplication kernel achieves a speed-up of up to 2.1. With this kernel, the preconditioned conjugate gradient code achieves a maximum speed-up of 7.4 over the CPU code and performs slightly better than the implementation based on the CUBLAS and CUSPARSE libraries.
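
For orientation, the two building blocks named in the abstract can be sketched on the CPU in plain Python. This is only a minimal reference sketch, not the authors' CUDA code. The function names `csr_matvec` and `pcg` are hypothetical, and the Jacobi (diagonal) preconditioner is an assumption, since the abstract does not say which preconditioner was used. The matrix is assumed to be symmetric positive definite with a nonzero diagonal.

```python
def csr_matvec(row_ptr, col_idx, values, x):
    """Compute y = A x for a matrix A stored in CSR format."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    # Rows are independent of each other, which is what makes this loop
    # a natural candidate for one-thread-per-row style GPU parallelism.
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def pcg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned conjugate gradient for A x = b (A assumed SPD).

    Returns (x, iterations). The Jacobi preconditioner is an assumption
    made for this sketch only.
    """
    n = len(b)
    # Assumed preconditioner: M^{-1} = diag(A)^{-1}.
    inv_diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                inv_diag[i] = 1.0 / values[k]

    x = [0.0] * n
    r = list(b)                                   # r = b - A*0
    z = [inv_diag[i] * r[i] for i in range(n)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero

    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:      # relative residual test
            return x, it
        Ap = csr_matvec(row_ptr, col_idx, values, p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x, max_iter


# Small SPD example: A = [[4, 1, 0], [1, 3, 1], [0, 1, 2]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
b = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
x, iterations = pcg(row_ptr, col_idx, values, b)
```

Each iteration performs one sparse matrix-vector product plus a few dot products and vector updates. This is consistent with the abstract's focus on optimizing the matrix-vector kernel and on limiting CPU-GPU data transfer.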

Key words: Graphics Processing Unit (GPU), sparse linear equations, preconditioned conjugate gradient method, Compressed Sparse Row (CSR), Compute Unified Device Architecture (CUDA)

