GPU-based preconditioned conjugate gradient method for solving sparse linear systems

doi:10.3724/SP.J.1087.2013.00825

Journal of Computer Applications, 2013, Vol. 33, Issue (03): 825-829.

• Network and distributed techno

ZHANG Jianfei, SHEN Defei^*

College of Mechanics and Materials, Hohai University, Nanjing Jiangsu 210098, China

Received: 2012-09-03; Revised: 2012-10-29; Online: 2013-03-01; Published: 2013-03-01
Contact: SHEN Defei

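About the authors: ZHANG Jianfei (born 1977), male, from Haimen, Jiangsu, lecturer, Ph.D.; main research interests: high-performance computing, applied numerical analysis, computational mechanics, and engineering simulation. SHEN Defei (born 1988), female, from Jianhu, Jiangsu, M.S. candidate; main research interests: computational mechanics and high-performance computing.
Supported by:
National Natural Science Foundation of China (51109072).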

Abstract

Abstract: A GPU-accelerated preconditioned conjugate gradient method was studied for solving sparse linear equations, with the sparse matrix stored in the Compressed Sparse Row (CSR) format. The programs were written with the Compute Unified Device Architecture (CUDA) and tested on an NVIDIA GT430 GPU. Based on the characteristics of the conjugate gradient method, strategies were investigated to optimize sparse matrix-vector multiplication and data transfer between the CPU and the GPU. Compared with an implementation that calls cusparseDcsrmv, the self-developed sparse matrix-vector multiplication kernel achieves a speed-up of 2.1 in the best case. With this kernel, the preconditioned conjugate gradient code achieves a maximum speed-up of 7.4 over the CPU code, slightly better than the implementation based on the CUBLAS and CUSPARSE libraries.

Key words: Graphics Processing Unit (GPU), sparse linear equations, preconditioned conjugate gradient method, Compressed Sparse Row (CSR), Compute Unified Device Architecture (CUDA)
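The abstract names two building blocks: sparse matrix-vector multiplication on a CSR matrix, and the preconditioned conjugate gradient iteration built on top of it. The following is a minimal CPU-side sketch in plain Python of how the two fit together. It is not the authors' CUDA code. The function names (`csr_matvec`, `csr_diagonal`, `pcg`) are hypothetical, and the Jacobi (diagonal) preconditioner is an assumption chosen for brevity, since the abstract does not say which preconditioner the paper uses.

```python
# Illustrative sketch only. All names are hypothetical and the Jacobi
# preconditioner is an assumption, not the paper's stated choice.

def csr_matvec(row_ptr, col_idx, values, x):
    """Return y = A x for a matrix A stored in CSR form."""
    n = len(row_ptr) - 1
    y = [0.0] * n
    # Each row is independent of the others, which is what makes this
    # loop a natural candidate for one GPU thread (or warp) per row.
    for i in range(n):
        acc = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_idx[k]]
        y[i] = acc
    return y


def csr_diagonal(row_ptr, col_idx, values):
    """Extract the main diagonal of a CSR matrix."""
    n = len(row_ptr) - 1
    diag = [0.0] * n
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            if col_idx[k] == i:
                diag[i] = values[k]
    return diag


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(row_ptr, col_idx, values, b, tol=1e-10, max_iter=1000):
    """Jacobi-preconditioned CG for a symmetric positive definite CSR matrix.

    Returns (x, iterations). Assumes every diagonal entry is non-zero.
    """
    n = len(b)
    inv_diag = [1.0 / d for d in csr_diagonal(row_ptr, col_idx, values)]
    x = [0.0] * n
    r = list(b)                                   # r = b - A x with x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]    # z = M^{-1} r
    p = list(z)
    rz = dot(r, z)
    b_norm = dot(b, b) ** 0.5 or 1.0              # avoid dividing by zero
    for it in range(max_iter):
        if dot(r, r) ** 0.5 <= tol * b_norm:
            return x, it
        ap = csr_matvec(row_ptr, col_idx, values, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter


# Usage: the 3x3 matrix [[4, -1, 0], [-1, 4, -1], [0, -1, 4]] in CSR form.
row_ptr = [0, 2, 5, 7]
col_idx = [0, 1, 0, 1, 2, 1, 2]
values = [4.0, -1.0, -1.0, 4.0, -1.0, -1.0, 4.0]
b = csr_matvec(row_ptr, col_idx, values, [1.0, 2.0, 3.0])
x, iterations = pcg(row_ptr, col_idx, values, b)
```

Each iteration performs one sparse matrix-vector product plus a few dot products and vector updates, so the matrix-vector product is the step that most rewards a tuned kernel. Keeping the vectors resident on the device across iterations would limit CPU-GPU traffic to the initial upload and the final download.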


CLC Number: TP312

ZHANG Jianfei, SHEN Defei. GPU-based preconditioned conjugate gradient method for solving sparse linear systems[J]. Journal of Computer Applications, 2013, 33(03): 825-829.


