Abstract: A GPU-accelerated preconditioned conjugate gradient method was studied for solving sparse linear systems, with the sparse matrix stored in the Compressed Sparse Row (CSR) format. The programs were implemented in the Compute Unified Device Architecture (CUDA) and tested on an NVIDIA GT430 GPU. Based on the structure of the conjugate gradient method, optimization strategies were investigated for the sparse matrix-vector multiplication and for the data transfer between CPU and GPU. Compared with an implementation calling cusparseDcsrmv, the self-developed sparse matrix-vector multiplication kernel achieves a speed-up of up to 2.1 in the best case. Equipped with this kernel, the preconditioned conjugate gradient code obtains a maximum speed-up of 7.4 over the CPU code, slightly outperforming the version built on the CUBLAS and CUSPARSE libraries.