Implementation of LU decomposition and Laplace algorithms on GPU
CHEN Ying (1), LIN Jin-xian (2), LV Tun (3)
1. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350108, China
2. College of Mathematics and Computer Science, Fuzhou University, Fuzhou Fujian 350108, China; Fujian Supercomputing Center, Fuzhou University, Fuzhou Fujian 350108, China
3. Fujian Supercomputing Center, Fuzhou University, Fuzhou Fujian 350108, China; College of Biological Science and Technology, Fuzhou University, Fuzhou Fujian 350108, China
Abstract: As Graphics Processing Units (GPUs) have advanced and become programmable, many algorithms have been successfully ported to them. LU decomposition and Laplace algorithms are core routines in scientific computing, but their computational cost is often very high, so a method for accelerating them was proposed. The implementation targets Nvidia GPUs that support the Compute Unified Device Architecture (CUDA). Four techniques were used to speed up the algorithms: dividing the work between the CPU and the GPU, using GPU shared memory to speed up data access, eliminating branches in the GPU program, and partitioning the matrix into strips. The experimental results show that, as the matrix size increases, the GPU-based algorithms achieve a good speedup over the CPU-based algorithms.
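For orientation, the LU decomposition named in the abstract can be sketched as a plain CPU reference. This is a minimal illustration, not the authors' GPU kernel: it assumes the Doolittle form without pivoting, and the names `lu_decompose` and `matmul` are hypothetical. The comments mark the row updates that are independent of one another within an elimination step, which is the kind of work a GPU implementation could distribute across threads.

```python
def lu_decompose(a):
    """Doolittle LU factorization without pivoting (assumed variant).

    Returns (l, u) with l unit lower triangular and u upper triangular,
    such that l * u equals a. Raises ZeroDivisionError on a zero pivot.
    """
    n = len(a)
    u = [row[:] for row in a]  # work on a copy of the input
    l = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        pivot = u[k][k]
        if pivot == 0:
            raise ZeroDivisionError("zero pivot; this sketch does no row exchange")
        # For a fixed k, each row i below the pivot is updated independently
        # of the other rows, so this loop is the data-parallel part.
        for i in range(k + 1, n):
            factor = u[i][k] / pivot
            l[i][k] = factor
            for j in range(k, n):
                u[i][j] -= factor * u[k][j]
    return l, u


def matmul(x, y):
    """Dense matrix product, used here only to check that l * u == a."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))]
            for i in range(len(x))]


a = [[4.0, 3.0, 2.0],
     [8.0, 7.0, 9.0],
     [4.0, 6.0, 5.0]]
l, u = lu_decompose(a)
```

The outer loop over `k` is inherently sequential, since each step depends on the previous one, while the inner row updates are not. That split is one plausible reading of how work might be divided between CPU and GPU, but the abstract does not specify the actual partitioning.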
CHEN Ying, LIN Jin-xian, LV Tun. Implementation of LU decomposition and Laplace algorithms on GPU [J]. Journal of Computer Applications (计算机应用), 2011, 31(03): 851-855.