Fast operation of large-scale high-precision matrix based on GPU

Received: 2008-10-20; Revised: 2008-11-27; Online: 2009-04-01; Published: 2009-04-01
Contact: Chang SU

Chang SU, Zhong-liang FU, Yu-chen TAN

Chengdu Institute of Computer Applications, Chinese Academy of Sciences (all three authors)

Abstract

A fast approach to large-scale matrix operations on the Graphics Processing Unit (GPU) was designed. To take full advantage of the GPU's parallel architecture and raise calculation speed, a matrix partitioning scheme and a memory allocation mechanism tailored to the characteristics of the GPU were designed to reduce the frequency of data access. In addition, Kahan's summation formula was introduced to preserve the precision of the calculation. Experimental results show that the approach performs well, greatly improving both the speed and the precision of large-matrix multiplication.
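The abstract names two techniques without showing how they fit together: partitioning the matrices into blocks so that each block of data is fetched once and reused, and Kahan's compensated summation to limit rounding error in long dot products. The following is a minimal CPU sketch in Python, written under our own assumptions, of how the two might combine; it is not the authors' GPU kernel. The function names `kahan_sum` and `tiled_matmul_kahan` and the square tile size `tile` are hypothetical, and the small per-tile dictionaries only stand in for the fast on-chip (shared) memory a GPU implementation would use.

```python
def kahan_sum(values):
    """Kahan (compensated) summation of an iterable of floats."""
    total = 0.0
    comp = 0.0  # running compensation: low-order bits lost so far (negated)
    for x in values:
        y = x - comp            # apply the correction carried from the last step
        t = total + y           # low-order bits of y may be rounded away here
        comp = (t - total) - y  # algebraically 0; in floating point, minus the lost part
        total = t
    return total


def tiled_matmul_kahan(a, b, tile=2):
    """Return A @ B, computed tile by tile with Kahan-compensated accumulation.

    a is n x m and b is m x p, both lists of lists of floats. The tile loops
    mimic (hypothetically) a GPU kernel that stages square sub-blocks of A and B
    in fast on-chip memory; here the "staging" is just copying into small dicts.
    """
    n, m, p = len(a), len(b), len(b[0])
    if any(len(row) != m for row in a):
        raise ValueError("inner dimensions of a and b must agree")
    c = [[0.0] * p for _ in range(n)]
    for i0 in range(0, n, tile):
        rows = range(i0, min(i0 + tile, n))
        for j0 in range(0, p, tile):
            cols = range(j0, min(j0 + tile, p))
            # One (sum, compensation) pair per output element, kept across all
            # k-tiles so that partitioning never resets the error correction.
            acc = {(i, j): [0.0, 0.0] for i in rows for j in cols}
            for k0 in range(0, m, tile):
                ks = range(k0, min(k0 + tile, m))
                # Fetch each sub-block once, then reuse it for the whole output tile.
                a_blk = {(i, k): a[i][k] for i in rows for k in ks}
                b_blk = {(k, j): b[k][j] for k in ks for j in cols}
                for (i, j), state in acc.items():
                    total, comp = state
                    for k in ks:
                        y = a_blk[(i, k)] * b_blk[(k, j)] - comp
                        t = total + y
                        comp = (t - total) - y
                        total = t
                    state[0], state[1] = total, comp
            for (i, j), (total, _) in acc.items():
                c[i][j] = total
    return c


if __name__ == "__main__":
    xs = [1.0] + [1e-16] * 10
    naive = 0.0
    for x in xs:  # explicit loop: recent CPython sum() already compensates
        naive += x  # each 1e-16 is under half an ulp of 1.0 and is lost
    print(naive)  # 1.0
    print(kahan_sum(xs))  # close to 1 + 1e-15
```

Keeping the compensation per output element, rather than resetting it per tile, is a choice made for this sketch: it makes the blocked result equal to a Kahan-compensated dot product taken in the same k order. On a GPU, each thread would plausibly hold its own (sum, compensation) pair in registers while the tiles sit in shared memory; the abstract does not say how the authors arranged this.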

Key words: Graphics Processing Unit (GPU), matrix operation, high-precision, parallel architecture

CLC Number: TP301

Chang SU, Zhong-liang FU, Yu-chen TAN. Fast operation of large-scale high-precision matrix based on GPU[J]. Journal of Computer Applications.
