Abstract: Matrix multiplication plays an important role in scientific computing, and different architectural models can improve the performance of parallel matrix multiplication. In the existing synchronous MPI+CUDA model, the host must enter a waiting state and cannot continue working until the device completes its task, which wastes time. To address this problem, a parallel matrix multiplication algorithm based on an asynchronous MPI+CUDA model is proposed. The model keeps the host from entering the waiting state and uses CUDA streams to handle data volumes that exceed GPU memory. The speedup ratio and efficiency of the asynchronous model are analyzed, and the experimental results show that MPI+CUDA parallel programming clearly improves parallel efficiency and the speed of large-scale matrix multiplication by exploiting both the distributed memory across nodes and the shared memory within a node. It is an effective and feasible parallel strategy.
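The abstract evaluates the model by speedup ratio and efficiency but does not state how they are defined. The conventional definitions are sketched below as an assumption; the paper may normalize differently, for example by counting MPI processes rather than GPUs. The symbols $T_1$ (run time of the serial baseline), $T_p$ (run time on $p$ processing units), and $p$ are introduced here for illustration and do not come from the source.

```latex
% Conventional definitions, assumed here; not quoted from the paper.
% T_1 : run time of the serial baseline
% T_p : run time of the parallel program on p processing units
\begin{align}
  S_p &= \frac{T_1}{T_p}                     && \text{(speedup ratio)} \\
  E_p &= \frac{S_p}{p} = \frac{T_1}{p\,T_p}  && \text{(parallel efficiency)}
\end{align}
```

Under these definitions, ideal scaling gives $S_p = p$ and $E_p = 1$. Any time the host spends idle waiting for the device is part of $T_p$, so overlapping host work with device work lowers $T_p$ and raises both measures.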
刘青昆 马名威 阎慰椿. 基于MPI+CUDA异步模型的并行矩阵乘法[J]. 计算机应用, 2011, 31(12): 3327-3330.
LIU Qing-kun, MA Ming-wei, YAN Wei-chun. Parallel matrix multiplication based on MPI+CUDA asynchronous model[J]. Journal of Computer Applications, 2011, 31(12): 3327-3330.