Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (12): 3274-3279. DOI: 10.11772/j.issn.1001-9081.2016.12.3274


GPU parallel particle swarm optimization algorithm based on adaptive warp

ZHANG Shuo, HE Fazhi, ZHOU Yi, YAN Xiaohu   

  1. School of Computer, Wuhan University, Wuhan Hubei 430072, China
  • Received:2016-06-03 Revised:2016-07-06 Online:2016-12-10 Published:2016-12-08
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61472289) and the Natural Science Foundation of Hubei Province (2015CFB254).


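  • Corresponding author: HE Fazhi
  • About the authors: ZHANG Shuo (born 1992), male, from Xiantao, Hubei, M.S. candidate; research interests: computer graphics and GPU parallel computing. HE Fazhi (born 1968), male, from Wuhan, Hubei, professor, Ph.D., CCF member; research interests: computer-supported cooperative work, computer graphics, image processing, and parallel computing. ZHOU Yi (born 1983), male, from Hanchuan, Hubei, senior engineer, Ph.D. candidate; research interests: general-purpose GPU computing and intelligent optimization algorithms. YAN Xiaohu (born 1986), male, from Wuhan, Hubei, senior engineer, Ph.D. candidate, CCF member; research interests: hardware-software co-design and intelligent optimization algorithms.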

Abstract: The parallel Particle Swarm Optimization (PSO) algorithm on the Graphics Processing Unit (GPU) was improved on the basis of the Compute Unified Device Architecture (CUDA). The CUDA hardware architecture shows that blocks are executed serially and that the warp is the basic unit scheduled and executed by a Streaming Multiprocessor (SM). To make full use of thread parallelism within a block, a GPU parallel PSO algorithm based on adaptive warps was proposed: each dimension of a particle is mapped to a thread; using the warp-level parallelism of the GPU, each particle is adaptively mapped to one or more warps according to its dimensionality; and one or more particles are adaptively mapped to each block. The proposed approach was compared with the existing coarse-grained parallel approach (one particle per thread) and the existing fine-grained parallel approach (one particle per block). Experimental results show that, relative to these two approaches, the proposed approach improves the speedup ratio over the CPU by up to 40.
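The mapping described in the abstract (dimension to thread, particle to one or more warps, one or more particles to a block) can be illustrated with a small host-side sketch in C++. This is not the authors' implementation: the names `WarpLayout`, `planLayout`, `ThreadRole`, and `roleOf` are hypothetical, the warp size of 32 and the limit of 1024 threads per block are assumptions that hold on current NVIDIA GPUs, and the rule of packing as many particles as fit into a block is one plausible reading of "adaptive", not something the abstract specifies.

```cpp
#include <cassert>

// Assumed hardware constants (not stated in the abstract).
constexpr int kWarpSize = 32;             // threads per warp
constexpr int kMaxThreadsPerBlock = 1024; // per-block thread limit

// Hypothetical launch layout for one PSO kernel.
struct WarpLayout {
    int warpsPerParticle;   // whole warps assigned to one particle
    int threadsPerParticle; // dimension count padded up to whole warps
    int particlesPerBlock;  // particles packed into one block
    int threadsPerBlock;    // resulting block size
    int numBlocks;          // resulting grid size
};

// Choose the layout from the particle dimension and the swarm size.
// Precondition (assumption of this sketch): one particle fits in one block.
WarpLayout planLayout(int dim, int numParticles) {
    assert(dim >= 1 && dim <= kMaxThreadsPerBlock);
    assert(numParticles >= 1);
    WarpLayout l{};
    // Round the dimension up to a whole number of warps.
    l.warpsPerParticle = (dim + kWarpSize - 1) / kWarpSize;
    l.threadsPerParticle = l.warpsPerParticle * kWarpSize;
    // Pack as many particles as fit, but no more than the swarm holds.
    l.particlesPerBlock = kMaxThreadsPerBlock / l.threadsPerParticle;
    if (l.particlesPerBlock > numParticles) l.particlesPerBlock = numParticles;
    l.threadsPerBlock = l.particlesPerBlock * l.threadsPerParticle;
    l.numBlocks = (numParticles + l.particlesPerBlock - 1) / l.particlesPerBlock;
    return l;
}

// What a given thread would work on under this layout.
struct ThreadRole {
    int particle;  // global particle index
    int dimension; // dimension index within that particle
    bool active;   // false for padding lanes and for particles past the swarm
};

ThreadRole roleOf(const WarpLayout& l, int dim, int numParticles,
                  int blockIndex, int threadIndex) {
    int localParticle = threadIndex / l.threadsPerParticle;
    int d = threadIndex % l.threadsPerParticle;
    int particle = blockIndex * l.particlesPerBlock + localParticle;
    bool active = (particle < numParticles) && (d < dim);
    return ThreadRole{particle, d, active};
}
```

Under these assumptions, a swarm of 100 particles with 50 dimensions gets 2 warps (64 threads) per particle and 16 particles per block. Because every particle starts on a warp boundary, the threads of a warp all belong to the same particle, and only the padding lanes of its last warp stay idle.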

Key words: Particle Swarm Optimization (PSO) algorithm, parallel computing, Graphics Processing Unit (GPU), Compute Unified Device Architecture (CUDA), adaptive warp

