Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (7): 1801-1806. DOI: 10.11772/j.issn.1001-9081.2016.07.1801

• Advanced computing •

Parallel design and implementation of scale invariant feature transform algorithm based on OpenCL

XU Chuanpei1,2, WANG Guang1,2

  1. School of Electrical Engineering and Automation, Guilin University of Electronic Technology, Guilin Guangxi 541004, China;
  2. Guangxi Key Laboratory of Automatic Detecting Technology and Instruments (Guilin University of Electronic Technology), Guilin Guangxi 541004, China
  • Received: 2015-12-10 Revised: 2016-02-22 Online: 2016-07-10 Published: 2016-07-14
  • Corresponding author: WANG Guang
  • About the authors: XU Chuanpei (born 1968), female, from Guilin, Guangxi; professor, Ph.D. Research interests: automatic test buses and systems, computer-aided design, and test technology. WANG Guang (born 1989), male, from Shangqiu, Henan; master's student. Research interest: image processing.

Abstract: To address the poor real-time performance of the Scale Invariant Feature Transform (SIFT) algorithm, a parallel SIFT algorithm optimized with the Open Computing Language (OpenCL) was proposed. Firstly, the original algorithm was restructured for parallel execution by splitting and recombining its steps and by reorganizing how feature points are indexed in memory, so that intermediate results could be exchanged entirely within GPU device memory. Then, each step was designed in parallel using strategies such as reusing global memory objects, sharing local memory, and optimizing memory reads, which improved data-access efficiency and reduced transfer latency. Finally, fine-grained parallel acceleration of SIFT was implemented on a Graphics Processing Unit (GPU) using OpenCL, and the implementation was ported to a Central Processing Unit (CPU). With registration quality close to that of the original SIFT algorithm, the parallel algorithm sped up feature extraction by 10.51 to 19.33 times on the GPU platform and by 2.34 to 4.74 times on the CPU platform. The experimental results show that SIFT accelerated with OpenCL can effectively improve the real-time performance of image registration and overcome a drawback of Compute Unified Device Architecture (CUDA), which is difficult to port and therefore cannot make full use of the multiple kinds of computing cores in a heterogeneous system.
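The abstract names local-memory sharing as one of the optimization strategies but does not show any kernels. As a rough illustration of the general idea only, the sketch below emulates it in plain Python rather than OpenCL: each simulated work-group copies its pixels plus a halo into a small tile once, and every work-item then reads from that tile instead of from global memory. The function names, the 1-D filter, and the read counters are hypothetical and are not taken from the paper.

```python
# Hypothetical illustration: names and sizes are assumptions, not the paper's code.
# A 1-D blur of one image row, with "global memory" reads counted explicitly.

def blur_row_direct(row, weights):
    """Every output pixel reads its whole neighbourhood from global memory."""
    radius = len(weights) // 2
    n = len(row)
    reads = 0
    out = []
    for i in range(n):
        acc = 0.0
        for k, w in enumerate(weights):
            j = min(max(i + k - radius, 0), n - 1)  # clamp at the borders
            acc += w * row[j]
            reads += 1
        out.append(acc)
    return out, reads


def blur_row_tiled(row, weights, group_size):
    """Each simulated work-group loads its pixels plus a halo into a tile once."""
    radius = len(weights) // 2
    n = len(row)
    reads = 0
    out = []
    for start in range(0, n, group_size):
        end = min(start + group_size, n)
        # Cooperative load phase: one global read per tile element.
        tile = []
        for j in range(start - radius, end + radius):
            tile.append(row[min(max(j, 0), n - 1)])
            reads += 1
        # Compute phase: work-items read only from the shared tile.
        for i in range(start, end):
            base = i - start
            acc = 0.0
            for k, w in enumerate(weights):
                acc += w * tile[base + k]
            out.append(acc)
    return out, reads
```

For a 64-pixel row, a 5-tap filter, and a group size of 16, the direct version performs 320 simulated global reads and the tiled version 80, while both return identical results. In a real OpenCL kernel the tile would live in `__local` memory, and a work-group barrier would separate the load phase from the compute phase.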

Key words: Scale Invariant Feature Transform (SIFT) algorithm, Open Computing Language (OpenCL), memory object reuse, fine-grained parallelism, heterogeneous system

CLC number: