Journal of Computer Applications, 2010, Vol. 30, Issue 12: 3252-3254.

• Graphics and Image Processing •

Fast H.264 deblocking filter algorithm on the CUDA architecture

LIU Hu (刘虎)1, SUN Zhaomin (孙召敏)2, CHEN Qimei (陈启美)2

  1. Nanjing University
    2.
  • Received: 2010-05-20 Revised: 2010-07-09 Online: 2010-12-22 Published: 2010-12-01
  • Corresponding author: LIU Hu (刘虎)
  • Supported by:
    Major High-Tech Research Project of Jiangsu Province

Algorithm of H.264 fast deblocking filter on CUDA

  • Received:2010-05-20 Revised:2010-07-09 Online:2010-12-22 Published:2010-12-01

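Abstract: To address the high computational complexity and long running time of the deblocking filter in the H.264/AVC video coding standard, a parallel fast deblocking filter algorithm for H.264 based on the NVIDIA Compute Unified Device Architecture (CUDA) platform is proposed. The hardware structure of the CUDA platform and its software development flow are introduced. Based on the concurrent structure of the graphics processing unit (GPU), the boundary strength (BS) decision and the filtering computation are optimized for parallel execution, which reduces the complexity of the algorithm. Shared memory is used to raise the data access rate, so that the deblocking filter is processed in parallel. Experimental results show that, with image quality essentially unchanged, the GPU algorithm clearly increases computing speed, with an average speedup of about 20 times.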

Keywords: Compute Unified Device Architecture (CUDA), H.264, deblocking filter, parallel computing

Abstract: In the H.264/AVC video coding standard, the deblocking filter is used to enhance coding efficiency, but it is computationally complex and time-consuming. A fast algorithm and an efficient implementation of the H.264 deblocking filter based on the NVIDIA Compute Unified Device Architecture (CUDA) are proposed. The parallel hardware architecture and the software development process of the Graphics Processing Unit (GPU) are introduced first. On the basis of the parallel architecture and hardware characteristics of the GPU, the boundary strength (BS) computation and the filtering computation are optimized for parallel execution to reduce complexity and improve computing speed, and shared memory is used to improve data access efficiency. The experimental results show that, at essentially the same image quality, the algorithm on the GPU clearly improves computing speed, with an average speedup of about 20.

Key words: Compute Unified Device Architecture (CUDA), H.264, deblocking filter, parallel computing
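
The BS decision named in the abstract can be illustrated with a minimal sketch. It follows the commonly described H.264 boundary strength rules for progressive frames with a single reference picture, and it is not the authors' implementation. The `BlockInfo` struct, its field names, and the functions `boundary_strength` and `vertical_edge_bs` are hypothetical names chosen for illustration. The code is plain C++ for readability; each loop iteration in `vertical_edge_bs` is independent of the others, which is the property that would let one GPU thread handle one edge.

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical description of one 4x4 block (names are assumptions, not from the paper).
struct BlockInfo {
    bool intra;       // block lies in an intra-coded macroblock
    bool has_coeffs;  // block has non-zero transform coefficients
    int  ref_frame;   // reference picture index (single reference assumed)
    int  mv_x;        // motion vector, quarter-sample units
    int  mv_y;
};

// Boundary strength for the edge between neighbouring blocks p and q.
// mb_edge is true when the edge is also a macroblock boundary.
int boundary_strength(const BlockInfo& p, const BlockInfo& q, bool mb_edge) {
    if (p.intra || q.intra) return mb_edge ? 4 : 3;
    if (p.has_coeffs || q.has_coeffs) return 2;
    if (p.ref_frame != q.ref_frame) return 1;
    // A difference of 4 quarter-samples equals one full luma sample.
    if (std::abs(p.mv_x - q.mv_x) >= 4 || std::abs(p.mv_y - q.mv_y) >= 4) return 1;
    return 0;
}

// BS for every vertical edge in one row of 4x4 blocks.
// out[i] is the edge on the left of block i; out[0] is the picture border (unfiltered).
// No iteration reads the result of another, so the loop can be split across threads.
std::vector<int> vertical_edge_bs(const std::vector<BlockInfo>& row) {
    const std::size_t blocks_per_mb = 4;  // a 16x16 macroblock spans four 4x4 blocks
    std::vector<int> out(row.size(), 0);
    for (std::size_t i = 1; i < row.size(); ++i) {
        const bool mb_edge = (i % blocks_per_mb) == 0;
        out[i] = boundary_strength(row[i - 1], row[i], mb_edge);
    }
    return out;
}
```

Under these assumptions, an edge with BS 0 is skipped and larger values select stronger filtering, so computing all BS values first leaves a uniform filtering pass to run afterwards.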