Journal of Computer Applications ›› 2010, Vol. 30 ›› Issue (12): 3252-3254.
• Graphics and image processing •
Liu Hu1, Sun Zhaomin2, Chen Qimei2
Abstract: In the H.264/AVC video coding standard, the deblocking filter is used to improve coding efficiency, but it is computationally complex and time-consuming. A fast algorithm and efficient implementation of the H.264 deblocking filter based on the NVIDIA Compute Unified Device Architecture (CUDA) was proposed. The parallel hardware architecture and software development process of the Graphics Processing Unit (GPU) were introduced first. Based on the parallel architecture and hardware characteristics of the GPU, the boundary strength (BS) computation and the filtering computation were parallelized and optimized to reduce complexity and increase computing speed, and shared memory was used to improve data access efficiency. The experimental results show that, with image quality essentially unchanged, the GPU algorithm achieves an average speedup of about 20 times and delivers good performance.
Key words: Compute Unified Device Architecture (CUDA), H.264, deblocking filter, parallel computing
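Abstract (translated from the Chinese): To address the high computational complexity and long running time of the deblocking filter in the H.264/AVC video coding standard, a fast parallel H.264 deblocking filter algorithm on the NVIDIA Compute Unified Device Architecture (CUDA) platform is proposed. The hardware structure of the CUDA platform and its software development process are described. Following the concurrent structure of the Graphics Processing Unit (GPU), the BS decision and the filtering computation are parallelized and optimized, which lowers the algorithm's complexity; shared memory is used to raise the data access rate; and the deblocking filter is thereby processed in parallel. Experiments show that, with image quality essentially unchanged, the GPU algorithm markedly increases computing speed, with an average speedup of about 20 times.
Key words (translated from the Chinese): Compute Unified Device Architecture, H.264, deblocking filter, parallel computing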
Liu Hu, Sun Zhaomin, Chen Qimei. Fast H.264 deblocking filter algorithm under the CUDA architecture[J]. Journal of Computer Applications, 2010, 30(12): 3252-3254.
URL: https://www.joca.cn/EN/
https://www.joca.cn/EN/Y2010/V30/I12/3252