Journal of Computer Applications ›› 2022, Vol. 42 ›› Issue (12): 3841-3846.DOI: 10.11772/j.issn.1001-9081.2021101726

• Advanced computing

Low density parity check code decoding acceleration technology based on GPU

Qidi XU, Zhenghong LIU, Lin ZHENG

  1. Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004, China
  • Received:2021-10-09 Revised:2021-12-26 Accepted:2022-01-05 Online:2022-01-24 Published:2022-12-10
  • Contact: Zhenghong LIU
  • About author: XU Qidi, born in 1997, M. S. candidate. His research interests include Graphics Processing Unit (GPU) parallel computing and software radio applications.
    ZHENG Lin, born in 1973, Ph. D., professor. His research interests include wireless communication signal processing, incoherent Multiple-Input Multiple-Output (MIMO), Ultra Wideband (UWB) communication, and radar-communication integration.
  • Supported by:
    Natural Science Foundation of Guangxi (2020GXNSFAA159067); Fund for Guangxi Key Laboratory of Wireless Wideband Communication and Signal Processing (GXKL06160112); Fund for Key Laboratory of Cognitive Radio (CRKL200102)

基于GPU的低密度奇偶校验码译码加速技术

徐启迪, 刘争红, 郑霖

  1. 广西无线宽带通信与信号处理重点实验室(桂林电子科技大学),广西 桂林 541004
  • 通讯作者: 刘争红
  • 作者简介:徐启迪(1997—),男,广东汕尾人,硕士研究生,主要研究方向:图形处理器(GPU)并行运算、软件无线电应用
    郑霖(1973—),男,安徽祁门人,教授,博士,主要研究方向:无线通信信号处理、非相干多进多出(MIMO)、超宽带(UWB)通信、雷达通信一体化。
  • 基金资助:
    广西自然科学基金资助项目(2020GXNSFAA159067);无线宽带通信与信号处理重点实验室基金资助项目(GXKL06160112);认知无线电重点实验室项目(CRKL200102)

Abstract:

With the development of communication technology, communication terminals increasingly rely on software to support multiple communication modes and protocols. The traditional software radio architecture, which uses a computer's Central Processing Unit (CPU) as the arithmetic unit, cannot meet the wideband data throughput requirements of high-speed wireless communication systems such as Multiple-Input Multiple-Output (MIMO). To address this problem, an acceleration method for a Low Density Parity Check (LDPC) code decoder based on a Graphics Processing Unit (GPU) was proposed. Firstly, based on a theoretical analysis of the acceleration performance of GPU-accelerated heterogeneous computing in the GNU Radio 4G/5G physical layer signal processing modules, the Layered Normalized Min-Sum (LNMS) algorithm, which has higher parallel efficiency, was adopted. Then, the decoding delay of the decoder was reduced by using a global synchronization strategy, reasonable allocation of GPU memory space and a stream parallelism mechanism, while the LDPC code decoding process was parallelized with GPU multi-threading. Finally, the GPU-accelerated decoder was implemented and verified on a software radio platform, and the bit error rate performance and the acceleration bottlenecks of the parallel decoder were analyzed. Experimental results show that, compared with the traditional serial CPU code processing method, the CPU+GPU heterogeneous platform increases the decoding rate for LDPC codes to about 200 times the original, and the decoder throughput can exceed 1 Gb/s; the improvement over the traditional decoder is especially large for large-scale data.

Key words: Graphics Processing Unit (GPU), Compute Unified Device Architecture (CUDA), Low Density Parity Check (LDPC) code, parallel computing, channel decoding
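
For readers unfamiliar with the Layered Normalized Min-Sum (LNMS) algorithm named in the abstract, the following is a minimal serial sketch in Python, not the authors' GPU implementation. The function name `lnms_decode`, the normalization factor `alpha = 0.75`, the iteration limit and the small (7,4) Hamming-style parity check matrix are all assumptions chosen for illustration. Each check row is treated as one layer, and posterior log-likelihood ratios (LLRs) are updated immediately after every layer.

```python
def lnms_decode(h_rows, llr, alpha=0.75, max_iter=20):
    """Layered normalized min-sum decoding (illustrative serial sketch).

    h_rows: for each check row, the list of bit indices it covers.
    llr:    channel LLRs, positive meaning bit 0 is more likely.
    alpha:  normalization factor (assumed value, not from the paper).
    """
    post = list(llr)                            # posterior LLR per bit
    r = [[0.0] * len(row) for row in h_rows]    # check-to-variable messages
    bits = [1 if v < 0 else 0 for v in post]
    for _ in range(max_iter):
        for m, row in enumerate(h_rows):        # one layer per check row
            # Remove this row's previous contribution from the posteriors.
            q = [post[n] - r[m][i] for i, n in enumerate(row)]
            for i, n in enumerate(row):
                others = q[:i] + q[i + 1:]
                negatives = sum(1 for v in others if v < 0)
                sign = -1.0 if negatives % 2 else 1.0
                # Normalized min-sum check node update.
                r[m][i] = alpha * sign * min(abs(v) for v in others)
                # Layered schedule: the posterior is refreshed immediately.
                post[n] = q[i] + r[m][i]
        bits = [1 if v < 0 else 0 for v in post]
        # Stop early once every parity check is satisfied.
        if all(sum(bits[n] for n in row) % 2 == 0 for row in h_rows):
            break
    return bits


# Hypothetical (7,4) parity check matrix given as index lists per row.
H_ROWS = [[0, 1, 2, 4], [1, 2, 3, 5], [0, 2, 3, 6]]
# All-zero codeword with one unreliable, wrongly signed LLR at bit 2.
decoded = lnms_decode(H_ROWS, [2.0, 2.0, -1.0, 2.0, 2.0, 2.0, 2.0])
```

In this sketch the inner loop over the bits of one row is the part that is independent across bits, which is the kind of work a GPU implementation would typically assign to parallel threads; how the paper actually maps it to threads and streams is not specified in the abstract.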

摘要:

随着通信技术的发展,通信终端逐渐采用软件的方式来兼容多种通信制式和协议。针对以计算机中央处理器(CPU)作为运算单元的传统软件无线电架构,无法满足高速无线通信系统如多进多出(MIMO)等宽带数据的吞吐率要求问题,提出了一种基于图形处理器(GPU)的低密度奇偶校验(LDPC)码译码器的加速方法。首先,根据GPU并行加速异构计算在GNU Radio 4G/5G物理层信号处理模块中的加速表现的理论分析,采用了并行效率更高的分层归一化最小和(LNMS)算法;其次,通过使用全局同步策略、合理分配GPU内存空间以及流并行机制等方法减少了译码器的译码时延,同时配合GPU多线程并行技术对LDPC码的译码流程进行了并行优化;最后,在软件无线电平台上对提出的GPU加速译码器进行了实现与验证,并分析了该并行译码器的误码率性能和加速性能的瓶颈。实验结果表明,与传统的CPU串行码处理方式相比,CPU+GPU异构平台对LDPC码的译码速率可提升至原来的200倍左右,译码器的吞吐量可以达到1 Gb/s以上,特别是在大规模数据的情况下对传统译码器的译码性有着较大的提升。

关键词: 图形处理器, 计算统一设备架构, 低密度奇偶校验码, 并行计算, 信道译码

CLC Number: