Journal of Computer Applications, 2026, Vol. 46, Issue (4): 1218-1226. DOI: 10.11772/j.issn.1001-9081.2025050535

• Advanced Computing •

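Design of LDLT matrix decomposition FPGA accelerator based on mixed precision strategy

Chaoyun MAI, Xiaopeng KE, Dongzhou ZHONG, Xiaochun HONG, Panrong CHEN, Zhiyuan SU

  1. School of Electronics and Information Engineering, Wuyi University, Jiangmen Guangdong 529020, China
  • Received: 2025-05-14 Revised: 2025-09-16 Accepted: 2025-09-29 Online: 2025-10-16 Published: 2026-04-10
  • Corresponding author: Dongzhou ZHONG
  • About author: MAI Chaoyun (born 1989), male, from Jiangmen, Guangdong, Ph. D., associate professor, CCF member. His research interests include intelligent information processing and digital signal processing.
    KE Xiaopeng (born 2000), male, from Shantou, Guangdong, M. S. candidate, CCF member. His research interests include FPGA and digital signal processing.
    HONG Xiaochun (born 1999), female, from Chaozhou, Guangdong, M. S. candidate, CCF member. Her research interests include intelligent information processing and artificial intelligence.
    CHEN Panrong (born 2001), male, from Guangzhou, Guangdong, M. S. candidate, CCF member. His research interests include FPGA and digital signal processing.
    SU Zhiyuan (born 1998), male, from Zhanjiang, Guangdong, M. S. candidate, CCF member. His research interests include intelligent information processing and artificial intelligence.
  • Supported by:
    Special Project in Key Fields of Guangdong Universities (New Generation of Communication Technology) (2020ZDZX3052); Jiangmen City 2022 Provincial Science and Technology Innovation Strategic Special Project (Jiang ke [2023] No. 72)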



Abstract:

To address the high resource consumption and the difficulty of balancing computational accuracy and efficiency that arise when symmetric positive definite matrix decomposition algorithms are implemented on Field Programmable Gate Arrays (FPGAs), an LDLT ($A = LDL^{\mathrm{T}}$) decomposition acceleration structure based on a mixed precision strategy was proposed. In this structure, half-precision (FP16) numbers were used at the storage level to reduce resource consumption, while single-precision (FP32) numbers were used at the computational level to preserve computational accuracy and numerical stability. In addition, a parallel pipeline of multiple processing elements (PEs) was constructed and a dual arbitration mechanism was introduced to optimize data scheduling and memory access. The acceleration structure was deployed on the xczu4ev-sfvc784 FPGA platform, and experiments were conducted on symmetric positive definite matrices of order 4 to 256 under three parallel configurations: 4PE, 8PE, and 16PE. The results show that the relative errors of the decomposition results produced by the proposed structure are all within $10^{-3}$. Compared with some of the baseline methods, the structure reduces LUT usage by more than 40% and DSP usage by 70%. These results indicate that the structure maintains computational accuracy while achieving low hardware overhead and improving throughput, and that it has good scalability and engineering adaptability.

Key words: LDLT decomposition, mixed precision strategy, dual arbitration mechanism, parallel pipeline, Field Programmable Gate Array (FPGA)
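The store-in-half, compute-in-single idea described in the abstract can be illustrated in software. The Python sketch below is not the authors' FPGA design. It emulates FP16 storage and FP32 arithmetic by rounding through the standard `struct` module, and it applies the textbook LDLT recurrences $d_j = a_{jj} - \sum_{k<j} l_{jk}^2 d_k$ and $l_{ij} = \left(a_{ij} - \sum_{k<j} l_{ik} l_{jk} d_k\right)/d_j$ for $i > j$. The function names (`ldlt_mixed`, `reconstruct`, `relative_error`, `lower_triangle_bits`), the Frobenius-norm error metric, and the lower-triangle storage layout are illustrative assumptions. The sketch does not model the PE array or the dual arbitration mechanism.

```python
import struct


def to_fp16(x: float) -> float:
    """Round to the nearest IEEE 754 binary16 value (emulates half-precision storage)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round to the nearest IEEE 754 binary32 value (emulates single-precision arithmetic)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def ldlt_mixed(a):
    """LDL^T decomposition (no pivoting) of a symmetric positive definite matrix.

    Hypothetical sketch: the input, L and D are held in FP16 "storage", and every
    multiply, subtract and divide is rounded to FP32.
    """
    n = len(a)
    a16 = [[to_fp16(v) for v in row] for row in a]  # matrix written to FP16 storage
    L = [[0.0] * n for _ in range(n)]
    d = [0.0] * n
    for j in range(n):
        # d_j = a_jj - sum_{k<j} l_jk^2 d_k, accumulated in FP32
        acc = a16[j][j]
        for k in range(j):
            acc = to_fp32(acc - to_fp32(to_fp32(L[j][k] * L[j][k]) * d[k]))
        d[j] = to_fp16(acc)  # write the pivot back to FP16 storage
        L[j][j] = 1.0
        for i in range(j + 1, n):
            # l_ij = (a_ij - sum_{k<j} l_ik l_jk d_k) / d_j, accumulated in FP32
            acc = a16[i][j]
            for k in range(j):
                acc = to_fp32(acc - to_fp32(to_fp32(L[i][k] * L[j][k]) * d[k]))
            L[i][j] = to_fp16(to_fp32(acc / d[j]))
    return L, d


def reconstruct(L, d):
    """Form L * diag(d) * L^T in double precision for checking."""
    n = len(d)
    return [[sum(L[i][k] * d[k] * L[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


def relative_error(a, b):
    """Frobenius-norm relative error ||a - b||_F / ||a||_F."""
    num = sum((x - y) ** 2 for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    den = sum(x * x for row in a for x in row)
    return (num / den) ** 0.5


def lower_triangle_bits(n, bits_per_word):
    """Bits needed to store the n(n+1)/2 lower-triangular entries of an n x n matrix."""
    return n * (n + 1) // 2 * bits_per_word


if __name__ == "__main__":
    n = 8
    # Diagonally dominant tridiagonal test matrix, hence symmetric positive definite.
    A = [[4.0 if i == j else (1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
         for i in range(n)]
    L, d = ldlt_mixed(A)
    print("relative error:", relative_error(A, reconstruct(L, d)))
    print("lower-triangle bits, order 256, FP16 vs FP32:",
          lower_triangle_bits(256, 16), lower_triangle_bits(256, 32))
```

On the 8×8 test matrix the reconstruction error stays below $10^{-3}$. The storage count shows that holding the lower triangle of a 256th-order matrix in FP16 instead of FP32 halves the bits required (526336 vs 1052672), which is the kind of saving the storage-level half-precision choice targets. FP16 cannot represent finite values above 65504, so inputs with larger magnitudes would need scaling before storage in a scheme like this.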
