Journal of Computer Applications, 2026, Vol. 46, Issue (4): 1218-1226. DOI: 10.11772/j.issn.1001-9081.2025050535
• Advanced computing •
Chaoyun MAI, Xiaopeng KE, Dongzhou ZHONG, Xiaochun HONG, Panrong CHEN, Zhiyuan SU
Received: 2025-05-14
Revised: 2025-09-16
Accepted: 2025-09-29
Online: 2025-10-16
Published: 2026-04-10
Contact: Dongzhou ZHONG
About author: MAI Chaoyun, born in 1989 in Jiangmen, Guangdong, Ph.D., associate professor, CCF member. His research interests include intelligent information processing and digital signal processing.
Chaoyun MAI, Xiaopeng KE, Dongzhou ZHONG, Xiaochun HONG, Panrong CHEN, Zhiyuan SU. Design of LDLT matrix decomposition FPGA accelerator based on mixed precision strategy[J]. Journal of Computer Applications, 2026, 46(4): 1218-1226.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2025050535
| Matrix order | Number of PEs | Relative error/% | Relative error/% | Computation time/ms |
|---|---|---|---|---|
| 4 | 4 | 0.010 7 | 0.011 5 | 0.001 65 |
| | 8 | | | 0.001 65 |
| | 16 | | | 0.001 65 |
| 8 | 4 | 0.043 1 | 0.077 6 | 0.005 93 |
| | 8 | | | 0.006 02 |
| | 16 | | | 0.006 02 |
| 16 | 4 | 0.032 8 | 0.042 5 | 0.030 84 |
| | 8 | | | 0.030 84 |
| | 16 | | | 0.029 11 |
| 32 | 4 | 0.061 7 | 0.094 5 | 0.173 64 |
| | 8 | | | 0.171 51 |
| | 16 | | | 0.171 51 |
| 64 | 4 | 0.037 3 | 0.065 2 | 1.133 69 |
| | 8 | | | 1.125 79 |
| | 16 | | | 1.121 28 |
| 128 | 4 | 0.032 3 | 0.042 2 | 8.522 45 |
| | 8 | | | 8.147 79 |
| | 16 | | | 7.975 71 |
| 256 | 4 | 0.031 7 | 0.057 0 | 68.473 25 |
| | 8 | | | 62.784 58 |
| | 16 | | | 59.879 47 |
Tab. 1 Test results under different PE configurations
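For orientation, the kind of measurement reported in Tab. 1 can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the accelerator's design: the page does not give the paper's precision split or its definition of relative error, so the Frobenius-norm reconstruction error and the single-precision rounding hook used here are assumptions, and the names `to_f32`, `ldlt`, and `relative_error` are hypothetical.

```python
import struct

def to_f32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def ldlt(a, rnd=lambda x: x):
    """Factor a symmetric matrix a (list of lists) as L * diag(d) * L^T.

    rnd is applied to every stored result, so passing to_f32 emulates
    single-precision storage (an assumption, not the paper's scheme).
    No pivoting is done: every d[j] must be non-zero.
    """
    n = len(a)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        # Diagonal entry: subtract contributions of earlier columns.
        d[j] = rnd(a[j][j] - sum(L[j][k] * L[j][k] * d[k] for k in range(j)))
        # Entries of column j below the diagonal.
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = rnd((a[i][j] - s) / d[j])
    return L, d

def relative_error(a, L, d):
    """Frobenius-norm relative error of L * diag(d) * L^T against a, in percent."""
    n = len(a)
    num = den = 0.0
    for i in range(n):
        for j in range(n):
            rec = sum(L[i][k] * d[k] * L[j][k] for k in range(n))
            num += (rec - a[i][j]) ** 2
            den += a[i][j] ** 2
    return 100.0 * (num / den) ** 0.5

A = [[4.0, 2.0, 2.0],
     [2.0, 5.0, 3.0],
     [2.0, 3.0, 6.0]]
L, d = ldlt(A)
print(d)  # [4.0, 4.0, 4.0]
```

Calling `ldlt(A, to_f32)` instead repeats the factorization with every stored value rounded to single precision, which is one simple way to observe how reduced precision affects the reconstruction error.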
| Method | | LUTs | FF | BRAM | DSP | Frequency/MHz | WNS/ns | Throughput |
|---|---|---|---|---|---|---|---|---|
| Ref. [ ] | | 59 055 | 134 878 | 189 | 1 530 | 250 | 0.044 | 52.60 |
| Ref. [ ] | | 65 158 | 95 717 | 16 | 411 | 700 | — | — |
| Ref. [ ] | | 36 402 | — | 128 | 116 | 200 | — | 105.20 |
| Proposed method | 4PE | 11 654 | 9 261 | 6 | 36 | 100 | 0.853 | 117.34 |
| | 8PE | 19 671 | 11 763 | 8 | 64 | 100 | 0.703 | 122.73 |
| | 16PE | 35 509 | 16 782 | 12 | 120 | 100 | 1.031 | 125.38 |
Tab. 2 Comparison of FPGA hardware resource consumption and efficiency
[1] FENG D, ZHOU F C, WU Q Y, et al. Secure outsourcing method for solving linear algebraic equations based on LU decomposition[J]. Journal of Northeastern University (Natural Science), 2024, 45(4): 457-463, 506.
[2] DA H, HU S B. Bayesian compressed sensing signal processing based on Cholesky matrix decomposition[J]. Journal of Guizhou Normal University (Natural Sciences), 2021, 39(1): 72-76.
[3] BAO C C, BAI Z G. Speech enhancement based on nonnegative matrix factorization: an overview[J]. Journal of Signal Processing, 2020, 36(6): 791-803.
[4] SHI J R, LI J H. Novel deep matrix factorization and its application in the recommendation system[J]. Journal of Xidian University, 2022, 49(3): 171-182.
[5] DU K L, SWAMY M N S, WANG Z Q, et al. Matrix factorization techniques in machine learning, signal processing, and statistics[J]. Mathematics, 2023, 11(12): No.2674.
[6] YANG B. Application of matrix decomposition in machine learning[C]// Proceedings of the 2021 IEEE International Conference on Computer Science, Electronic Information Engineering and Intelligent Control Technology. Piscataway: IEEE, 2021: 133-137.
[7] YANG Y T, YI Q, SUN Z B, et al. Review of matrix decomposition methods application[J]. Journal of Yunnan University (Natural Sciences Edition), 2025, 47(4): 635-647.
[8] YANG L, ZHANG D L, WANG Y, et al. A study on array radar matrix algorithm based on FPGA[J]. Modern Radar, 2023, 45(7): 1-8.
[9] JAISWAL M K, CHANDRACHOODAN N. FPGA-based high-performance and scalable block LU decomposition architecture[J]. IEEE Transactions on Computers, 2012, 61(1): 60-72.
[10] YEN P T, THANH T M, MINH N H. Novel blind colour image watermarking technique using Cholesky decomposition[C]// Proceedings of the 1st International Conference on Cryptography and Information Security. Piscataway: IEEE, 2024: 1-6.
[11] OSINSKY A, BYCHKOV R, TREFILOV M, et al. Regularization for Cholesky decomposition in massive MIMO detection[J]. IEEE Wireless Communications Letters, 2023, 12(9): 1603-1607.
[12] ALI M, CHOUDHARY J. Investigation of neural network parameters for MNIST using QR decomposition algorithm and principal component analysis[C]// Proceedings of the 2nd International Conference on Computer, Communication and Control. Piscataway: IEEE, 2024: 1-7.
[13] CHEN W J, SONG Y K, ZHANG D L. Design of matrix decomposer based on improved QR algorithm[J]. Electronic Science and Technology, 2022, 35(11): 21-28.
[14] RAJ M S S, GEORGE S N. A fast and efficient approach for human action recovery from corrupted 3-D motion capture data using QR decomposition-based approximate SVD[J]. IEEE Transactions on Human-Machine Systems, 2024, 54(4): 395-405.
[15] CHEN X F, WANG W. An effective implementation of LDLT decomposition of sparse symmetric matrix on GPU[J]. Frontiers of Data and Computing, 2021, 3(3): 136-147.
[16] AN G C, LIU R F, ZHAO M, et al. Design of matrix inversion algorithm based on field programmable gate array[J]. Science Technology and Engineering, 2024, 24(10): 4140-4147.
[17] LI L, ZHANG W. Design and FPGA implementation of an improved Cholesky factorization algorithm[J]. Telecommunication Engineering, 2020, 60(7): 845-849.
[18] QIU J H, SONG Y K, CHEN W J, et al. Optimization and hardware implementation of 64-bit double-precision matrix decomposition[J]. Journal of Hefei University of Technology (Natural Science), 2021, 44(12): 1640-1645.
[19] ZHU P, YE S X, YANG X F. FPGA implementation of Cholesky decomposition based on floating point number[J]. Computer and Digital Engineering, 2023, 51(4): 759-762, 831.
[20] YU H R, XIAO H. Design and FPGA implementation of large scale matrix inversion accelerator based on LDL algorithm[J]. Electronic Science and Technology, 2023, 36(7): 1-7.