Design and implementation of high performance floating-point multiply accumulate for M-DSP

doi:10.11772/j.issn.1001-9081.2016.08.2213

Journal of Computer Applications ›› 2016, Vol. 36 ›› Issue (8): 2213-2218. DOI: 10.11772/j.issn.1001-9081.2016.08.2213

CHE Wenbo, LIU Hengzhu, TIAN Tian

College of Computer, National University of Defense Technology, Changsha, Hunan 410073, China

Received: 2016-01-15 Revised: 2016-03-12 Online: 2016-08-10 Published: 2016-08-10
Corresponding author: TIAN Tian
About the authors: CHE Wenbo (born 1981), male, from Wucheng, Shandong, M.S. candidate; main research interest: microprocessor design. LIU Hengzhu (born 1963), male, from Huaihua, Hunan, professor, Ph.D.; main research interests: microprocessor design and computer architecture. TIAN Tian (born 1983), male, from Lixian, Hunan, M.S.; main research interest: microprocessor design.
Supported by:
This work is partially supported by the Aerospace Science Foundation of China (2013ZC88003).

Abstract

To meet the performance, area and power requirements of floating-point computation in the high-performance M-type Digital Signal Processor (M-DSP), the overall architecture of M-DSP and the characteristics of its floating-point instructions were analyzed, and a high-performance, low-power Floating-point Multiply ACcumulate (FMAC) unit was designed and implemented. The FMAC separates the single-precision and double-precision datapaths and executes in a six-stage pipeline; key modules such as the multiplier and the alignment shifter are designed for reuse. It supports double- and single-precision floating-point multiplication, multiply-accumulate and multiply-subtract, as well as single-precision dot product and complex multiplication. The design was fully verified and then synthesized with Synopsys Design Compiler in a 45 nm process. Synthesis results show that the FMAC runs at up to 1 GHz with a cell area of 36856 μm²; compared with the FMAC in FT-XDSP, the area is reduced by 12.95% and the critical path is shortened by 2.17%.

Key words: floating-point multiplier, Floating-point Multiply ACcumulate (FMAC), floating-point dot product, Booth algorithm, IEEE
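The abstract lists the operation types the FMAC supports. As a rough illustration of how dot product and complex multiplication can be built from multiply-accumulate and multiply-subtract steps, here is a minimal Python sketch. All function names are hypothetical, the operand order of the multiply-subtract is an assumption, and the code models only the arithmetic, not the paper's datapath, pipeline, or rounding behavior.

```python
# Illustrative arithmetic model only; names and operand conventions are
# assumptions, not taken from the paper.

def mul_acc(a, b, c):
    """Multiply-accumulate: a * b + c."""
    return a * b + c


def mul_sub(a, b, c):
    """Multiply-subtract, assumed here to mean c - a * b."""
    return c - a * b


def dot_product(xs, ys):
    """Dot product as a chain of multiply-accumulate steps."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc = mul_acc(x, y, acc)
    return acc


def complex_mul(a, b, c, d):
    """(a + bi) * (c + di) using one multiply plus one fused-style step per part."""
    real = mul_sub(b, d, a * c)  # a*c - b*d
    imag = mul_acc(b, c, a * d)  # a*d + b*c
    return real, imag


if __name__ == "__main__":
    print(dot_product([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
    print(complex_mul(1.0, 2.0, 3.0, 4.0))
```

Python evaluates `a * b + c` with two roundings, so this sketch does not reproduce the result of a hardware unit that rounds only once.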

CLC Number:

TP332.2

CHE Wenbo, LIU Hengzhu, TIAN Tian. Design and implementation of high performance floating-point multiply accumulate for M-DSP[J]. Journal of Computer Applications, 2016, 36(8): 2213-2218.

