Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (7): 1854-1857. DOI: 10.11772/j.issn.1001-9081.2015.07.1854

Floating point divider design of high-performance double precision based on Goldschmidt's algorithm

HE Tingting, PENG Yuanxi, LEI Yuanwu   

  1. College of Computer Science, National University of Defense Technology, Changsha Hunan 410073, China
  • Received: 2015-01-28 Revised: 2015-03-28 Online: 2015-07-10 Published: 2015-07-17

基于Goldschmidt算法的高性能双精度浮点除法器设计

何婷婷, 彭元喜, 雷元武   

  1. 国防科学技术大学 计算机学院, 长沙 410073
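  • Corresponding author: HE Tingting (born 1991), female, from Yichun, Heilongjiang, M.S. candidate; main research interest: microprocessor design. heting7410@163.com
  • About the authors: PENG Yuanxi (born 1966), male, from Changde, Hunan, research fellow, doctoral supervisor, Ph.D.; main research interests: microprocessor design, NoC design, MPSoC architecture. LEI Yuanwu (born 1982), male, from Chenzhou, Hunan, assistant research fellow, Ph.D.; main research interests: microprocessor design, reconfigurable design, computer architecture.
  • Supported by:

    Key Discipline Construction Project of Hunan Province (434515000008); Aeronautical Science Foundation of China (2013zc88003); National Natural Science Foundation of China (61402499).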

Abstract:

Double-precision floating-point division is computationally complex and has a long latency. To address this, a method for designing a high-performance double-precision floating-point divider based on Goldschmidt's algorithm and supporting the IEEE-754 standard was proposed. Firstly, the division process of Goldschmidt's algorithm and the error produced during iteration were analyzed, and a method for controlling that error was proposed. Secondly, area-saving bipartite reciprocal tables were adopted to determine the initial value of the iteration, and parallel multipliers were used in the iterative unit to speed up each iteration. Lastly, the pipeline stages were partitioned reasonably and the iteration was controlled by a state machine so that floating-point divisions could be executed in a pipelined fashion, which further improved the speed of the divider. The experimental results show that, in 40 nm technology, the double-precision floating-point divider with a pipelined structure and a 14-bit initial iteration value has a synthesized cell area of 84902.2618 μm² and runs at up to 2.2 GHz. Compared with a pipelined structure using an 8-bit initial iteration value, computing speed is increased by 32.73% and area is increased by 5.05%. The latency of a single double-precision floating-point division instruction is 12 clock cycles, and in pipelined execution the average latency per division drops to 3 clock cycles. Compared with double-precision dividers based on the SRT algorithm in other processors, data throughput is improved by 3 to 7 times; compared with dividers based on Goldschmidt's algorithm in other processors, data throughput is improved by 2 to 3 times.
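
The iteration named in the abstract can be illustrated with a minimal software sketch. It is not the authors' hardware design: the function names, the default parameters, and the way the seed is produced are assumptions made for illustration only. In particular, `reciprocal_seed` is a hypothetical stand-in that truncates 1/d to a given number of fractional bits, and it does not model a bipartite reciprocal table. The sketch shows the general Goldschmidt scheme: numerator and denominator are multiplied by the same factor in each step, so the denominator converges to 1 and the numerator converges to the quotient, with the error roughly squaring per step.

```python
def reciprocal_seed(d, seed_bits):
    """Hypothetical stand-in for a reciprocal lookup table.

    Returns 1/d truncated to `seed_bits` fractional bits, for d in [1, 2).
    A hardware table would be indexed by the leading bits of d instead.
    """
    scale = 1 << seed_bits
    return int(scale / d) / scale


def goldschmidt_divide(n, d, seed_bits=14, iterations=2):
    """Approximate n / d for a normalized divisor d in [1, 2).

    The defaults (14-bit seed, 2 iterations) are illustrative assumptions,
    not values taken from the paper's implementation.
    """
    if not 1.0 <= d < 2.0:
        raise ValueError("divisor must be normalized to [1, 2)")
    f = reciprocal_seed(d, seed_bits)
    q = n * f  # running numerator, converges to n / d
    r = d * f  # running denominator, converges to 1
    for _ in range(iterations):
        f = 2.0 - r  # correction factor computed from the current denominator
        q *= f       # these two multiplications are independent of each
        r *= f       # other, so hardware can perform them in parallel
    return q


if __name__ == "__main__":
    print(goldschmidt_divide(1.5, 1.25))
    print(goldschmidt_divide(1.0, 1.999, seed_bits=8, iterations=3))
```

Because the error roughly squares at every step, a more accurate seed needs fewer iterations to reach a given precision, which is the kind of trade-off between table size and speed that the 14-bit versus 8-bit comparison in the abstract is about.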

Key words: floating point divider, Goldschmidt's algorithm, bipartite reciprocal table, high-performance divider, Digital Signal Processing (DSP)

摘要:

针对双精度浮点除法通常运算过程复杂、延时较大这一问题,提出一种基于Goldschmidt算法设计支持IEEE-754标准的高性能双精度浮点除法器方法。首先,分析Goldschmidt算法运算除法的过程以及迭代运算产生的误差;然后,提出了控制误差的方法;其次,采用了较节约面积的双查找表法确定迭代初值,迭代单元采用并行乘法器结构以提高迭代速度;最后,合理划分流水站,控制迭代过程使浮点除法可以流水执行,从而进一步提高除法器运算速率。实验结果表明,在40 nm工艺下,双精度浮点除法器采用14位迭代初值流水结构,其综合cell面积为84902.2618 μm2,运行频率可达2.2 GHz;相比采用8位迭代初值流水结构运算速度提高了32.73%,面积增加了5.05%;计算一条双精度浮点除法的延迟为12个时钟周期,流水执行时,单条除法平均延迟为3个时钟周期,与其他处理器中基于SRT算法实现的双精度浮点除法器相比,数据吞吐率提高了3~7倍;与其他处理器中基于Goldschmidt算法实现的双精度浮点除法器相比,数据吞吐率提高了2~3倍。

关键词: 浮点除法器, Goldschmidt算法, 倒数查找表, 高性能除法器, 数字信号处理

CLC Number: