Floating point divider design of high-performance double precision based on Goldschmidt's algorithm

HE Tingting, PENG Yuanxi, LEI Yuanwu

Journal of Computer Applications 2015, 35 (7): 1854-1857. DOI: 10.11772/j.issn.1001-9081.2015.07.1854

To address the complexity and long latency of division, a design method for a high-performance double-precision floating-point divider unit based on Goldschmidt's algorithm was proposed; the unit supports the IEEE-754 standard. First, the way Goldschmidt's algorithm computes division and the error produced during the iterations were analyzed, and a method for controlling this error was proposed. Second, bipartite reciprocal tables were adopted to compute the initial value of the iteration while saving area, and parallel multipliers were used in the iteration unit to accelerate it. Finally, the execution stages were partitioned so that, under state-machine control, the divider supports pipelined execution, which improves its speed. Experimental results show that the double-precision floating-point divider using a 14-bit initial iteration value in a pipelined structure has a synthesized cell area of 84902.2618 μm² and runs at up to 2.2 GHz in 40 nm technology. Compared with the pipelined structure using an 8-bit initial value, computing speed increases by 32.73% while area increases by 5.05%. The latency of a double-precision floating-point division instruction is 12 cycles, and it decreases to 3 cycles in pipelined execution. Compared with SRT-based dividers implemented in other processors, data throughput is improved by 3 to 7 times; compared with Goldschmidt-based dividers implemented in other processors, it is improved by 2 to 3 times.
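
The Goldschmidt iteration summarized above can be illustrated with a minimal Python sketch. It assumes a plain k-bit reciprocal table in place of the bipartite tables the paper uses, exact rational arithmetic instead of finite-width multipliers, and hypothetical function names (`seed_reciprocal`, `goldschmidt_divide`); it is not the paper's hardware design or its error-control method.

```python
from fractions import Fraction


def seed_reciprocal(b: Fraction, k: int) -> Fraction:
    """Model of a k-bit reciprocal lookup table (a plain table, not a bipartite one).

    b is a significand normalized to [1, 2). The top k fraction bits select an
    interval; the entry is 1 / (interval midpoint). Hardware would round this entry.
    """
    if not (1 <= b < 2):
        raise ValueError("b must be normalized to [1, 2)")
    scale = 2 ** k
    index = int((b - 1) * scale)                  # top k fraction bits of b
    midpoint = 1 + Fraction(2 * index + 1, 2 * scale)
    return 1 / midpoint


def goldschmidt_divide(a: Fraction, b: Fraction, k: int, target_bits: int = 53):
    """Approximate a / b with Goldschmidt's iteration; return (quotient, iterations).

    Exact rational arithmetic is used, so only the algorithmic error is modeled,
    not the rounding error of finite-width multipliers.
    """
    r0 = seed_reciprocal(b, k)
    n, d = a * r0, b * r0              # scale both operands; n / d == a / b throughout
    tolerance = Fraction(1, 2 ** target_bits)
    iterations = 0
    while abs(1 - d) >= tolerance:
        f = 2 - d                      # correction factor
        n, d = n * f, d * f            # independent products: two multipliers can run in parallel
        iterations += 1
    return n, iterations               # relative error of n equals |1 - d|
```

In this model the seed error is at most 2^-(k+1) and |1 - d| squares on every iteration, so reaching 53 bits takes at most three iterations with an 8-bit seed and at most two with a 14-bit seed. That is one plausible reading of why the larger seed table speeds up the divider, offered as an illustration rather than as the paper's own analysis.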

Building algorithm for tree-ring application layer multicast based on ant colony algorithm

XU Jianzhen, HE Tingting, HE Dan, ZHOU Tong

Journal of Computer Applications 2013, 33 (12): 3449-3452.

As an improvement on IP multicast technology, Application Layer Multicast (ALM) offers many advantages, such as not being constrained by the network architecture, rich resources and high data transfer rates. Taking node performance and end-to-end delay into account, a fast and efficient method for building an application layer multicast tree was proposed, named the Ant Colony Algorithm based Tree-ring Application Layer Multicast Model (ACOTRM). Existing studies give only a cursory topology and lack a complete, clear description of the concrete construction process. To fill this gap, a complete construction process for a hierarchical tree-ring ALM was put forward, covering key steps such as cluster division, ring connection within clusters, generation of feasible solutions and maintenance of the model during its survival time. In addition, each node was assigned a specific priority in order to optimize the ALM state tree. Simulation results show that the proposed model achieves lower average delay and a higher average data delivery ratio, improving both system stability and forwarding efficiency.
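
For readers unfamiliar with ant colony construction, the following is a generic Python sketch of how ants could build a delay-aware multicast tree. It is not ACOTRM: clustering, in-cluster rings, node priorities and maintenance are omitted, and the cost function (worst source-to-member delay), the fan-out cap `max_children` standing in for node performance, and the parameters `alpha`, `beta` and `rho` are assumptions chosen for illustration.

```python
import random


def build_tree(delay, tau, rng, alpha=1.0, beta=2.0, max_children=None):
    """One ant grows a tree from source node 0, Prim-style.

    Each step attaches a node v outside the tree to a node u inside it, chosen with
    probability proportional to tau[u][v]**alpha * (1/delay[u][v])**beta.
    max_children is a hypothetical fan-out cap standing in for node performance.
    """
    n = len(delay)
    parent = {0: None}           # node -> parent in the multicast tree
    children = [0] * n
    dist = {0: 0.0}              # end-to-end delay from the source
    while len(parent) < n:
        candidates, weights = [], []
        for u in parent:
            if max_children is not None and children[u] >= max_children:
                continue          # u has no spare fan-out
            for v in range(n):
                if v not in parent:
                    candidates.append((u, v))
                    weights.append(tau[u][v] ** alpha * (1.0 / delay[u][v]) ** beta)
        u, v = rng.choices(candidates, weights=weights)[0]
        parent[v] = u
        children[u] += 1
        dist[v] = dist[u] + delay[u][v]
    return parent, max(dist.values())   # cost = worst end-to-end delay


def aco_multicast_tree(delay, ants=10, rounds=30, rho=0.1, seed=0, max_children=None):
    """Return (parent map, worst-case delay) of the best tree found."""
    rng = random.Random(seed)
    n = len(delay)
    tau = [[1.0] * n for _ in range(n)]   # initial pheromone on every link
    best_parent, best_cost = None, float("inf")
    for _ in range(rounds):
        for _ in range(ants):
            parent, cost = build_tree(delay, tau, rng, max_children=max_children)
            if cost < best_cost:
                best_parent, best_cost = parent, cost
        for row in tau:                    # evaporation
            for j in range(n):
                row[j] *= 1.0 - rho
        for v, u in best_parent.items():   # reinforce the best-so-far tree
            if u is not None:
                tau[u][v] += 1.0 / best_cost
                tau[v][u] += 1.0 / best_cost
    return best_parent, best_cost
```

Here `delay` is assumed to be a symmetric matrix of positive link delays with node 0 as the source. If `max_children` is given it must be at least 1, so that a newly attached leaf can always accept a child and each ant can always finish its tree.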

Accelerating hierarchical distributed latent Dirichlet allocation algorithm by parallel GPU

WEN La, RUI Jianwu, HE Tingting, GUO Liang

Journal of Computer Applications 2013, 33 (12): 3313-3316.

Hierarchical Distributed Latent Dirichlet Allocation (HD-LDA), a popular topic modeling technique for exploring document collections, is an improved Latent Dirichlet Allocation (LDA) algorithm that runs in a distributed environment. Mahout has implemented the HD-LDA algorithm on the Hadoop framework. However, that implementation processes all documents on a single node sequentially, so the HD-LDA program takes a very long time when processing a large number of documents. To solve this problem, a new method was proposed that combines Hadoop with the Graphics Processing Unit (GPU), transferring the computation from the CPU to the GPU. The application results show that combining Hadoop with a GPU that processes many documents in parallel greatly reduces the execution time of the HD-LDA program, achieving a sevenfold speedup.
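
The per-document parallelism described above can be sketched with approximate distributed LDA (AD-LDA), a simpler relative of HD-LDA: documents are split across workers, each worker runs a collapsed Gibbs sweep against its own copy of the topic-word counts, and the count changes are merged afterwards. In this Python sketch the workers run one after another; on a GPU each partition would presumably map to a group of threads. The function names and hyperparameter defaults are illustrative assumptions, not Mahout's API or the paper's code.

```python
import random


def init_state(docs, K, V, rng):
    """Random topic assignments z plus word-topic, topic and doc-topic counts."""
    z = [[rng.randrange(K) for _ in doc] for doc in docs]
    nwt = [[0] * K for _ in range(V)]
    nt = [0] * K
    ndt = [[0] * K for _ in docs]
    for d, doc in enumerate(docs):
        for w, t in zip(doc, z[d]):
            nwt[w][t] += 1
            nt[t] += 1
            ndt[d][t] += 1
    return z, nwt, nt, ndt


def local_sweep(doc_ids, docs, z, nwt, nt, ndt, K, V, alpha, beta, rng):
    """One collapsed-Gibbs pass over a worker's documents, updating its local counts."""
    for d in doc_ids:
        for i, w in enumerate(docs[d]):
            t = z[d][i]
            nwt[w][t] -= 1; nt[t] -= 1; ndt[d][t] -= 1   # remove the token's current topic
            weights = [(ndt[d][k] + alpha) * (nwt[w][k] + beta) / (nt[k] + V * beta)
                       for k in range(K)]
            t = rng.choices(range(K), weights=weights)[0]
            z[d][i] = t
            nwt[w][t] += 1; nt[t] += 1; ndt[d][t] += 1


def ad_lda(docs, K, V, workers=4, iterations=20, alpha=0.1, beta=0.01, seed=0):
    rng = random.Random(seed)   # shared RNG is fine only because workers run sequentially here
    z, nwt, nt, ndt = init_state(docs, K, V, rng)
    partitions = [list(range(p, len(docs), workers)) for p in range(workers)]
    for _ in range(iterations):
        delta_nwt = [[0] * K for _ in range(V)]
        delta_nt = [0] * K
        for part in partitions:            # partitions touch disjoint documents
            local_nwt = [row[:] for row in nwt]
            local_nt = nt[:]
            local_sweep(part, docs, z, local_nwt, local_nt, ndt, K, V, alpha, beta, rng)
            for w in range(V):
                for k in range(K):
                    delta_nwt[w][k] += local_nwt[w][k] - nwt[w][k]
            for k in range(K):
                delta_nt[k] += local_nt[k] - nt[k]
        # merge step: apply every worker's changes to the global counts
        nwt = [[nwt[w][k] + delta_nwt[w][k] for k in range(K)] for w in range(V)]
        nt = [nt[k] + delta_nt[k] for k in range(K)]
    return z, nwt, nt
```

Because each token belongs to exactly one partition, the merged counts always match the current topic assignments; what AD-LDA approximates is that each worker samples against counts that are stale with respect to the other workers' changes in the same iteration.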