Floating point divider design of high-performance double precision based on Goldschmidt's algorithm
HE Tingting, PENG Yuanxi, LEI Yuanwu
Journal of Computer Applications    2015, 35 (7): 1854-1857.   DOI: 10.11772/j.issn.1001-9081.2015.07.1854

Because division is complex and incurs a long computation delay, a design method for a high-performance double precision floating point divider based on Goldschmidt's algorithm was proposed, with support for the IEEE-754 standard. Firstly, the computation of division with Goldschmidt's algorithm and the error produced during the iterations were analyzed, and a method for controlling that error was proposed. Secondly, bipartite reciprocal tables were adopted to compute the initial value of the iteration while saving area, and parallel multipliers were adopted in the iterative unit for acceleration. Lastly, the execution stages were partitioned appropriately so that the divider supports pipelined execution under state machine control, which improves its speed. The experimental results show that the double precision floating point divider with a pipeline structure and a 14-bit initial iteration value has a synthesized cell area of 84902.2618 μm² and runs at up to 2.2 GHz in 40 nm technology. Compared with the pipeline structure using an 8-bit initial iteration value, computing speed is increased by 32.73% and area is increased by 5.05%. The delay of a double precision floating point division instruction is 12 cycles, which is reduced to 3 cycles in pipelined execution. Compared with dividers based on the SRT algorithm implemented in other processors, data throughput is improved by 3-7 times. Compared with dividers based on Goldschmidt's algorithm implemented in other processors, data throughput is improved by 2-3 times.
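
The iteration summarized in the abstract can be illustrated with a minimal software sketch. It is not the paper's hardware design: `table_reciprocal` is a hypothetical stand-in for the bipartite reciprocal tables (it simply computes the reciprocal of the midpoint of each table interval), the function names and the default of two iterations are assumptions, and special values such as infinities and NaN are not handled. Numerator and denominator are multiplied by the same factor `2 - den` at each step, so the denominator converges quadratically to 1 and the numerator converges to the quotient.

```python
import math


def table_reciprocal(d: float, bits: int) -> float:
    """Hypothetical stand-in for a reciprocal lookup table.

    Approximates 1/d for d in [1, 2) using only the top `bits`
    fraction bits of d as the table index.
    """
    scale = 1 << bits
    index = math.floor((d - 1.0) * scale)    # top `bits` bits of the fraction
    midpoint = 1.0 + (index + 0.5) / scale   # centre of the table interval
    return 1.0 / midpoint


def goldschmidt_divide(n: float, d: float,
                       table_bits: int = 14, iterations: int = 2) -> float:
    """Approximate n / d with Goldschmidt's iteration (finite inputs only)."""
    if d == 0.0:
        raise ZeroDivisionError("division by zero")
    sign = -1.0 if (n < 0) != (d < 0) else 1.0
    n, d = abs(n), abs(d)

    # Normalise the divisor so that d = d_norm * 2**exp with d_norm in [1, 2).
    mantissa, e = math.frexp(d)              # d = mantissa * 2**e, mantissa in [0.5, 1)
    d_norm, exp = mantissa * 2.0, e - 1

    # Initial step: scale both operands by the table estimate of 1/d_norm.
    f = table_reciprocal(d_norm, table_bits)
    num, den = n * f, d_norm * f             # den = 1 - err, with |err| small

    # Each iteration squares the error: den becomes 1 - err**2.
    for _ in range(iterations):
        f = 2.0 - den
        num *= f                             # these two products are independent,
        den *= f                             # so hardware can compute them in parallel

    return sign * math.ldexp(num, -exp)      # undo the divisor normalisation


print(goldschmidt_divide(1.0, 3.0))
print(goldschmidt_divide(1.0, 3.0, table_bits=8, iterations=3))
```

In this sketch a 14-bit table index leaves an initial error of at most about 2^-15, so two iterations bring the error term to roughly 2^-60, while an 8-bit index needs a third iteration to reach a comparable level.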

Building algorithm for tree-ring application layer multicast based on ant colony algorithm
XU Jianzhen, HE Tingting, HE Dan, ZHOU Tong
Journal of Computer Applications    2013, 33 (12): 3449-3452.  
As an improvement on IP multicast technology, Application Layer Multicast (ALM) has advantages such as not being restricted by the network architecture, rich resources and a high data transfer rate. Considering node performance and end-to-end delay, a fast and efficient method for building an application layer multicast tree was proposed, named the Ant Colony Algorithm based Tree-ring Application Layer Multicast Model (ACOTRM). Existing studies gave only a cursory topology and no complete, clear description of the concrete construction process. In view of this, a complete construction process for the hierarchical ALM tree-ring was put forward, including several key steps: cluster division, connection of the ring within each cluster, generation of feasible solutions, and maintenance of the model during its lifetime. In addition, to optimize the ALM state tree, each node was assigned a specific priority. The simulation results show that the proposed model provides lower average delay and a higher average data delivery ratio, which improves both system stability and forwarding efficiency.
Accelerating hierarchical distributed latent Dirichlet allocation algorithm by parallel GPU
WEN La, RUI Jianwu, HE Tingting, GUO Liang
Journal of Computer Applications    2013, 33 (12): 3313-3316.  
Hierarchical Distributed Latent Dirichlet Allocation (HD-LDA), a popular topic modeling technique for exploring document collections, is an improved Latent Dirichlet Allocation (LDA) algorithm that runs in a distributed environment. Mahout has implemented the HD-LDA algorithm in the Hadoop framework. However, the algorithm processes all the documents on a single node sequentially, so the execution time of the HD-LDA program is very long when processing a large number of documents. To solve this problem, a new method was proposed that combines Hadoop with the Graphics Processing Unit (GPU), transferring the computation from the CPU to the GPU. The application results show that combining Hadoop with a GPU, which processes many documents in parallel, greatly decreases the execution time of the HD-LDA program and achieves a sevenfold speedup.
Application of cooperative filtering in categories recommendation of Chinese Wikipedia
WANG Jing, HE Tingting, Yimamu'aishan ABUDOULIKEMU
Journal of Computer Applications    2013, 33 (03): 838-840.   DOI: 10.3724/SP.J.1087.2013.00838
Collaborative filtering was applied to automatically recommend categories for Chinese Wikipedia articles. Four typical semantic features, namely incoming links, outgoing links, incoming link categories and outgoing link categories, were adopted to represent articles. From all the categories of the articles similar to the target article, the several most similar categories were chosen as the recommendations for the target article by calculating the similarity between them. The experimental results show that the four semantic features represent Wikipedia articles effectively, and that the collaborative filtering method is effective in recommending proper categories for Chinese Wikipedia articles.
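
One possible reading of this recommendation scheme is sketched below on toy data. The abstract does not specify the similarity measure or the ranking rule, so the Jaccard similarity averaged over the four features, the neighbourhood size `k`, the similarity-weighted category score, and all names (`FEATURES`, `similarity`, `recommend_categories`) are assumptions made for illustration only.

```python
from collections import defaultdict

# The four feature sets named in the abstract; the key names are hypothetical.
FEATURES = ("in_links", "out_links", "in_link_cats", "out_link_cats")


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two sets (assumed measure, not from the paper)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity(x: dict, y: dict) -> float:
    """Average the per-feature similarities of two articles."""
    return sum(jaccard(x[f], y[f]) for f in FEATURES) / len(FEATURES)


def recommend_categories(target: dict, corpus: list, k: int = 2, top_n: int = 2) -> list:
    """Rank the categories of the k most similar articles by summed similarity."""
    neighbours = sorted(corpus, key=lambda a: similarity(target, a), reverse=True)[:k]
    scores = defaultdict(float)
    for article in neighbours:
        weight = similarity(target, article)
        for category in article["categories"]:
            scores[category] += weight
    # Sort by descending score, breaking ties alphabetically for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [category for category, _ in ranked[:top_n]]


# Toy corpus of already-categorised articles (invented data).
corpus = [
    {"in_links": {"x", "y"}, "out_links": {"p", "q"},
     "in_link_cats": {"c1"}, "out_link_cats": {"c2"},
     "categories": {"Physics", "Science"}},
    {"in_links": {"x"}, "out_links": {"p"},
     "in_link_cats": {"c1"}, "out_link_cats": {"c3"},
     "categories": {"Physics"}},
    {"in_links": {"z"}, "out_links": {"r"},
     "in_link_cats": {"c9"}, "out_link_cats": {"c8"},
     "categories": {"Music"}},
]
target = {"in_links": {"x", "y"}, "out_links": {"p"},
          "in_link_cats": {"c1"}, "out_link_cats": {"c2"}}

print(recommend_categories(target, corpus))  # ['Physics', 'Science']
```

Here the two nearest neighbours have similarities 0.875 and 0.625 to the target, so "Physics" (shared by both) outranks "Science".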