Journal of Computer Applications, 2017, Vol. 37, Issue 2: 450-456. DOI: 10.11772/j.issn.1001-9081.2017.02.0450

• Advanced computing •

New improved algorithm for superword level parallelism

ZHANG Suping (张素平), HAN Lin (韩林), DING Lili (丁丽丽), WANG Pengxiang (王鹏翔)

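  1. State Key Laboratory of Mathematical Engineering and Advanced Computing (Information Engineering University), Zhengzhou 450001
  • Received: 2016-07-31 Revised: 2016-10-24 Online: 2017-02-10 Published: 2017-02-11
  • Corresponding author: ZHANG Suping, 892841546@qq.com
  • About the authors: ZHANG Suping (1991-), female, born in Yuzhou, Henan, M.S. candidate; research interests: high-performance computing, advanced compilation technology. HAN Lin (1978-), male, born in Linyi, Shandong, Ph.D., associate professor; research interests: high-performance computing, advanced compilation technology. DING Lili (1992-), female, born in Shangqiu, Henan, M.S. candidate; research interests: high-performance computing, advanced compilation technology. WANG Pengxiang (1988-), male, born in Anyang, Henan, M.S. candidate; research interest: high-performance computing.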
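  • Supported by:
    The "HeGaoJi" (core electronic devices, high-end generic chips and basic software) National Science and Technology Major Project of China (2009ZX01036-001-001-2).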

New improved algorithm for superword level parallelism

ZHANG Suping, HAN Lin, DING Lili, WANG Pengxiang   

  1. State Key Laboratory of Mathematical Engineering and Advanced Computing (Information Engineering University), Zhengzhou, Henan 450001, China
  • Received: 2016-07-31 Revised: 2016-10-24 Online: 2017-02-10 Published: 2017-02-11
  • Supported by:
    This work is partially supported by the National Science and Technology Major Project of China (HeGaoJi Program) (2009ZX01036-001-001-2).

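Abstract: The Superword Level Parallelism (SLP) algorithm cannot effectively handle large programs in which the proportion of parallel code is small, and the vectorizable code may contain code that is unfavorable to vectorization. To address these problems, a new improved SLP algorithm, NSLPO, was proposed. First, non-isomorphic statements that could not be vectorized were made isomorphic, locating the vectorization opportunities that SLP misses. Then, a Maximum Common Subgraph (MCS) was built by adding redundant nodes, and optimizations such as redundancy deletion produced a supplementary SLP graph for the isomorphized code, increasing the parallelism of the code. Finally, a cutting (throttling) method kept code that is harmful to vectorization out of vectorization and processed it as scalar code only, so that only code that profits from vectorization was vectorized, improving program efficiency as much as possible. Experiments on a widely used set of kernel benchmarks show that NSLPO performs better than SLP, reducing execution time by 9.1% on average.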
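The isomorphization step described in the abstract (making non-isomorphic statements isomorphic by adding redundant nodes) can be illustrated with a small Python sketch. It assumes each statement is an expression tree of nested tuples and pads the statement that lacks an operation with a redundant identity node (`x + 0`, `x * 1`). The names `Expr`, `IDENTITY`, `pad`, `isomorphize` and `to_str` are hypothetical, and this pairwise tree alignment is a simplification: NSLPO builds a maximum common subgraph over the SLP graph, which this sketch does not reproduce.

```python
from typing import Tuple, Union

# An expression is a leaf (variable or constant, as a string)
# or a binary node (operator, left, right).
Expr = Union[str, Tuple[str, "Expr", "Expr"]]

# Hypothetical table of right-identity elements: x op IDENTITY[op] == x.
IDENTITY = {"+": "0", "-": "0", "*": "1", "/": "1"}


def pad(leaf: Expr, op: str) -> Expr:
    """Wrap a leaf in a redundant node `leaf op identity` that keeps its value."""
    if op not in IDENTITY:
        raise ValueError(f"no identity element for operator {op!r}")
    return (op, leaf, IDENTITY[op])


def isomorphize(x: Expr, y: Expr) -> Tuple[Expr, Expr]:
    """Give x and y the same operator shape by inserting redundant identity nodes."""
    x_op = isinstance(x, tuple)
    y_op = isinstance(y, tuple)
    if not x_op and not y_op:
        return x, y                      # two leaves already match in shape
    if x_op and not y_op:
        y = pad(y, x[0])                 # y lacks the operation x performs here
    elif y_op and not x_op:
        x = pad(x, y[0])                 # x lacks the operation y performs here
    if x[0] != y[0]:
        raise ValueError(f"operators {x[0]!r} and {y[0]!r} differ; not handled here")
    left_x, left_y = isomorphize(x[1], y[1])
    right_x, right_y = isomorphize(x[2], y[2])
    return (x[0], left_x, right_x), (y[0], left_y, right_y)


def to_str(e: Expr) -> str:
    """Render an expression tree with full parentheses."""
    if isinstance(e, str):
        return e
    op, left, right = e
    return f"({to_str(left)} {op} {to_str(right)})"


# a[0] = b[0] + c[0] * d[0];  a[1] = b[1] + c[1];  -> not isomorphic
s0 = ("+", "b[0]", ("*", "c[0]", "d[0]"))
s1 = ("+", "b[1]", "c[1]")
p0, p1 = isomorphize(s0, s1)
print(to_str(p0))  # (b[0] + (c[0] * d[0]))
print(to_str(p1))  # (b[1] + (c[1] * 1))
```

After padding, both statements have the shape `b + (c * d)` and could be packed into two vector operations. A real compiler may only insert such nodes where the identity holds for the data type (for floating point, `x + 0.0` turns `-0.0` into `+0.0`), and the extra lane work is not free, which is why redundancy deletion and a profitability check follow.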

Key words: isomorphism, cutting method, vectorization, Superword Level Parallelism (SLP), supplementary graph

Abstract: The SLP (Superword Level Parallelism) algorithm cannot effectively process large-scale applications that contain little parallel code, and code that can be vectorized may still contain parts that are adverse to vectorization. To address these problems, a new improved SLP algorithm, NSLPO, was proposed. First, non-isomorphic statements that could not be vectorized were transformed into isomorphic statements as far as possible, locating the vectorization opportunities that SLP misses. Second, a Maximum Common Subgraph (MCS) was built by adding redundant nodes, and the supplementary SLP graph was obtained through optimizations such as redundancy deletion, which increases the parallelism of the program. Finally, code that is harmful to vectorization was excluded from vectorization by a cutting method and executed as scalar code, so that only code that benefits from vectorization was vectorized, improving program efficiency as far as possible. Experiments were conducted on a widely used set of kernel benchmarks. The results show that, compared with the SLP algorithm, NSLPO performs better, reducing execution time by 9.1% on average.
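The cutting (throttling) step, vectorizing only the part of the SLP graph that pays off, can be sketched in Python as below. This is an illustration under stated assumptions, not NSLPO's cost model: the SLP graph is assumed to be a tree of packs rooted at a store pack, all costs are made-up integers, feeding a scalar subtree into a vector operation is charged a single hypothetical `GATHER_COST`, and the vectorized region must stay connected to the root. `Pack`, `scalar_total`, `full_vector`, `vector_total` and `throttle` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

GATHER_COST = 4  # hypothetical cost of assembling a vector from scalar lanes


@dataclass
class Pack:
    """A group of isomorphic scalar operations that SLP could pack into one vector op."""
    name: str
    scalar_cost: int    # cost of executing the lanes as scalar instructions
    vector_cost: int    # cost of executing the pack as vector code
    children: List["Pack"] = field(default_factory=list)  # packs producing its operands


def scalar_total(p: Pack) -> int:
    """Cost of running p and everything below it as scalar code."""
    return p.scalar_cost + sum(scalar_total(c) for c in p.children)


def full_vector(p: Pack) -> int:
    """Cost of vectorizing every pack (plain SLP without throttling)."""
    return p.vector_cost + sum(full_vector(c) for c in p.children)


def vector_total(p: Pack) -> Tuple[int, Set[str]]:
    """Cheapest cost with p vectorized; each child subtree may be cut off and kept scalar."""
    cost, vectorized = p.vector_cost, {p.name}
    for c in p.children:
        vec_cost, vec_set = vector_total(c)
        cut_cost = scalar_total(c) + GATHER_COST  # scalar subtree + packing its results
        if vec_cost <= cut_cost:
            cost += vec_cost
            vectorized |= vec_set
        else:
            cost += cut_cost
    return cost, vectorized


def throttle(root: Pack) -> Tuple[int, Set[str]]:
    """Return the cheapest cost and the packs to vectorize (empty set = stay scalar)."""
    vec_cost, vec_set = vector_total(root)
    if vec_cost < scalar_total(root):
        return vec_cost, vec_set
    return scalar_total(root), set()


# a[i] = b[i] + c[idx[i]] for i in 0..3: the c loads are non-contiguous,
# so their vector form is assumed to need expensive shuffles.
load_b = Pack("load b[0:4]", scalar_cost=4, vector_cost=1)
load_c = Pack("load c[idx]", scalar_cost=4, vector_cost=10)
add = Pack("add", scalar_cost=4, vector_cost=1, children=[load_b, load_c])
store = Pack("store a[0:4]", scalar_cost=4, vector_cost=1, children=[add])

print(scalar_total(store))        # 16
print(full_vector(store))         # 13
cost, packs = throttle(store)
print(cost, sorted(packs))        # 11 ['add', 'load b[0:4]', 'store a[0:4]']
```

Cutting `load c[idx]` costs 4 + 4 = 8 instead of 10 for its vector form, so the throttled plan (11) beats both all-scalar code (16) and full SLP vectorization (13). A production cost model would also charge for extracting vector results used by scalar code and would handle packs shared by several users.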

Key words: isomorphism, cutting method, vectorization, Superword Level Parallelism (SLP), supplementary graph
