Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (8): 2371-2374. DOI: 10.11772/j.issn.1001-9081.2015.08.2371

• Computer software technology •

SIMD compiler optimization by selecting single or double word mode for clustered VLIW DSP

HUANG Shengbing1,2, ZHENG Qilong1,2, GUO Lianwei1,2

  1. School of Computer Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, China;
  2. Anhui Province Key Laboratory of High Performance Computing (University of Science and Technology of China), Hefei, Anhui 230027, China
  • Received: 2015-03-23 Revised: 2015-06-02 Online: 2015-08-10 Published: 2015-08-14
  • Corresponding author: HUANG Shengbing (1990-), male, born in Anqing, Anhui, M.S. candidate; main research interest: parallel compilation; huangsb@mail.ustc.edu.cn
  • About the authors: ZHENG Qilong (1969-), male, born in Chengdu, Sichuan, associate professor, holds a master's degree; main research interest: parallel compilation. GUO Lianwei (1990-), male, born in Fuyang, Anhui, M.S. candidate; main research interest: parallel compilation.
  • Supported by:

    the National Science and Technology Major Project "Core Electronic Devices, High-end Generic Chips and Basic Software" (2012ZX01034-001-001).

Abstract:

BWDSP100 is a 32-bit static scalar Digital Signal Processor (DSP) designed for high-performance computing, with a Very Long Instruction Word (VLIW) and Single Instruction Multiple Data (SIMD) architecture. Its Instruction Level Parallelism (ILP) is achieved mainly through its special clustered architecture and its SIMD instructions. However, the existing compiler framework cannot support these special SIMD instructions. BWDSP100 has abundant SIMD vectorization resources, and the radar digital signal processing field in which it is used places very high demands on program performance. Therefore, according to the structural characteristics of BWDSP100, a SIMD compiler optimization algorithm that supports selection between single-word and double-word modes was proposed and implemented on the basis of the SIMD optimization framework in the traditional Open64 compiler. The algorithm can significantly improve the performance of some compute-intensive programs that are widely used on DSPs. The experimental results show that, compared with the code before optimization, the implementation of this algorithm in the BWDSP compiler achieves an average speedup of 5.66.

Key words: compiler optimization, Instruction Level Parallelism (ILP), multi-cluster Digital Signal Processor (DSP), Very Long Instruction Word (VLIW), Single Instruction Multiple Data (SIMD), Open64 compiler

CLC Number: