Journal of Computer Applications ›› 2021, Vol. 41 ›› Issue (9): 2652-2657. DOI: 10.11772/j.issn.1001-9081.2020111792

Special topic: Advanced computing


Loop-level speculative parallelism analysis of kernel program in TACLeBench

MENG Huiling1, WANG Yaobin1, LI Ling1, YANG Yang2, WANG Xinyi1, LIU Zhiqin1

  1. School of Computer Science and Technology, Southwest University of Science and Technology, Mianyang, Sichuan 621010, China;
  2. Sichuan Institute of Computer Sciences, Chengdu, Sichuan 610041, China
  • Received: 2020-11-17 Revised: 2021-02-07 Online: 2021-09-10 Published: 2021-05-08
  • Corresponding author: WANG Yaobin
  • About the authors: MENG Huiling (born 1997), female, from Meishan, Sichuan, M.S. candidate; research interest: computer architecture. WANG Yaobin (born 1982), male, from Leshan, Sichuan, professor, Ph.D., CCF member; research interest: computer architecture. LI Ling (born 1982), male, from Luzhou, Sichuan, lecturer, Ph.D.; research interests: network security, high-performance computing, numerical modeling and simulation. YANG Yang (born 1982), male, from Luzhou, Sichuan, engineer, M.S.; research interests: virtual reality, software engineering. WANG Xinyi (born 1996), female, from Zhongjiang, Sichuan, M.S. candidate; research interest: computer architecture. LIU Zhiqin (born 1962), female, from Mianyang, Sichuan, professor, M.S.; research interest: high-performance computing.
  • Supported by:
    This work is partially supported by the General Program of the National Natural Science Foundation of China (61672438).

Abstract: Thread-Level Speculation (TLS) can exploit the parallel execution potential of programs and improve the utilization of multi-core resources. However, the TACLeBench kernel benchmarks have not yet been effectively analyzed under TLS parallelization. To address this problem, a profiling scheme and a profiling tool for loop-level speculative execution were designed. Seven representative TACLeBench kernel benchmarks were selected. Firstly, an initialization analysis was performed on the programs, and loop identifiers were inserted into the selected program hot fragments. Then, these fragments were cross-compiled, the data on the programs' speculative threads and memory addresses were recorded, and the maximum potential loop-level parallelism was profiled. Finally, the influence of the program runtime characteristics (thread granularity, parallelizable coverage, dependency characteristics) and of the source code on the speedup ratio was comprehensively discussed. Experimental results show that: 1) this type of program is suitable for TLS acceleration: compared with serial execution, the speedup ratios of most programs under speculative execution of loop structures are above 2, and the highest speedup ratio reaches 20.79; 2) when TLS is used to accelerate the TACLeBench kernel programs, most applications can make effective use of 4-core to 16-core computing resources.
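The kind of estimate such a profile supports can be illustrated with a minimal sketch. It is not the authors' tool: the `Iteration` record, the round-robin in-order dispatch, and the rule that a reader waits for the earlier writer of an address are all assumptions made for illustration. Only read-after-write dependences across iterations are modeled; thread spawn, commit, and rollback overheads are ignored. The second function applies Amdahl's law to relate parallelizable coverage to whole-program speedup.

```python
from dataclasses import dataclass, field


@dataclass
class Iteration:
    """One loop iteration as seen in a (hypothetical) profile trace."""
    cost: int                                  # thread granularity, e.g. dynamic instructions
    reads: set = field(default_factory=set)    # memory addresses read
    writes: set = field(default_factory=set)   # memory addresses written


def loop_speedup(iterations, cores):
    """Idealized speedup of one loop when iterations run as speculative threads.

    Assumed model: iterations are dispatched in order, round-robin over
    `cores`; an iteration that reads an address written by an earlier
    iteration cannot start before that writer has finished.
    """
    core_free = [0] * cores        # time at which each core becomes idle
    writer_finish = {}             # address -> finish time of its latest earlier writer
    makespan = 0
    for index, it in enumerate(iterations):
        core = index % cores
        start = core_free[core]
        for addr in it.reads:      # cross-iteration read-after-write dependence
            start = max(start, writer_finish.get(addr, 0))
        finish = start + it.cost
        core_free[core] = finish
        for addr in it.writes:
            writer_finish[addr] = finish
        makespan = max(makespan, finish)
    serial = sum(it.cost for it in iterations)
    return serial / makespan if makespan else 1.0


def program_speedup(coverage, loop_gain):
    """Amdahl's law: `coverage` is the fraction of serial time spent in the loop."""
    return 1.0 / ((1.0 - coverage) + coverage / loop_gain)


# Eight independent iterations versus eight iterations chained through address 0.
independent = [Iteration(cost=10, writes={i}) for i in range(8)]
chained = [Iteration(cost=10, reads={0}, writes={0}) for _ in range(8)]
```

Under these assumptions, `loop_speedup(independent, 4)` scales with the core count, while `loop_speedup(chained, 4)` stays at the serial level, which is why dependency characteristics and coverage matter as much as thread granularity.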

Key words: Thread-Level Speculation (TLS), multi-core, parallel, TACLeBench, kernel benchmark
