Pipelining granularity optimization algorithm based on loop tiling

Journal of Computer Applications, 2013, Vol. 33, Issue (08): 2171-2176.

LIU Xiaoxian¹,², ZHAO Rongcai¹,², DING Rui¹,², LI Yanbing¹,²

1. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, Henan 450002, China
2. Information Engineering University, Zhengzhou, Henan 450002, China

Received: 2013-02-18  Revised: 2013-03-25  Online: 2013-09-11  Published: 2013-08-01
Contact: LIU Xiaoxian
About the authors: LIU Xiaoxian (1985-), female, born in Yifeng, Jiangxi, Ph.D. candidate. Her research interests include parallel compilation and high-performance computing.
ZHAO Rongcai (1957-), male, born in Luoyang, Henan, professor, Ph.D. supervisor, CCF senior member. His research interests include parallel compilation, high-performance computing, and decompilation techniques.
DING Rui (1984-), male, born in Huaxian, Henan, Ph.D. candidate. His research interests include parallel compilation and high-performance computing.
LI Yanbing (1989-), male, born in Longxi, Gansu, M.S. candidate. His research interests include parallel compilation.
Supported by:
"核高基" (CHB) National Major Science and Technology Project of China

Abstract

Abstract: When the loop level chosen for computation partitioning has a large number of iterations, or a single iteration of the loop body carries a heavy workload, while only a few parallel threads are available, the work a thread performs between two synchronizations becomes so large that the degree of parallelism is low. The traditional pipelining granularity optimization approach based on loop tiling cannot handle this situation. To solve the problem, an approach that decreases the pipelining granularity by loop tiling was proposed. The optimal pipelining granularity was derived from a cost model for pipelined parallel loops, and a pipelining granularity optimization algorithm was designed and implemented. Tests on the wavefront loops of Finite Difference Relaxation (FDR) and on representative loops of the Finite Difference Time Domain (FDTD) method show that the proposed algorithm obtains better loop tile sizes, and hence better performance, than the traditional pipelining granularity selection method.

Key words: automatic parallelization, pipelining parallelization, pipelining granularity, loop tiling, cost model
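The trade-off behind pipelining granularity can be illustrated with a small sketch. It is not the cost model from the paper; it assumes a simple textbook-style model in which the outer loop of a two-level nest is divided evenly among the threads, the inner loop is cut into tiles of size b, and each thread synchronizes once per tile. The function names (pipeline_time, best_tile_size), the parameters (t_iter, t_sync) and the formula itself are assumptions made for illustration only.

```python
import math


def pipeline_time(b, n_outer, m_inner, threads, t_iter, t_sync):
    """Estimated run time of a pipelined loop nest for inner tile size b.

    Assumed model (hypothetical, not taken from the paper):
      - each thread owns ceil(n_outer / threads) outer iterations,
      - the inner loop is split into ceil(m_inner / b) tiles,
      - the pipeline needs (tiles + threads - 1) stages to drain,
      - one stage costs the work of one tile plus one synchronization.
    """
    rows_per_thread = math.ceil(n_outer / threads)
    stages = math.ceil(m_inner / b) + threads - 1
    return stages * (b * rows_per_thread * t_iter + t_sync)


def best_tile_size(n_outer, m_inner, threads, t_iter, t_sync):
    """Search every tile size 1..m_inner and return the cheapest one."""
    return min(
        range(1, m_inner + 1),
        key=lambda b: pipeline_time(b, n_outer, m_inner, threads, t_iter, t_sync),
    )


if __name__ == "__main__":
    # Illustrative numbers only: 1024 x 1024 iterations, 4 threads,
    # synchronization 2000 times as expensive as one iteration.
    b = best_tile_size(1024, 1024, 4, t_iter=1.0, t_sync=2000.0)
    print("chosen tile size:", b)
    print("estimated time:", pipeline_time(b, 1024, 1024, 4, 1.0, 2000.0))
```

Under this assumed model, small tiles shorten the pipeline fill and drain phases but pay for more synchronizations, while large tiles do the opposite. When the per-thread share of the outer loop is large, even a tile size of 1 leaves a heavy workload between synchronizations, which is the situation the abstract says the traditional approach cannot handle.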

CLC number: TP314

LIU Xiaoxian, ZHAO Rongcai, DING Rui, LI Yanbing. Pipelining granularity optimization algorithm based on loop tiling[J]. Journal of Computer Applications, 2013, 33(08): 2171-2176.

