1. Information Engineering University, Zhengzhou, Henan 450002, China
2. State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, Henan 450002, China
Abstract: Existing parallel cost models are mostly designed for shared-memory or distributed-memory architectures and are therefore not suitable for heterogeneous multi-core processors. To address this problem, a new parallel cost model for heterogeneous multi-core processors is proposed. The model quantitatively describes the impact of computing capacity, memory access delay, and data transfer cost on the parallel execution time of loops, thereby improving the accuracy with which loops that can be accelerated by parallelization are recognized. Experimental results show that the proposed model effectively recognizes such loops, and that generating parallel code from its recognition results significantly improves the performance of parallel programs on heterogeneous multi-core processors.