[1] 高伟,赵荣彩,韩林,等.SIMD自动向量化编译优化概述[J].软件学报,2015,26(6):1265-1284. (GAO W, ZHAO R C, HAN L. Research on SIMD auto-vectorization compiling optimization[J]. Journal of Software, 2015, 26(6):1265-1284.). [2] LARSEN S, AMARASINGHE S. Exploiting superword level parallelism with multimedia instruction sets[J]. ACM SIGPLAN Notices, 2000, 35(5):145-156. [3] TOUMAVITIS G, WANG Z, FRANKE B, et al. Towards a holistic approach to auto-parallelization:integrating profile-driven parallelism detection and machine-learning based mapping[J]. ACM SIGPLAN Notices, 2009, 44(6):177-187. [4] HIROAKI T, AKEUCHI Y, SAKANUSHI K, et al. Pack instruction generation for media processors using multi-valued decision diagram[C]//Proceedings of the 4th International Conference on Hardware/Software Codesign and System Synthesis. New York:ACM, 2006:154-159. [5] TENLLADO C, PIЙUEL L, PRIETO M, et al. Improving superword level parallelism support in modern compilers[C]//CODES+ISSS'05:Proceedings of the 3rd IEEE/ACM/IFIP International Conference on Hardware/software Codesign and System Synthesis. Piscataway, NJ:IEEE, 2005:303-308. [6] SHIN J, CHAME J, HALL M W. Compiler-controlled caching in superword register files for multimedia extension architectures[C]//PACT'02:Proceedings of the 2002 International Conference on Parallel Architectures & Compilation Techniques. Piscataway, NJ:IEEE, 2002:45-55. [7] SHIN J, CHAME J, HALL M. Exploiting superword-level locality in multimedia extension architectures[J]. Journal of Instruction-Level Parallelism, 2003, 5:1-28. [8] SHIN J. Compiler optimizations for architectures supporting superword-level parallelism[D]. Los Angeles, CA:University of Southern California, 2005. [9] SHIN J, HALL M, CHAME J. Superword-level parallelism in the presence of control flow[C]//CGO'05:Proceedings of the 2005 International Symposium on Code Generation and Optimization. Washington, DC:IEEE Computer Society, 2005:165-175. [10] NUZMAN D, ROSEN I, ZAKS A. Auto-vectorization of interleaved data for SIMD[J]. ACM SIGPLAN Notices, 2010, 41(6):132-143. [11] NUZMAN D, ZAKS A. Outer-loop vectorization:revisited for short SIMD architectures[C]//PACT'08:Proceedings of the 2008 International Conference on Parallel Architectures and Compilation Techniques. New York:ACM, 2008:2-11. [12] BARIK R, ZHAO J, SARKAR V. Efficient selection of vector Instructions using dynamic programming[C]//MICRO'43:Proceedings of the 2010 IEEE/ACM International Symposium on Microarchitecture. Washington, DC:IEEE Computer Society, 2010:201-212. [13] 侯永生, 赵荣彩, 高伟. 非正规化循环的单指令多数据向量化[J].计算机应用,2013,33(11):3149-3154. (HOU Y S, ZHAO R C, GAO W. Single instruction multiple data vectorization of non-normalized loops[J]. Journal of Computer Applications, 2013, 33(11):3149-3154.). [14] 徐金龙,赵荣彩,韩林.分段约束的超字并行向量发掘路径优化算法[J].计算机应用,2015,35(4):950-955. (XU J L, ZHAO R C, HAN L. Vector exploring path optimization algorithm of superworld level parallelism with subsection constraints[J].Journal of Computer Applications,2015, 35(4):950-955). [15] 魏帅,赵荣彩,姚远.面向SLP的多重循环向量化[J].软件学报,2012,23(7):1717-1728. (WEI S, ZHAO R C, YAO Y. Loop-nest auto-vectorization based on SLP[J]. Journal of Software, 2012, 23(7):1717-1728.). [16] 索维毅,赵荣彩,姚远,等.面向DSP的超字并行指令分析和冗余优化算法[J].计算机应用,2012,32(12):3303-3307. (SUO W Y, ZHAO R C, YAO Y, et al. Superword level parallelism instruction analysis and redundancy optimization algorithm on DSP[J]. Journal of Computer Applications, 2012, 32(12):3303-3307.). [17] PORPODAS V, MAGNI A, JONES T M. PSLP:padded SLP automatic vectorization[C]//CGO'15:Proceedings of the 13th IEEE/ACM International Symposium on Code Generation and Optimization. Washington, DC:IEEE Computer Society, 2015:190-201. [18] PORPODAS V, JONES T M. Throttling automatic vectorization:when less is more[C]//Proceedings of the 2015 International Conference on Parallel Architecture & Compilation. Piscataway, NJ:IEEE, 2015:432-444. [19] WOLFE J M. High Performance Compilers for Parallel Computing[M]. Boston, MA:Addison-Wesley, 1995:225-231. [20] Spec cpu2006[EB/OL].[2016-04-06]. http://www.spec.org/cpu2006/. [21] NAS parallel benchmark suite[EB/OL].[2016-04-06]. http://www.nas.nasa.gov/Resources/Software/npb.html. [22] FRITTS J E, STEILING F W, TUCEK J A, et al. MediaBench Ⅱ video:expediting the next generation of video systems research[J]. Microprocessors & Microsystems, 2005, 33(4):301-318. [23] POUCHET L N. PolyBench:the polyhedral benchmark suite[EB/OL].[2016-04-06]. http://www.cs.ucla.edu/?pouchet/software/polybench/, 2012. |