Implementation and performance analysis of Knuth39 parallelization based on many integrated core platform
ZHANG Baodong1, ZHOU Jinyu2, LIU Xiao1,2, HUA Cheng2, ZHOU Xiaohui1,2
1. School of Computer Science and Technology, Xi'an University of Posts and Telecommunications, Xi'an Shaanxi 710121, China;
2. Parallel Computing Laboratory, Shaanxi Research Center for High Performance Computing, Xi'an Shaanxi 710121, China
To solve the low running speed problem of Knuth39 random number generator, a Knuth39 parallelization method based on Many Integrated Core (MIC) platform was proposed. Firstly, the random number sequence of Knuth39 generator was divided into subsequences by regular interval. Then, the random numbers were generated by every thread from the corresponding subsequence's starting point. Finally, the random number sequences generated by all threads were combined into the final sequence. The experimental results show that the parallelized Knuth39 generator successfully passed 452 tests of TestU01, the results are the same as those of Knuth39 generator without parallelization. Compared with single thread on Central Processing Unit (CPU), the optimal speed-up ratio on MIC platform is 15.69 times. The proposed method improves the running speed of Knuth39 generator effectively, ensures the randomness of the generated sequences, and it is more suitable for high performance computing.
[1] SALMON J K, MORAES M A, DROR R O, et al. Parallel random numbers: as easy as 1, 2, 3 [C]// SC'11: Proceedings of 2011 International Conference for High Performance Computing, Networking, Storage and Analysis. New York: ACM, 2011: Article No. 16. [2] WANG M, GUO F, QU H, et al. Combined random number generators: a review [C]// ICCSN 2011: Proceedings of the 2011 IEEE 3rd International Conference on Communication Software and Networks. Piscataway: IEEE, 2011: 443-447. [3] OYA K, KITADA T, TANAKA S. Study on random number generator in Monte Carlo code [J]. Atomic Energy Society, 2011, 10(4): 301-309. [4] LEHMER D H. Mathematical methods in large-scale computing units [C]// Proceedings of the Second Symposium on Large-Scale Digital Calculating Machinery. Cambridge: Harvard University Press, 1951: 141-146. [5] KNUTH D E. The art of computer programming, volume 2: seminumerical algorithms [M]. 3rd ed. Boston: Addison-Wesley, 1998: 108. [6] MARTON K, SUCIU A, PETRICEAN D. A parallel unpredictable random number generator [C]// Proceedings of the 2011 10th Roedunet International Conference: Networking in Education and Research. Piscataway: IEEE, 2011: 1-5. [7] GAO S, PETERSON G D. GASPRNG: GPU accelerated scalable parallel random number generator library [J]. Computer Physics Communications, 2013, 184(4): 1241-1249. [8] BRADLEY T, du TOIT J, GILES M, et al. Parallelisation techniques for random number generators [M]// GPU Computing Gems Emerald Edition. San Francisco: Morgan Kaufmann, 2011: 231-246. [9] BARASH L Y, SHCHUR L N. RNGSSELIB: program library for random number generation, SSE2 realization [J]. Computer Physics Communications, 2011, 182(7): 1518-1527. [10] BARASH L Y, SHCHUR L N. PRAND: GPU accelerated parallel random number generation library: using most reliable algorithms and applying parallelism of modern GPUs and CPUs [J]. Computer Physics Communications, 2014, 185(4): 1343-1353. [11] ZHU Y, LI B, SUN T, et al. Research on scalability of parallel computing system [J]. Computer Engineering and Applications, 2011, 47(21): 47-49.(祝永志,李丙峰,孙婷婷,等.并行计算系统可扩展性的研究[J].计算机工程与应用,2011,47(21):47-49.) [12] KHEMKA J, GAJJAR M, VASWANI S, et al. Performance evaluation of medical imaging algorithms on Intel MIC platform [C]// Proceedings of the 2013 20th International Conference on High Performance Computing. Piscataway: IEEE, 2013: 396-404. [13] SUCIU A, TOMA R A, MARTON K. Parallel implementation of the TestU01 statistical test suite [C]// Proceedings of the 2012 IEEE 8th International Conference on Intelligent Computer Communication and Processing. Piscataway: IEEE, 2012: 317-322.