[1] INOUE H, MORIYAMA T, KOMATSU H, et al. AA-Sort: a new parallel sorting algorithm for multi-core SIMD processors [C]// Proceedings of the 16th International Conference on Parallel Architecture and Compilation Techniques. Washington, DC: IEEE Computer Society, 2007: 189-198.
[2] RAMPRASAD N, BARUAH P K. Radix sort on the cell broadband engine [C]// HiPC2007: Proceedings of the 14th Annual IEEE International Conference on High Performance Computing. Piscataway, NJ: IEEE Press, 2007.
[3] CEDERMAN D, TSIGAS P. On sorting and load balancing on GPU [J]. ACM SIGARCH Computer Architecture News, 2008, 36(5): 11-18.
[4] GREB A, ZACHMANN G. GPU-ABiSort: optimal parallel sorting on stream architectures [C]// Proceedings of the 20th International Parallel and Distributed Processing Symposium. Washington, DC: IEEE Computer Society, 2006: 25-29.
[5] HAO S, DU Z, BADER D, et al. A partition-merge based cache-conscious parallel sorting algorithm for CMP with shared cache [C]// Proceedings of the 38th International Conference on Parallel Processing. Washington, DC: IEEE Computer Society, 2009: 396-403.
[6] HULTÉN R, KESSLER C W, KELLER J. Optimized on-chip-pipelined mergesort on the Cell/B.E. [C]// Proceedings of the 16th International Euro-Par Conference on Parallel Processing, Part II. Berlin: Springer-Verlag, 2010: 187-198.
[7] SATISH N, KIM C, CHHUGANI J, et al. Fast sort on CPUs and GPUs: a case for bandwidth oblivious SIMD sort [C]// Proceedings of the 2010 ACM SIGMOD International Conference on Management of Data. New York: ACM Press, 2010: 351-362.
[8] ZHONG C, QU Z Y, YANG F, et al. Efficient and scalable thread-level parallel algorithms for sorting multisets on multi-core systems [J]. Journal of Computers, 2012, 7(1): 30-41.
[9] ZHONG C, KE Q, LIU J, et al. Thread-level parallel algorithm for sorting integer sequence on multi-core computers [C]// Proceedings of the 4th International Symposium on Parallel Architectures, Algorithms and Programming. Washington, DC: IEEE Computer Society, 2011: 37-41.
[10] ZHONG C, FENG P, YIN M X, et al. Sampling-based cache-efficient parallel sorting on multi-core systems [J]. Journal of Computational Information Systems, 2012, 8(8): 6713-6722.
[11] ZHONG C, LI X, YANG F, et al. Scheduling divisible loads with return messages on multi-core heterogeneous clusters with unknown system parameters [J]. International Journal of Advancements in Computing Technology, 2012, 4(7): 110-120.
[12] KE Q, ZHONG C, LI Z, et al. Parallel algorithm for the maximum-sum subsequence on multi-core computers [C]// Advances in Computer Technology and Applications 2010. Hefei: University of Science and Technology of China Press, 2010: 586-590. (in Chinese)
[13] CHEN G L. Parallel Computing: Architecture, Algorithms, Programming [M]. Revised ed. Beijing: Higher Education Press, 2003: 140-141. (in Chinese)