[1] STRANDE S M, CICOTTI P, SINKOVITS R S, et al. Gordon:design, performance, and experiences deploying and supporting a data intensive supercomputer[C]//Proceedings of the 1st Conference of the Extreme Science and Engineering Discovery Environment:Bridging from the Extreme to the Campus and Beyond. New York:ACM, 2012:Article No. 3. [2] BRONEVETSKY G, MOODY A. Scalable I/O systems via node-local storage:approaching 1 TB/sec file I/O, LLNL-TR-415791[R]. Livermore, CA:Lawrence Livermore National Laboratory, 2009:1-6. [3] ZAHARIA M, CHOWDHURY M, DAS T, et al. Fast and interactive analytics over Hadoop data with Spark[J]. Login, 2012, 37(4):45-51. [4] Apache Spark. Spark overview[EB/OL].[2015-03-18]. http://spark.apache.org. [5] ZAHARIA M, CHOWDHURY M, DAS T, et al. Resilient distributed datasets:a fault-tolerant abstraction for in-memory cluster computing[C]//Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2012:2. [6] LIN X, WANG P, WU B. Log analysis in cloud computing environment with Hadoop and Spark[C]//Proceedings of the 5th IEEE International Conference on Broadband Network and Multimedia Technology. Piscataway, NJ:IEEE, 2013:273-276. [7] DONG X, XIE Y, MURALIMANOHAR N, et al. Hybrid checkpointing using emerging nonvolatile memories for future exascale systems[J]. ACM Transactions on Architecture and Code Optimization, 2011, 8(2):510-521. [8] 慈轶为,张展,左德承,等.可扩展的多周期检查点设置[J].软件学报,2010,21(2):218-230.(CI Y W, ZHANG Z, ZUO D C, et al. Scalable time-based multi-cycle checkpointing[J]. Journal of Software, 2010, 21(2):218-230.) [9] DEAN J, GHEMAWAT S. MapReduce:simplified data processing on large clusters[C]//Proceedings of the 6th Conference on Symposium on Opearting Systems Design and Implementation. Berkeley, CA:USENIX Association, 2004,6:10. [10] KWON Y, BALAZINSKA M, HOWE B, et al. A study of skew in MapReduce application[EB/OL].[2016-03-18]. https://www.researchgate.net/publication/228941278_A_Study_of_Skew_in_MapReduce_Applications. [11] KWON Y, BALAZINSKA M, HOWE B, et al. Skew-resistant parallel processing of feature-extracting scientific user-defined functions[C]//Proceedings of the 1st ACM Symposium on Cloud Computing. New York:ACM, 2010:75-86. [12] 王卓,陈群,李战怀,等.基于增量式分区策略的MapReduce数据均衡方法[J].计算机学报,2016,39(1):19-35.(WANG Z, CHEN Q, LI Z H, et al. An incremental partitioning strategy for data balance on MapReduce[J]. Chinese Journal of Computers, 2016, 39(1):19-35.) [13] KWON Y, BALAZINSKA M, HOWE B, et al. SkewTune:mitigating skew in MapReduce applications[C]//Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data. New York:ACM, 2012:25-36. [14] YAN W, XUE Y, MALIN B. Scalable and robust key group size estimation for reducer load balancing in MapReduce[C]//Proceedings of the 2013 IEEE International Conference on Big Data. Piscataway, NJ:IEEE, 2013:156-162. [15] RAMAKRISHNAN S R, SWART G, URMANOV A, et al. Balancing reducer skew in MapReduce workloads using progressive sampling[C]//Proceedings of the 3rd ACM Symposium on Cloud Computing. New York:ACM, 2012:Article No. 16. [16] GUFLER B, AUGSTEN N, REISER A, et al. Handing data skew in MapReduce[C]//Proceedings of the 1st International Conference on Cloud Computing and Services Science. Berlin:Springer, 2011:574-583. [17] GUFLER B, AUGSTEN N, REISER A, et al. Load balancing in MapReduce based on scalable cardinality estimates[C]//Proceedings of the 2012 IEEE 28th International Conference on Data Engineering. Washington, DC:IEEE Computer Society, 2012:522-533. [18] KOLB L, THOR A, RAHM E. Load balancing for MapReduce-based entity resolution[C]//Proceedings of the 2012 IEEE 28th International Conference on Data Engineering. Washington, DC:IEEE Computer Society, 2012:618-629. [19] KOLB L, THOR A, RAHM E, et al. Block-based load balancing for entity resolution with MapReduce[C]//Proceedings of the 20th ACM International Conference on Information and Knowledge Management. New York:ACM, 2011:2397-2400. [20] RACHA S C. Load balancing Map-Reduce communications for efficient executions of applications in a cloud[D]. Bangalore, India:Indian Institute of Science, 2012:12-16. [21] IBRAHIM S, JIN H, LU L, et al. Handling partitioning skew in MapReduce using LEEN[J]. Peer-to-Peer Networking and Applications, 2013, 6(4):409-424. [22] JURE L. Stanford network analysis project[EB/OL].[2015-03-18]. http://snap.stanford.edu. |