[1] SHVACHKO K, KUANG H, RADIA S, et al. The Hadoop distributed file system[C]//Proceedings of the 26th IEEE Symposium on Mass Storage Systems and Technologies. Piscataway:IEEE,2010:1-10. [2] LIAO J,TRAHAY F,XIAO G,et al. Performing initiative data prefetching in distributed file systems for cloud computing[J]. IEEE Transactions on Cloud Computing,2017,5(3):550-562. [3] REN Z,XU X,WAN J,et al. Workload characterization on a production Hadoop cluster:a case study on Taobao[C]//Proceedings of the 2012 IEEE International Symposium on Workload Characterization. Piscataway:IEEE,2012:3-13. [4] ABAD C L. Big data storage workload characterization,modeling and synthetic generation[D]. Champaign-Urbana,IL:University of Illinois at Urbana-Champaign,2014:35-50. [5] DITTRICH J,QUIANÉ-RUIZ J A. Efficient big data processing in Hadoop MapReduce[J]. Proceedings of the VLDB Endowment, 2012,5(12):2014-2015. [6] 金国栋, 卞昊穹, 陈跃国, 等. HDFS存储和优化技术研究综述[J]. 软件学报,2020,31(1):137-161.(JIN G D,BIAN H Q, CHEN Y G,et al,Survey on storage and optimization techniques of HDFS[J]. Journal of Software,2020,31(1):137-161.) [7] LIN X, WANG P, WU B. Log analysis in cloud computing environment with Hadoop and Spark[C]//Proceedings of the 5th IEEE International Conference on Broadband Network and Multimedia Technology. Piscataway:IEEE,2013:273-276. [8] ZHANG B,KŘIKAVA F,ROUVOY R,et al. Self-balancing job parallelism and throughput in Hadoop[C]//Proceedings of the 16th IFIP WG 6.1 International Conference on Distributed Applications and Interoperable Systems,LNCS 9687. Cham:Springer,2016:129-143. [9] 于晓龙. MapReduce模型在Hadoop实现中计算资源利用率分析和多作业批调度优化[D]. 济南:山东大学,2016:26-38.(YU X L. MapReduce model for computing resource utilization analysis and multi-job batch scheduling optimization in Hadoop implementation[D]. Jinan:Shandong University,2016:26-38.) [10] ZACHEILAS N,KALOGERAKI V. Pareto-based scheduling of MapReduce workloads[C]//Proceedings of the 19th IEEE International Symposium on Real-Time Distributed Computing. Piscataway:IEEE,2016:174-181. [11] KIM Y, GUNASEKARAN R. Understanding I/O workload characteristics of a Peta-scale storage system[J]. Journal of Supercomputing,2015,71(3):761-780. [12] LIU Y, GUNASEKARAN R, MA X, et al. Automatic identification of application I/O signatures from noisy server-side traces[C]//Proceedings of the 12th USENIX Conference on File and Storage Technologies. Berkeley:USENIX Association,2014:213-228. [13] GUNASEKARAN R,ORAL S,HILL J,et al. Comparative I/O workload characterization of two leadership class storage clusters[C]//Proceedings of the 10th Parallel Data Storage Workshop. New York:ACM,2015:31-36. [14] LIU S, HUANG X, FU H, et al. Understanding data characteristics and access patterns in a cloud storage system[C]//Proceedings of the 13th IEEE/ACM International Symposium on Cluster,Cloud,and Grid Computing. Piscataway:IEEE,2013:327-334. [15] REN Z, SHI W, WAN J, et al. Realistic and scalable benchmarking cloud file systems:practices and lessons from AliCloud[J]. IEEE Transactions on Parallel and Distributed Systems,2017,28(11):3272-3285. [16] REN K,GIBSON G,KWON Y,et al. Hadoop's adolescence;a comparative workloads analysis from three research clusters[C]//Proceedings of the 2012 SC Companion:High Performance Computing,Networking Storage and Analysis. Piscataway:IEEE, 2012:1452-1453. [17] REN Z,XU B,SHI W,et al. iGen:a realistic request generator for cloud file systems benchmarking[C]//Proceedings of the 2016 IEEE 9th International Conference on Cloud Computing. Piscataway:IEEE,2016:343-350. [18] BOCCHI E,DRAGO I,MELLIA M. Personal cloud storage:usage,performance and impact of terminals[C]//Proceedings of the 2015 IEEE 4th International Conference on Cloud Networking. Piscataway:IEEE,2015:106-111. [19] ABAD C L,ROBERTS N,LU Y,et al. A storage-centric analysis of MapReduce workloads:file popularity,temporal locality and arrival patterns[C]//Proceedings of the 2012 IEEE International Symposium on Workload Characterization. Piscataway:IEEE, 2012:100-109. [20] KAVULYA S,TAN J,GANDHI R,et al. An analysis of traces from a production MapReduce cluster[C]//Proceedings of the 10th IEEE/ACM International Conference on Cluster,Cloud and Grid Computing. Piscataway:IEEE,2010:94-103. [21] REN Z,WAN J,SHI W,et al. Workload analysis,implications, and optimization on a production Hadoop cluster:a case study on Taobao[J]. IEEE Transactions on Services Computing,2014,7(2):307-321. [22] DIMOPOULOS S,KRINTZ C,WOLSKI R. PYTHIA:admission control for multi-framework,deadline-driven,big data workloads[C]//Proceedings of the 2017 IEEE 10th International Conference on Cloud Computing. Piscataway:IEEE,2017:488-495. [23] Apache Hadoop Software Library. HDFS users guide[EB/OL].[2020-01-05]. http://hadoop.apache.org/docs/r2.9.0/hadoopproject-dist/hadoop-hdfs/HdfsUserGuide.html#Balancer. [24] Apache Hadoop Software Library. Centralized cache management in HDFS[EB/OL].[2020-01-05]. http://hadoop.apache.org/docs/r2.9.0/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html. [25] CHEN Y, ALSPAUGH S, GANAPATHI A, et al. SWIM:Statistical Workload Injector for MapReduce[EB/OL].[2020-03-23]. https://github.com/SWIMProjectUCB/SWIM/wiki. [26] HUANG Y F,HSU J M. Mining Web logs to improve hit ratios of prefetching and caching[J]. Knowledge-Based Systems,2008,21(1):62-69. [27] WANG H,YI X,HUANG P,et al. Efficient SSD caching by avoiding unnecessary writes using machine learning[C]//Proceedings of the 47th International Conference on Parallel Processing. New York:ACM,2018:No. 82. |