[1] SAMBASIVAN R R, SHAFER I, SIGELMAN B H, et al. Principled workflow-centric tracing of distributed systems[C]//SoCC 2016:Proceedings of the 2016 Seventh ACM Symposium on Cloud Computing. New York:ACM, 2016:401-414.
[2] KAVULYA S P, DANIELS S, JOSHI K, et al. Draco:statistical diagnosis of chronic problems in large distributed systems[C]//DSN 2012:Proceedings of the 2012 42nd Annual IEEE/IFIP International Conference on Dependable Systems and Networks. Washington, DC:IEEE Computer Society, 2012:1-12.
[3] SAMBASIVAN R R, ZHENG A X, DE ROSA M, et al. Diagnosing performance changes by comparing request flows[C]//NSDI'11:Proceedings of the 2011 8th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2011:43-56.
[4] NAGARAJ K, KILLIAN C, NEVILLE J. Structured comparative analysis of systems logs to diagnose performance problems[C]//NSDI 2012:Proceedings of the 2012 9th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2012:353-366.
[5] OLINER A J, KULKARNI A V, AIKEN A. Using correlated surprise to infer shared influence[C]//DSN 2010:Proceedings of the 2010 IEEE/IFIP International Conference on Dependable Systems and Networks. Piscataway, NJ:IEEE, 2010:191-200.
[6] XU W, HUANG L, FOX A, et al. Detecting large-scale system problems by mining console logs[C]//SOSP'09:Proceedings of the 2009 ACM SIGOPS 22nd Symposium on Operating Systems Principles. New York:ACM, 2009:117-132.
[7] ZHAO X, ZHANG Y, LION D, et al. lprof:a non-intrusive request flow profiler for distributed systems[C]//OSDI 2014:Proceedings of the 2014 11th USENIX Conference on Operating Systems Design and Implementation. Berkeley, CA:USENIX Association, 2014:629-644.
[8] ZHAO X, RODRIGUES K, LUO Y, et al. Non-intrusive performance profiling for entire software stacks based on the flow reconstruction principle[C]//OSDI 2016:Proceedings of the 2016 12th USENIX Symposium on Operating Systems Design and Implementation. Berkeley, CA:USENIX Association, 2016:603-618.
[9] LIU H B, CAI W D, XU J J, et al. Design and implementation of a distributed network behavior monitoring system[J]. Microelectronics & Computer, 2006, 23(3):76-79. (in Chinese)
[10] CHANDA A, COX A L, ZWAENEPOEL W. Whodunit:transactional profiling for multi-tier applications[C]//EuroSys 2007:Proceedings of the 2007 2nd ACM SIGOPS/EuroSys European Conference on Computer Systems. New York:ACM, 2007:17-30.
[11] BARHAM P, DONNELLY A, ISAACS R, et al. Using Magpie for request extraction and workload modelling[C]//OSDI 2004:Proceedings of the 2004 6th USENIX Symposium on Operating Systems Design and Implementation. Berkeley, CA:USENIX Association, 2004:259-272.
[12] CHEN M Y, ACCARDI A, KICIMAN E, et al. Path-based failure and evolution management[C]//NSDI 2004:Proceedings of the 1st USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2004:23-36.
[13] REYNOLDS P, KILLIAN C E, WIENER J L, et al. Pip:detecting the unexpected in distributed systems[C]//NSDI 2006:Proceedings of the 2006 3rd USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2006:115-128.
[14] THERESKA E, SALMON B, STRUNK J, et al. Stardust:tracking activity in a distributed storage system[C]//Proceedings of the 2006 Joint International Conference on Measurement and Modeling of Computer Systems. New York:ACM, 2006:3-14.
[15] FONSECA R, PORTER G, KATZ R H, et al. X-Trace:a pervasive network tracing framework[C]//NSDI 2007:Proceedings of the 2007 4th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2007:20-33.
[16] MACE J, BODIK P, FONSECA R, et al. Retro:targeted resource management in multi-tenant distributed systems[C]//NSDI 2015:Proceedings of the 2015 12th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2015:589-603.
[17] SIGELMAN B H, BARROSO L A, BURROWS M, et al. Dapper, a large-scale distributed systems tracing infrastructure, Google Technical Report dapper-2010-1[R]. Mountain View:Google, 2010:29.
[18] KASIKCI B, SCHUBERT B, PEREIRA C, et al. Failure sketching:a technique for automated root cause diagnosis of in-production failures[C]//SOSP 2015:Proceedings of the 25th ACM Symposium on Operating Systems Principles. New York:ACM, 2015:344-360.
[19] LOU H. Implementation of a server monitoring system[D]. Zhengzhou:Zhengzhou University, 2004:25-28. (in Chinese)
[20] HE R, XIAO H L. A monitoring platform based on Nagios[J]. E-Science Technology & Application, 2014, 5(5):77-85. (in Chinese)
[21] CANTRILL B M, SHAPIRO M W, LEVENTHAL A H. Dynamic instrumentation of production systems[C]//USENIX 2004:Proceedings of the 2004 USENIX Annual Technical Conference. Berkeley, CA:USENIX Association, 2004:15-28.
[22] ERLINGSSON U, PEINADO M, PETER S, et al. Fay:extensible distributed tracing from kernels to clusters[J]. ACM Transactions on Computer Systems, 2012, 30(4):Article No. 13.
[23] MACE J, ROELKE R, FONSECA R. Pivot tracing:dynamic causal monitoring for distributed systems[C]//SOSP 2015:Proceedings of the 25th Symposium on Operating Systems Principles. New York:ACM, 2015:378-393.
[24] Microsoft. Microsoft Azure:cloud computing platform & services[EB/OL].[2017-04-15]. https://azure.microsoft.com/en-us/?v=17.14.
[25] Amazon Web Services, Inc. Elastic Compute Cloud (EC2)-cloud server & hosting-AWS[EB/OL].[2017-04-15]. https://aws.amazon.com/ec2/.
[26] Apache. Apache Spark:lightning-fast cluster computing[EB/OL].[2017-04-15]. https://spark.apache.org/.
[27] Docker, Inc. Docker-Build, ship, and run[EB/OL].[2017-04-15]. https://www.docker.com/.
[28] Ganglia. Ganglia monitoring system[EB/OL].[2017-04-15]. http://ganglia.sourceforge.net/.
[29] YAN Y, GAO Y, CHEN Y, et al. TR-Spark:transient computing for big data analytics[C]//SoCC 2016:Proceedings of the 2016 Seventh ACM Symposium on Cloud Computing. New York:ACM, 2016:484-496.
[30] Graphite. Graphite documentation[DB/OL].[2017-03-14]. https://graphite.readthedocs.io/.
[31] Apache. Apache Thrift-home[EB/OL].[2017-02-17]. https://thrift.apache.org/.
[32] GitHub, Inc. Intel-Hadoop/HiBench[EB/OL].[2017-03-30]. https://github.com/intel-hadoop/HiBench/.