[1] WANG Q X, SUN S X, SHANG M S, et al. Research on parallel computing model[J]. Computer Science, 2004, 31(9):128-131. (in Chinese)
[2] WANG H, DU Z H. Contrastive analysis of parallel computation model[J]. Computer Science, 2005, 32(12):142-145. (in Chinese)
[3] TU B B, ZOU M, ZHAN J F, et al. Research on parallel computation model with memory hierarchy on multi-core cluster[J]. Chinese Journal of Computers, 2008, 31(11):1948-1955. (in Chinese)
[4] LIU F A, LIU Z Y, QIAO X Z. An asynchronous BSP model and optimization techniques[J]. Chinese Journal of Computers, 2002, 25(4):373-380. (in Chinese)
[5] VALIANT L G. A bridging model for parallel computation[J]. Communications of the ACM, 1990, 33(8):103-111.
[6] CIPAR J, HO Q, KIM J K, et al. Solving the straggler problem with bounded staleness[C]//Proceedings of the 14th USENIX Conference on Hot Topics in Operating Systems. Berkeley, CA:USENIX Association, 2013:Article No. 22.
[7] HUANG Y H. Research progress on big data machine learning system[J]. Big Data Research, 2015, 1(1):28-47. (in Chinese)
[8] HE Q, LI N, LUO W J, et al. A survey of machine learning algorithms for big data[J]. Pattern Recognition and Artificial Intelligence, 2014, 27(4):327-336. (in Chinese)
[9] BOTTOU L. Large-scale machine learning with stochastic gradient descent[C]//Proceedings of the 19th International Conference on Computational Statistics, Paris, France. Berlin:Springer, 2010:177-186.
[10] FERCOQ O, RICHTÁRIK P. Accelerated, parallel and proximal coordinate descent[J]. SIAM Journal on Optimization, 2015, 25(4):1997-2023.
[11] BLEI D M, KUCUKELBIR A, MCAULIFFE J D. Variational inference:a review for statisticians[EB/OL].[2016-11-20]. https://www.cse.iitk.ac.in/users/piyush/courses/pml_winter16/VI_Review.pdf.
[12] XING E P, HO Q, XIE P, et al. Strategies and principles of distributed machine learning on big data[J]. Engineering, 2016, 2(2):179-195.
[13] RUDER S. An overview of gradient descent optimization algorithms[EB/OL].[2016-11-20]. http://128.84.21.199/pdf/1609.04747.pdf.
[14] DUCHI J, HAZAN E, SINGER Y. Adaptive subgradient methods for online learning and stochastic optimization[J]. Journal of Machine Learning Research, 2011, 12(7):2121-2159.
[15] ZEILER M D. ADADELTA:an adaptive learning rate method[EB/OL].[2016-11-20]. http://www.matthewzeiler.com/wp-content/uploads/2017/07/googleTR2012.pdf.
[16] HAO S K. Brief analysis of the architecture of Hadoop HDFS and MapReduce[J]. Designing Techniques of Posts and Telecommunications, 2012(7):37-42. (in Chinese)
[17] HO Q, CIPAR J, CUI H, et al. More effective distributed ML via a stale synchronous parallel parameter server[C]//Proceedings of the 26th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada:Curran Associates Inc., 2013:1223-1231.
[18] LOW Y, GONZALEZ J E, KYROLA A, et al. GraphLab:a new framework for parallel machine learning[EB/OL].[2016-11-20]. http://wwwdb.inf.tu-dresden.de/misc/SS15/PSHS/paper/GraphLab/Low_2010.pdf.
[19] LOW Y, BICKSON D, GONZALEZ J, et al. Distributed GraphLab:a framework for machine learning and data mining in the cloud[J]. Proceedings of the VLDB Endowment, 2012, 5(8):716-727.
[20] CHU C T, KIM S K, LIN Y A, et al. Map-Reduce for machine learning on multicore[C]//Proceedings of the 19th International Conference on Neural Information Processing Systems. Cambridge, MA:MIT Press, 2006:281-288.
[21] DEAN J, GHEMAWAT S. MapReduce:simplified data processing on large clusters[C]//Proceedings of the 6th USENIX Symposium on Operating Systems Design and Implementation. Berkeley, CA:USENIX Association, 2004:Article No. 10.
[22] ZAHARIA M, CHOWDHURY M, FRANKLIN M J, et al. Spark:cluster computing with working sets[C]//Proceedings of the 2nd USENIX Conference on Hot Topics in Cloud Computing. Berkeley, CA:USENIX Association, 2010:Article No. 10.
[23] ZAHARIA M, CHOWDHURY M, DAS T, et al. Resilient distributed datasets:a fault-tolerant abstraction for in-memory cluster computing[C]//Proceedings of the 9th USENIX Conference on Networked Systems Design and Implementation. Berkeley, CA:USENIX Association, 2012:Article No. 2.
[24] XING E P, HO Q, DAI W, et al. Petuum:a new platform for distributed machine learning on big data[J]. IEEE Transactions on Big Data, 2015, 1(2):49-67.
[25] SMOLA A, NARAYANAMURTHY S. An architecture for parallel topic models[J]. Proceedings of the VLDB Endowment, 2010, 3(1/2):703-710.
[26] AHMED A, ALY M, GONZALEZ J, et al. Scalable inference in latent variable models[C]//Proceedings of the 5th ACM International Conference on Web Search and Data Mining. New York:ACM, 2012:123-132.
[27] DAI W, KUMAR A, WEI J, et al. High-performance distributed ML at scale through parameter server consistency models[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence. Menlo Park, CA:AAAI Press, 2015:79-87.
[28] DEAN J, CORRADO G S, MONGA R, et al. Large scale distributed deep networks[C]//Proceedings of the 25th International Conference on Neural Information Processing Systems. Lake Tahoe, Nevada:Curran Associates Inc., 2012:1223-1231.
[29] DAI W, WEI J, ZHENG X, et al. Petuum:a framework for iterative-convergent distributed ML[EB/OL].[2016-11-20]. http://www.u.arizona.edu/~junmingy/papers/Dai-etal-NIPS13.pdf.
[30] LI M, ZHOU L, YANG Z, et al. Parameter server for distributed machine learning[EB/OL].[2016-11-20]. http://www-cgi.cs.cmu.edu/~muli/file/ps.pdf.
[31] LI M, ANDERSEN D G, PARK J W, et al. Scaling distributed machine learning with the parameter server[C]//Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation. Berkeley, CA:USENIX Association, 2014:583-598.
[32] KARGER D, LEHMAN E, LEIGHTON T, et al. Consistent hashing and random trees:distributed caching protocols for relieving hot spots on the World Wide Web[C]//Proceedings of the 29th ACM Symposium on Theory of Computing. New York:ACM, 1997:654-663.
[33] BYERS J, CONSIDINE J, MITZENMACHER M. Simple load balancing for distributed hash tables[C]//Proceedings of the 2nd International Workshop on Peer-to-Peer Systems. Berlin:Springer, 2003:80-87.
[34] LAMPORT L. Paxos made simple[J]. ACM SIGACT News, 2001, 32(4):51-58.
[35] DECANDIA G, HASTORUN D, JAMPANI M, et al. Dynamo:Amazon's highly available key-value store[C]//Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles. New York:ACM, 2007:205-220.