Hadoop big data processing system model based on context-queue under Internet of things
LI Min1, NI Shaoquan1,2, QIU Xiaoping1, HUANG Qiang1,2
1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu Sichuan 610031, China;
2. National Railway Train Diagram Research and Training Center, Southwest Jiaotong University, Chengdu Sichuan 610031, China
To address the low real-time response capability of heterogeneous big data processing in the Internet of Things (IoT), data processing and persistence schemes based on Hadoop were analyzed, and a Hadoop big data processing system model based on "Context", named HDS (Hadoop big Data processing System), was proposed. The model uses the Hadoop framework for parallel data processing and persistence. Heterogeneous data are abstracted as "Contexts", which are the unified objects processed in HDS. "Context Distance" and "Context Neighborhood System (CNS)" were defined based on the temporal-spatial characteristics of Contexts. A "Context Queue (CQ)" was designed as auxiliary storage to overcome the low real-time response capability of the Hadoop framework. In particular, the optimization of task reorganization for client requests in the CQ, based on the temporal and spatial characteristics of Contexts, is described in detail. Finally, taking vehicle scheduling in petroleum product distribution as an example, data processing performance and real-time response capability were tested in MapReduce distributed parallel computing experiments. The experimental results show that, compared with an ordinary computing system, SDS (Single Data processing System), HDS not only has clearly superior big data processing capability but also effectively overcomes Hadoop's low real-time response capability. In a 10-server experimental environment, the data processing capability of HDS and SDS differs by a factor of more than 200, and the real-time response capability of HDS with and without the assistance of the CQ differs by a factor of more than 270.
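The abstract names Context Distance, the Context Neighborhood System, and task reorganization in the Context Queue, but does not give their formulas. The following is a minimal sketch of how these ideas could fit together, not the authors' implementation. The `Context` fields, the weighted-sum form of `context_distance`, the weights `alpha` and `beta`, and the greedy batching in `reorganize` are all assumptions made for illustration.

```python
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Context:
    """One heterogeneous IoT reading in a unified form (hypothetical fields)."""
    source: str  # e.g. "gps", "rfid"
    t: float     # timestamp
    x: float     # spatial coordinates
    y: float


def context_distance(a, b, alpha=1.0, beta=1.0):
    """Assumed form: weighted sum of spatial and temporal separation.

    alpha and beta are hypothetical weights that put space and time
    on a common scale; the paper's actual definition may differ.
    """
    return alpha * hypot(a.x - b.x, a.y - b.y) + beta * abs(a.t - b.t)


def neighborhood(center, contexts, radius):
    """Contexts other than `center` lying within `radius` of it."""
    return [c for c in contexts
            if c is not center and context_distance(center, c) <= radius]


def reorganize(queue, radius):
    """Greedy regrouping of queued requests (an assumed strategy).

    The oldest pending request becomes the head of a batch and pulls in
    every later request within `radius`, so that requests close in time
    and space can be served together.
    """
    pending = list(queue)
    batches = []
    while pending:
        head = pending.pop(0)
        near = [c for c in pending if context_distance(head, c) <= radius]
        pending = [c for c in pending if context_distance(head, c) > radius]
        batches.append([head] + near)
    return batches
```

With four requests arriving in the order `a, b, c, d`, where `c` is close to `a` and `d` is close to `b`, `reorganize` yields two batches, `[a, c]` and `[b, d]`, rather than four separate tasks.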
LI Min, NI Shaoquan, QIU Xiaoping, HUANG Qiang. Hadoop big data processing system model based on context-queue under Internet of things[J]. Journal of Computer Applications, 2015, 35(5): 1267-1272.