Hadoop big data processing system model based on context-queue under Internet of Things

doi:10.11772/j.issn.1001-9081.2015.05.1267

Journal of Computer Applications, 2015, Vol. 35, Issue (5): 1267-1272.

LI Min¹, NI Shaoquan¹,², QIU Xiaoping¹, HUANG Qiang¹,²

1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu Sichuan 610031, China;
2. National Railway Train Diagram Research and Training Center, Southwest Jiaotong University, Chengdu Sichuan 610031, China

Received: 2014-12-10 Revised: 2015-01-17 Online: 2015-05-10 Published: 2015-05-14
Corresponding author: LI Min
About the authors: LI Min (born 1981), female, from Chengdu, Sichuan, engineer, Ph.D. candidate; research interests: logistics informatization, management information systems, Internet of Things. NI Shaoquan (born 1967), male, from Hanchuan, Hubei, professor, Ph.D.; research interests: computer-aided train diagram compilation, transportation information, railway traffic organization, logistics information. QIU Xiaoping (born 1976), male, from Yingshan, Sichuan, professor, Ph.D.; research interests: supply chain information management and integration, transportation planning and management. HUANG Qiang (born 1981), male, from Ya'an, Sichuan, teaching assistant, Ph.D. candidate; research interests: logistics informatization, management information systems, cloud computing, rail-water intermodal transport.
Supported by:
the National Natural Science Foundation of China (61273242, 61403317); the Science and Technology Research Program of China Railway Corporation (2013X006-A, 2013X014-G, 2013X010-A, 2014X004-D).

Abstract

To address the low real-time performance of heterogeneous big data processing in the Internet of Things (IoT), methods for data processing and persistence based on the Hadoop framework were examined, and a "context"-based Hadoop big data processing system model named HDS (Hadoop big Data processing System) was proposed. HDS uses the Hadoop framework for parallel data processing and persistence, and abstracts heterogeneous IoT data as "contexts", which serve as the unified objects it processes. "Context distance" and "Context Neighborhood System (CNS)" were defined on the basis of the temporal and spatial characteristics of contexts. Because the Hadoop framework itself offers limited real-time data processing, HDS adds a "Context Queue (CQ)" as auxiliary storage to improve real-time response. Using the temporal and spatial characteristics of contexts, a CNS of user requests was built to reorganize tasks. Taking vehicle scheduling in petroleum product distribution as an example, the data processing and real-time performance of HDS were tested and analyzed in MapReduce parallel computing experiments. The experimental results show that, in an IoT environment, HDS clearly outperforms the traditional single-point processing model SDS (Single Data processing System) in big data processing: with 10 servers in the experimental environment, its computing performance exceeds that of SDS by more than 200 times. The results also confirm that CQ as auxiliary storage effectively improves real-time data processing: with 10 servers, the real-time response of HDS with CQ is more than 270 times better than without it.

Key words: big data, Internet of Things (IoT), Hadoop, Context Neighborhood System (CNS), Context Queue (CQ)
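
The abstract names three constructs without giving their formulas: context distance, the Context Neighborhood System (CNS), and the Context Queue (CQ). The following is a minimal sketch of how they could fit together, not the paper's definitions. It assumes a context is a record with a timestamp and a planar position, that context distance is a weighted sum of spatial and temporal separation, that a CNS is the set of contexts within a given radius, and that CQ is a bounded in-memory FIFO buffer. All names (`Context`, `context_distance`, `neighborhood`, `ContextQueue`) and the weights `alpha` and `beta` are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from math import hypot


@dataclass(frozen=True)
class Context:
    """Hypothetical unified record for one heterogeneous IoT reading."""
    source: str       # e.g. a vehicle or sensor id (assumed field)
    t: float          # timestamp in seconds
    x: float          # planar position
    y: float
    payload: str = ""


def context_distance(a, b, alpha=1.0, beta=1.0):
    """Assumed metric: weighted sum of spatial and temporal separation."""
    return alpha * hypot(a.x - b.x, a.y - b.y) + beta * abs(a.t - b.t)


def neighborhood(center, contexts, radius, alpha=1.0, beta=1.0):
    """Assumed CNS: all contexts within `radius` of `center` (center excluded)."""
    return [c for c in contexts
            if c is not center
            and context_distance(center, c, alpha, beta) <= radius]


class ContextQueue:
    """Assumed CQ: bounded FIFO buffer holding the most recent contexts."""

    def __init__(self, maxlen=1000):
        self._items = deque(maxlen=maxlen)  # oldest entries drop off first

    def push(self, ctx):
        self._items.append(ctx)

    def recent(self, center, radius):
        """Answer a request from the in-memory buffer rather than a batch job."""
        return neighborhood(center, list(self._items), radius)


cq = ContextQueue(maxlen=3)
request = Context("truck-1", t=0.0, x=0.0, y=0.0)
cq.push(Context("truck-2", t=1.0, x=3.0, y=4.0))    # distance 5 + 1 = 6
cq.push(Context("truck-3", t=50.0, x=0.0, y=1.0))   # distance 1 + 50 = 51
near = cq.recent(request, radius=10.0)
print([c.source for c in near])  # ['truck-2']
```

In this sketch the weights simply add meters to seconds; a real system would have to choose `alpha` and `beta` to reconcile those units, and the values here are placeholders.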

CLC number: TP302.1

LI Min, NI Shaoquan, QIU Xiaoping, HUANG Qiang. Hadoop big data processing system model based on context-queue under Internet of Things[J]. Journal of Computer Applications, 2015, 35(5): 1267-1272.

