Journal of Computer Applications, 2015, Vol. 35, Issue (5): 1267-1272. DOI: 10.11772/j.issn.1001-9081.2015.05.1267

• Advanced Computing •

Context-based Hadoop big data processing system model in the Internet of Things environment

LI Min1, NI Shaoquan1,2, QIU Xiaoping1, HUANG Qiang1,2

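  1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 610031;
    2. National Railway Train Diagram Research and Training Center, Southwest Jiaotong University, Chengdu 610031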
  • Received: 2014-12-10 Revised: 2015-01-17 Publication date: 2015-05-10 Release date: 2015-05-14
  • Corresponding author: LI Min
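  • About the authors: LI Min (1981-), female, from Chengdu, Sichuan, engineer, Ph.D. candidate; research interests: logistics informatization, management information systems, Internet of Things. NI Shaoquan (1967-), male, from Hanchuan, Hubei, professor, Ph.D.; research interests: computer-aided compilation of train diagrams, transportation information, railway traffic organization, logistics information. QIU Xiaoping (1976-), male, from Yingshan, Sichuan, professor, Ph.D.; research interests: supply chain information management and integration, transportation planning and management. HUANG Qiang (1981-), male, from Ya'an, Sichuan, teaching assistant, Ph.D. candidate; research interests: logistics informatization, management information systems, cloud computing, rail-water intermodal transport.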
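  • Supported by:

    National Natural Science Foundation of China (61273242, 61403317); Science and Technology Research Program of China Railway Corporation (2013X006-A, 2013X014-G, 2013X010-A, 2014X004-D).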

Hadoop big data processing system model based on context-queue under the Internet of Things

LI Min1, NI Shaoquan1,2, QIU Xiaoping1, HUANG Qiang1,2   

  1. School of Transportation and Logistics, Southwest Jiaotong University, Chengdu Sichuan 610031, China;
    2. National Railway Train Diagram Research and Training Center, Southwest Jiaotong University, Chengdu Sichuan 610031, China
  • Received:2014-12-10 Revised:2015-01-17 Online:2015-05-10 Published:2015-05-14

Abstract:

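To address the low real-time performance of heterogeneous big data processing in the Internet of Things (IoT) environment, methods for implementing data processing and persistence on the Hadoop framework were explored, and a context-based Hadoop big data processing system model, HDS, was proposed. HDS uses the Hadoop framework to perform parallel data processing and persistence, and abstracts heterogeneous IoT data as "contexts", which are the objects HDS processes. The definitions of "context distance" and "Context Neighborhood System (CNS)" were proposed. Because the Hadoop framework itself offers low real-time data processing performance, HDS adds a "Context Queue (CQ)" as auxiliary storage to improve real-time performance. Using the temporal-spatial characteristics of contexts, a context neighborhood system over user requests was built to reorganize tasks. Taking vehicle scheduling in petroleum products distribution as an example, the data processing and real-time performance of HDS were verified and analyzed through parallel MapReduce experiments. The experimental results show that in the IoT environment HDS has a clear advantage over the traditional single-node data processing model (SDS) in big data processing performance: with 10 servers in the experimental environment, its computing performance exceeds that of SDS by more than 200 times. The results also confirm that CQ as auxiliary storage effectively improves real-time performance: with 10 servers, real-time data processing performance improves by more than 270 times.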
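The abstract names "context distance" and the context neighborhood system but does not give their formulas. The sketch below is a minimal illustration under stated assumptions, not the paper's definition: a context is assumed to be a record with a timestamp and planar coordinates, context distance is assumed to be a weighted sum of the time gap and the Euclidean spatial distance (weights `alpha`, `beta` and radius `eps` are hypothetical), and a neighborhood is every context within `eps`. Task reorganization is then shown as greedy grouping of requests that fall in one neighborhood.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Context:
    """Hypothetical context record: one IoT reading tagged with time and place."""
    cid: str  # assumed unique identifier
    t: float  # timestamp (assumed unit: seconds)
    x: float  # planar coordinates (assumed unit: km)
    y: float


def context_distance(a: Context, b: Context, alpha: float = 1.0, beta: float = 1.0) -> float:
    """Assumed metric: weighted time gap plus weighted Euclidean spatial distance."""
    return alpha * abs(a.t - b.t) + beta * math.hypot(a.x - b.x, a.y - b.y)


def neighborhood(center: Context, contexts: List[Context], eps: float) -> List[Context]:
    """Assumed neighborhood: all contexts within distance eps of center (center included)."""
    return [c for c in contexts if context_distance(center, c) <= eps]


def reorganize(requests: List[Context], eps: float) -> List[List[str]]:
    """Greedily merge requests that fall in one neighborhood into a single task."""
    if eps < 0:
        raise ValueError("eps must be non-negative")
    remaining = sorted(requests, key=lambda c: c.t)  # earliest request seeds each group
    groups: List[List[str]] = []
    while remaining:
        # The seed is always a member because its distance to itself is 0 <= eps.
        members = neighborhood(remaining[0], remaining, eps)
        groups.append([c.cid for c in members])
        taken = {c.cid for c in members}
        remaining = [c for c in remaining if c.cid not in taken]
    return groups
```

For example, with requests `r1` (t=0, at (0, 0)), `r2` (t=1, at (1, 0)), `r3` (t=100, at (50, 50)) and `r4` (t=101, at (50, 51)), `reorganize(..., eps=5.0)` yields `[["r1", "r2"], ["r3", "r4"]]`: the two early, nearby requests become one task and the two late ones another. Greedy seeding by earliest timestamp is a simple choice made for illustration; a clustering method such as DBSCAN would be an alternative.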

Key words: big data, Internet of Things, Hadoop, context neighborhood system, context queue

Abstract:

To address the low real-time response capability of heterogeneous big data processing in the Internet of Things (IoT), data processing and persistence schemes based on Hadoop were analyzed. A Hadoop big data processing system model based on "Context", named HDS (Hadoop big Data processing System), was proposed. The model uses the Hadoop framework to perform parallel data processing and persistence. Heterogeneous data are abstracted as "Context", the unified object processed in HDS. Definitions of "Context Distance" and "Context Neighborhood System (CNS)" were proposed based on the temporal-spatial characteristics of "Context". A "Context Queue (CQ)" was designed as auxiliary storage to overcome the low real-time data processing response of the Hadoop framework. In particular, the optimization of reorganizing tasks among client requests in CQ, based on the temporal and spatial characteristics of context, was described in detail. Finally, taking vehicle scheduling in petroleum products distribution as an example, the data processing performance and real-time response capability were tested through MapReduce distributed parallel computing experiments. The experimental results show that, compared with the ordinary computing system SDS (Single Data processing System), HDS not only has a clear advantage in big data processing capability but also effectively overcomes the low real-time data processing response of Hadoop. In a 10-server experimental environment, HDS outperforms SDS in data processing capability by more than 200 times, and HDS with CQ outperforms HDS without CQ in real-time response capability by more than 270 times.
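The abstract describes CQ only as auxiliary storage that compensates for Hadoop's batch latency. As a hedged illustration of that idea, the sketch below assumes CQ keeps a bounded in-memory window of recent contexts for immediate time-window queries while handing fixed-size batches to a `persist` callback. `ContextQueue`, `window`, `batch_size` and `persist` are hypothetical names; `persist` stands in for a Hadoop/HDFS write, and no Hadoop API is used or implied.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple

Record = Tuple[float, str]  # hypothetical context record: (timestamp, payload)


class ContextQueue:
    """Hypothetical auxiliary store in front of a slow batch backend.

    Recent contexts stay in a bounded in-memory window so time-window queries
    are answered immediately; every `batch_size` new records are handed to
    `persist`, a stand-in for a Hadoop/HDFS write.
    """

    def __init__(self, window: int, batch_size: int,
                 persist: Callable[[List[Record]], None]) -> None:
        if window <= 0 or batch_size <= 0:
            raise ValueError("window and batch_size must be positive")
        self.window: Deque[Record] = deque(maxlen=window)  # oldest entries drop out
        self.pending: List[Record] = []  # records not yet persisted
        self.batch_size = batch_size
        self.persist = persist

    def push(self, record: Record) -> None:
        """Make the record readable at once and queue it for batch persistence."""
        self.window.append(record)
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        """Send pending records to the backend as one batch."""
        if self.pending:
            self.persist(self.pending)
            self.pending = []  # new list, so the persisted batch is not mutated later

    def query(self, t_from: float, t_to: float) -> List[Record]:
        """Real-time read: served from memory without waiting for a batch job."""
        return [r for r in self.window if t_from <= r[0] <= t_to]
```

Keeping the read window separate from the pending batch means a flush never hides recent data from queries; the trade-off is that queries only see the last `window` records, so older data would have to come from the persisted store.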

Key words: big data, Internet of Things (IoT), Hadoop, Context Neighborhood System (CNS), context queue

CLC number: