Journal of Computer Applications, 2017, Vol. 37, Issue (10): 2983-2990. DOI: 10.11772/j.issn.1001-9081.2017.10.2983

• Frontier, Interdisciplinary and Comprehensive Applications •

Intelligent integration approach of big data for urban infrastructure management and maintenance

LIU Jiajun1,2, YU Gang1,2, HU Min1,2

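  1. SHU-UTS SILC Business School, Shanghai 201800, China;
    2. Shanghai University-Shanghai Urban Construction Group Research Center of Building Industrialization, Shanghai 200072, China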
  • Received: 2017-04-11; Revised: 2017-06-24; Online: 2017-10-10; Published: 2017-10-16
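  • Corresponding author: YU Gang (1977-), male, born in Nanchang, Jiangxi, Ph.D., lecturer. Research interests: information-based construction management of underground engineering. E-mail: yugang509@163.com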
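  • About the authors: LIU Jiajun (1994-), male, born in Jingzhou, Hubei, M.S. candidate. Research interests: data warehousing, ETL modeling. YU Gang (1977-), male, born in Nanchang, Jiangxi, Ph.D., lecturer. Research interests: information-based construction management of underground engineering. HU Min (1970-), female, born in Shanghai, Ph.D., associate professor. Research interests: building informatization.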
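  • Supported by:
    the Construction Project of Shanghai Urban-Rural Construction and Transportation Committee (2014-009-002); the Key Project of Shanghai Committee of Science and Technology (13511504803); the Major Project of Shanghai SASAC (2014008).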


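Abstract: Operation and maintenance (O&M) big data for urban infrastructure is high-dimensional, diverse in form and rapidly changing. To improve data integration efficiency and the platform's statistical and decision-analysis performance, while reducing Extract-Transform-Load (ETL) execution time and the load on the data center, a Multilevel Task Scheduling (MTS) ETL framework (MTS-ETL) was proposed to meet the requirements of smart management and maintenance. First, the data warehouse was divided into a temporary data area, a data storage area, a data classification area and a data analysis area; following this partition, the complete ETL process was split into four levels of ETL task scheduling, and multi-frequency ETL run scheduling was designed together with two ETL working modes, sequential and non-sequential. Next, conceptual, logical and physical models of data integration were built on the non-sequential working mode of the MTS-ETL framework. Finally, ETL transformation modules and job modules were designed in Pentaho Data Integration to implement the integration method. In a traffic-flow data integration experiment, the method integrated 136 754 records in only 28.4 s; in thousand-record-scale integration experiments, it reduced the total average execution time by 6.51% compared with the traditional ETL method; and report analysis showed that the ETL process remained reliable when integrating 4 million records. The proposed method effectively integrates O&M big data, improves the platform's statistical analysis performance and keeps ETL execution time low.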
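The abstract states only that the warehouse has four areas, that ETL is split into four levels of task scheduling, and that runs can use several frequencies in either a sequential or a non-sequential mode. The Python sketch that follows is one possible reading of that design, not the authors' Pentaho Data Integration implementation. It assumes that level k simply moves all pending records from one area into the next, and the "source" stand-in area, the tick-based intervals, the `Warehouse` class, the `simulate` function and the arrival counts are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical area order: a stand-in for the O&M source systems, then the four warehouse areas.
AREAS = ["source", "temporary", "storage", "classification", "analysis"]


@dataclass
class Warehouse:
    # Each area holds a list of records (assumption: records are opaque values).
    areas: dict = field(default_factory=lambda: {a: [] for a in AREAS})

    def run_level(self, level: int) -> int:
        """Assumed semantics: level k (1..4) moves every pending record from AREAS[k-1] to AREAS[k]."""
        src, dst = AREAS[level - 1], AREAS[level]
        moved = len(self.areas[src])
        self.areas[dst].extend(self.areas[src])
        self.areas[src] = []
        return moved


def simulate(intervals: dict, ticks: int, sequential: bool, arrivals_per_tick: int = 10) -> dict:
    """Return the record count per area after `ticks` scheduler ticks.

    sequential=True: whenever level 1 is due, levels 1-4 run back to back as one chain.
    sequential=False: each level runs only when its own interval is due (multi-frequency).
    """
    wh = Warehouse()
    for t in range(ticks):
        wh.areas["source"].extend(range(arrivals_per_tick))  # hypothetical new O&M records
        if sequential:
            if t % intervals[1] == 0:
                for level in range(1, 5):
                    wh.run_level(level)
        else:
            for level in range(1, 5):
                if t % intervals[level] == 0:
                    wh.run_level(level)
    return {a: len(v) for a, v in wh.areas.items()}


if __name__ == "__main__":
    freqs = {1: 1, 2: 2, 3: 4, 4: 8}  # illustrative per-level periods, in ticks
    print(simulate(freqs, 8, sequential=True))
    print(simulate(freqs, 8, sequential=False))
```

With these illustrative periods, the sequential run leaves all 80 records in the analysis area after 8 ticks, while the non-sequential run leaves records spread across the temporary, storage and classification areas until the slowest level (period 8) fires again. Decoupling the levels in this way lets each one run at its own frequency, which is the kind of flexibility the non-sequential mode appears intended to provide.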

Key words: big data, Extract-Transform-Load (ETL), data integration, data warehouse, urban infrastructure management and maintenance

