Journal of Computer Applications ›› 2019, Vol. 39 ›› Issue (1): 87-92. DOI: 10.11772/j.issn.1001-9081.2018071665

• Paper from the 2018 National Annual Conference on Open Distributed and Parallel Computing (DPCS 2018) •

Short-term traffic flow prediction method for highway big data

WANG Xuefei1,2, DING Weilong1,2

  1. College of Computer Science, North China University of Technology, Beijing 100144, China;
  2. Beijing Key Laboratory on Integration and Analysis of Large-scale Stream Data (North China University of Technology), Beijing 100144, China
  • Received: 2018-07-19 Revised: 2018-08-21 Online: 2019-01-10 Published: 2019-01-21
  • Corresponding author: WANG Xuefei
  • About the authors: WANG Xuefei (1994-), female, native of Yunyang, Chongqing, M.S. candidate; main research interests: real-time data processing, intelligent transportation. DING Weilong (1983-), male, native of Tai'an, Shandong, lecturer, Ph.D., CCF member; main research interests: real-time data processing, distributed systems.
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61702014), the Natural Science Foundation of Beijing (4162021), and the Key Fundamental Research Funds for the Research Institute of Highway, Ministry of Transport (2016-9027).

Abstract: Traditional short-term traffic flow prediction methods in the highway domain suit only small-scale data, predict inefficiently across the whole road network, and neglect the spatio-temporal relationships in the data. To address these problems, a short-term traffic flow prediction method for highway big data, based on the K-Nearest Neighbors (KNN) model, was proposed. Firstly, the K value and the distance metric of the model were tuned, with cross validation used to compare model parameters. Secondly, considering the inherent spatio-temporal associations of the business data, feature vectors based on spatio-temporal characteristics were modeled. Finally, a regression prediction model was built in a big data environment, and prediction was performed with the optimally parameterized model. The experimental results show that, compared with traditional time series models, the proposed method predicts the traffic flow of all toll stations in a single run, runs fast per execution, and improves efficiency by 77%. It also markedly reduces the Mean Absolute Percentage Error (MAPE) and the Median Absolute Percentage Error (MDAPE), and has good horizontal scalability.

Key words: traffic flow, short-term forecasting, K-Nearest Neighbors (KNN), spatio-temporal data, big data
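
The abstract names the steps (tuning K and the distance metric by cross validation, building spatio-temporal feature vectors, KNN regression, evaluation by MAPE and MDAPE) but gives no implementation details. The following is a minimal single-machine sketch of those steps under stated assumptions, not the authors' big-data implementation. The feature layout (a station's own flow in the two previous intervals, a hypothetical upstream station's flow, and the time slot of the day), the synthetic data, the interleaved fold scheme, and all function names are illustrative assumptions.

```python
import math
import statistics

def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Candidate distance metrics to compare during tuning (assumed choices).
METRICS = {"manhattan": manhattan, "euclidean": euclidean}

def knn_predict(train_X, train_y, query, k, metric):
    """Average the targets of the k training vectors closest to `query`."""
    dist = METRICS[metric]
    order = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * statistics.mean([abs(a - p) / abs(a) for a, p in zip(actual, predicted)])

def mdape(actual, predicted):
    """Median Absolute Percentage Error, in percent (actual values must be nonzero)."""
    return 100.0 * statistics.median([abs(a - p) / abs(a) for a, p in zip(actual, predicted)])

def cross_validate(X, y, k, metric, folds=4):
    """Mean MAPE over `folds` interleaved folds (sample i goes to fold i % folds)."""
    scores = []
    for f in range(folds):
        test_idx = [i for i in range(len(X)) if i % folds == f]
        train_idx = [i for i in range(len(X)) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        preds = [knn_predict(train_X, train_y, X[i], k, metric) for i in test_idx]
        scores.append(mape([y[i] for i in test_idx], preds))
    return statistics.mean(scores)

def tune(X, y, k_values, metrics, folds=4):
    """Grid search over K and metric; returns (best_mape, best_k, best_metric)."""
    return min((cross_validate(X, y, k, m, folds), k, m)
               for k in k_values for m in metrics)

def make_samples(n=96):
    """Synthetic hourly flows for one station and a hypothetical upstream station.

    Feature vector (assumed layout):
    [own flow at t-1, own flow at t-2, upstream flow at t-1, slot of day].
    """
    own = [100 + 50 * math.sin(2 * math.pi * t / 24) for t in range(n)]
    upstream = [120 + 40 * math.sin(2 * math.pi * (t + 1) / 24) for t in range(n)]
    X = [[own[t - 1], own[t - 2], upstream[t - 1], t % 24] for t in range(2, n)]
    y = [own[t] for t in range(2, n)]
    return X, y

X, y = make_samples()
best_mape, best_k, best_metric = tune(
    X, y, k_values=[1, 3, 5], metrics=["manhattan", "euclidean"])
print(f"best K={best_k}, metric={best_metric}, cross-validated MAPE={best_mape:.2f}%")
```

In this sketch the features are left unscaled and the folds ignore temporal order, both of which a real evaluation on toll-station records would likely need to revisit. Predicting all stations in one run, as the abstract describes, would amount to applying `knn_predict` to one feature vector per station, which is the part a distributed environment can parallelize.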

CLC Number: