Journal of Computer Applications, 2019, Vol. 39, Issue (3): 924-929. DOI: 10.11772/j.issn.1001-9081.2018081681


Abnormal time series data detection of gas station by Seq2Seq model based on bidirectional long short-term memory

TAO Tao1,2,3, ZHOU Xi1,3, MA Bo1,3, ZHAO Fan1,3   

  1. Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, Xinjiang 830011, China;
    2. University of Chinese Academy of Sciences, Beijing 100049, China;
    3. Xinjiang Laboratory of Minority Speech and Language Information Processing, Xinjiang Technical Institute of Physics and Chemistry, Urumqi, Xinjiang 830011, China
  • Received: 2018-08-14; Revised: 2018-09-13; Online: 2019-03-10; Published: 2019-03-11
  • Supported by:

    This work is partially supported by the Program of Introducing High-Level Talents of Xinjiang (Y639401201) and the West Light Foundation of the Chinese Academy of Sciences (2016-QNXZ-A-3).

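  • Corresponding author: ZHOU Xi
  • About the authors: TAO Tao (1994-), male, born in Bijie, Guizhou, M.S. candidate; research interests: big data analysis, data mining. ZHOU Xi (1978-), male, born in Shuangfeng, Hunan, research professor, Ph.D., CCF member; research interests: Internet of Things, big data analysis. MA Bo (1984-), male, born in Anshan, Liaoning, associate research professor, Ph.D., CCF member; research interests: data analysis and knowledge discovery, machine learning. ZHAO Fan (1980-), male, born in Jiexiu, Shanxi, associate research professor, Ph.D. candidate, CCF member; research interests: information security, big data analysis.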

Abstract:

Gas station time series data contain multi-dimensional information about fueling behavior, but the data for any specific gas station are sparse. Existing anomaly detection algorithms are ill-suited to such data: they flag many pseudo outliers while missing many real anomalies. To address these problems, a deep-learning-based anomaly detection method was proposed to identify vehicles with abnormal fueling behavior. First, an autoencoder was used to extract features from the data collected at gas stations. Then, a Seq2Seq model embedding Bidirectional Long Short-Term Memory (Bi-LSTM) was used to predict fueling behavior. Finally, the outlier threshold was defined by comparing the predicted values with the original values. Experiments on a fueling dataset and a credit card fraud dataset verify the effectiveness of the proposed method: compared with existing methods, it reduces the Root Mean Squared Error (RMSE) by 21.1% on the fueling dataset and improves anomaly detection accuracy by 1.4% on the credit card fraud dataset. The proposed method can therefore be applied to detect vehicles with abnormal fueling behavior, improving the management and operational efficiency of gas stations.

Key words: gas station time series data, deep learning, Seq2Seq, Bidirectional Long Short-Term Memory (Bi-LSTM), outlier detection
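The abstract states only that the outlier threshold is defined by comparing predicted values with original values; it does not give the rule. The sketch below illustrates that final step under loudly stated assumptions: the predictions are taken as already produced by a trained model (the autoencoder and Bi-LSTM Seq2Seq stages are not reproduced), and a point is flagged when its absolute prediction error exceeds the mean error plus `k` population standard deviations. This "mean + k·std" rule, the parameter `k`, and the function names `rmse`, `residual_threshold`, and `flag_anomalies` are hypothetical illustrations, not the authors' method. The `rmse` helper computes the error metric reported in the abstract.

```python
# Hypothetical sketch: residual-based outlier flagging plus RMSE.
# ASSUMPTION: the abstract only says the threshold comes from comparing
# predicted and original values; the "mean + k * std of absolute errors"
# rule below is an illustrative choice, not the paper's stated method.
import math
from statistics import mean, pstdev
from typing import List, Sequence


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root Mean Squared Error between two equal-length, non-empty series."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def residual_threshold(actual: Sequence[float], predicted: Sequence[float],
                       k: float = 3.0) -> float:
    """Hypothetical threshold: mean absolute error plus k population std devs."""
    errors = [abs(a - p) for a, p in zip(actual, predicted)]
    return mean(errors) + k * pstdev(errors)


def flag_anomalies(actual: Sequence[float], predicted: Sequence[float],
                   k: float = 3.0) -> List[int]:
    """Return indices whose absolute prediction error exceeds the threshold."""
    threshold = residual_threshold(actual, predicted, k)
    return [i for i, (a, p) in enumerate(zip(actual, predicted))
            if abs(a - p) > threshold]


if __name__ == "__main__":
    # Made-up fueling volumes (litres) and model predictions, for illustration only.
    actual = [40.0, 42.0, 41.0, 39.0, 120.0, 40.0]
    predicted = [41.0, 41.0, 40.0, 40.0, 41.0, 41.0]
    print(flag_anomalies(actual, predicted, k=2.0))  # [4]
    print(round(rmse(actual, predicted), 2))  # 32.26
```

The toy example uses `k = 2.0` because, among only six errors, no single value can lie more than √5 ≈ 2.24 population standard deviations above the mean, so `k = 3.0` could never flag anything. In practice `k`, or whatever thresholding rule replaces it, would be tuned on validation data.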

