Journal of Computer Applications ›› 2015, Vol. 35 ›› Issue (11): 3112-3115. DOI: 10.11772/j.issn.1001-9081.2015.11.3112

• DPCS 2015 Paper •

Improved Kalman algorithm for abnormal data detection based on multidimensional impact factors

HUA Qing, XU Guoyan, ZHANG Ye   

  1. College of Computer and Information, Hohai University, Nanjing, Jiangsu 211100, China
  • Received: 2015-06-17 Revised: 2015-07-24 Published: 2015-11-13

Abnormal data detection method based on multidimensional sliding window

HUA Qing (花青), XU Guoyan (许国艳), ZHANG Ye (张叶)

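  1. College of Computer and Information, Hohai University, Nanjing 211100
  • Corresponding author: HUA Qing (born 1991), male, from Yancheng, Jiangsu, master's student; main research interests: big data, data provenance.
  • About the authors: XU Guoyan (born 1971), female, from Chifeng, Inner Mongolia, associate professor, Ph.D., CCF member; main research interests: big data, data provenance. ZHANG Ye (born 1990), female, from Tai'an, Shandong, master's student; main research interests: big data, data provenance.
  • Supported by:
    National Key Technology Research and Development Program of China (2013BAB06B04); Natural Science Foundation of Jiangsu Province (BK20130852); 2013 Jiangsu Water Conservancy Science and Technology Project (2013025); China Huaneng Group Headquarters Science and Technology Project (HNKJ13-H17-04).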

Abstract: As data streams become widely used, the problem of detecting abnormal data in them has attracted growing attention. Existing Kalman filtering algorithms need only a small amount of historical data, but they are suited to detecting single abnormal points and perform poorly on complex, continuous outliers. To address this problem, a Kalman filtering algorithm based on multidimensional impact factors is proposed. The algorithm adds impact factors in three dimensions: space, time, and provenance. When influencing conditions such as weather and flood season change, it adjusts the control parameters of the system model and estimates the measurement noise more accurately, which improves detection accuracy. Experimental results show that, with similar running time, the detection error rate of the proposed algorithm is far lower than that of the Amnesic Kalman Filtering (AKF) and Wavelet Kalman Filtering (WKF) algorithms.

Key words: abnormal data detection, data provenance, graded tagging model, multidimensional impact factor, Kalman algorithm

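Abstract (translated from the Chinese): As data streams become widely used, the detection of abnormal data in data streams has attracted growing attention. Existing Kalman filtering algorithms need only a small amount of historical data, but they are suited only to detecting single abnormal points and perform poorly on complex, continuous outliers. To address this problem, a graded tagging model for hydrological sensors is proposed, and on that basis a Kalman filtering algorithm based on multidimensional impact factors is proposed. The algorithm adds impact factors in three dimensions (space, time, and provenance), adjusts the control parameters of the system model appropriately when influencing conditions such as weather and flood season change, and estimates the measurement noise more accurately, improving the accuracy of anomaly detection. Experimental results show that, with similar running time, the detection error rate of the proposed algorithm is far lower than that of the amnesic-factor-based Kalman (AKF) algorithm and the wavelet-based Kalman (WKF) algorithm.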

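Key words (translated from the Chinese): abnormal data detection, data provenance, graded tagging model, multidimensional impact factor, Kalman algorithm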
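
The abstract does not give the update equations, so the following is only a minimal sketch of the general idea and not the authors' algorithm. It assumes a scalar random-walk state model, flags a reading when its innovation exceeds a multiple of the predicted standard deviation, and scales the measurement noise by a single combined impact factor. The class `ScalarKalmanDetector`, the function `combined_impact`, the product rule for combining the space, time, and provenance factors, and all default parameter values are hypothetical.

```python
from dataclasses import dataclass


def combined_impact(space: float, time: float, provenance: float) -> float:
    """Hypothetical rule: combine the three per-dimension factors by product."""
    return space * time * provenance


@dataclass
class ScalarKalmanDetector:
    q: float = 1e-3         # process noise variance (assumed value)
    r: float = 0.1          # baseline measurement noise variance (assumed value)
    threshold: float = 3.0  # flag when |innovation| > threshold * sqrt(S)
    x: float = 0.0          # current state estimate
    p: float = 1.0          # variance of the state estimate

    def step(self, z: float, impact: float = 1.0) -> bool:
        """Process one reading z; return True if it is flagged as abnormal."""
        # Predict with a random-walk model: x_k = x_{k-1} + noise.
        x_pred = self.x
        p_pred = self.p + self.q

        # Assumption: the impact factor inflates or deflates measurement noise,
        # e.g. a larger value during a flood season tolerates larger jumps.
        r_eff = self.r * impact

        innovation = z - x_pred
        s = p_pred + r_eff  # innovation variance
        is_abnormal = abs(innovation) > self.threshold * s ** 0.5

        if is_abnormal:
            # Skip the update so a run of outliers cannot drag the estimate.
            self.x, self.p = x_pred, p_pred
        else:
            k = p_pred / s  # Kalman gain
            self.x = x_pred + k * innovation
            self.p = (1.0 - k) * p_pred
        return is_abnormal


if __name__ == "__main__":
    detector = ScalarKalmanDetector(x=10.0)
    readings = [10.0, 10.1, 9.9, 10.2, 25.0, 25.5, 10.0]
    print([detector.step(z) for z in readings])
```

Skipping the update on flagged readings is one simple way to keep consecutive outliers from being absorbed into the estimate; the paper may handle continuous outliers differently.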
