Journal of Computer Applications, 2016, Vol. 36, Issue (1): 282-286. DOI: 10.11772/j.issn.1001-9081.2016.01.0282

Data recovery algorithm in chemical process based on locally weighted reconstruction

GUO Jinyu, YUAN Tangming, LI Yuan   

  1. College of Information Engineering, Shenyang University of Chemical Technology, Shenyang Liaoning 110142, China
  • Received: 2015-07-01; Revised: 2015-09-09; Online: 2016-01-10; Published: 2016-01-09
  • Supported by:
    This work is partially supported by the Major Program of the National Natural Science Foundation of China (61490701), the National Natural Science Foundation of China (61174119), the Education Department Research Project of Liaoning Province (L2013155) and the Education Department Project of Key Laboratory of Liaoning Province (LZ2015059).

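  • Corresponding author: LI Yuan (1964-), female, from Shenyang, Liaoning, professor, Ph.D.; research interests: fault detection, fault diagnosis, data mining.
  • About the authors: GUO Jinyu (1975-), female, from Gaotang, Shandong, associate professor, Ph.D.; research interests: fault detection, fault diagnosis, biometric recognition, data mining. YUAN Tangming (1989-), male, from Fushun, Liaoning, M.S. candidate; research interests: fault detection, data mining.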

Abstract: To address the problem of missing data in chemical processes, a Locally Weighted Recovery Algorithm (LWRA) that preserves the local structure of the data was proposed. First, the missing data points were located and marked with the symbol NaN (Not a Number), and the data set was divided into a complete data set and an incomplete data set. Next, the incomplete samples were processed in turn according to their degree of completeness: for each one, its k nearest neighbors were found in the complete data set, and the weights of these k neighbors were calculated by minimizing the sum of squared reconstruction errors. Finally, the missing data points were reconstructed from the k nearest neighbors and their corresponding weights. The algorithm was applied to two types of chemical process data with different missing rates and compared with two traditional data recovery algorithms, Expectation Maximization Principal Component Analysis (EM-PCA) and the Mean Algorithm (MA). The results show that the proposed method yields the lowest recovery error and that its computation speed is on average improved by a factor of 2 relative to EM-PCA. The experimental results demonstrate that the proposed algorithm not only recovers data effectively but also improves data utilization, and that it is suitable for recovering missing data in nonlinear chemical processes.

Key words: data mining, missing data, data recovery, k Nearest Neighbor (kNN) rule, locally weighted reconstruction, chemical process
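The recovery steps summarized in the abstract can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: neighbors are assumed to be chosen by Euclidean distance over each sample's observed variables, and the "minimum error sum of squares" weights are assumed to carry a sum-to-one constraint plus a small ridge term, as in the weight step of locally linear embedding (LLE). The function names `lwra_impute`, `reconstruction_weights`, `solve` and `_missing`, and the parameters `k` and `reg`, are hypothetical.

```python
import math


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        acc = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - acc) / m[r][r]
    return x


def reconstruction_weights(target, neighbors, reg=1e-3):
    """Weights w minimizing ||target - sum_j w_j * neighbors[j]||^2 with sum_j w_j = 1.

    Assumption: LLE-style closed form. Under the sum-to-one constraint the error
    equals w^T G w, where G is the Gram matrix of (neighbor - target) differences,
    so the minimizer is proportional to G^{-1} 1.
    """
    k = len(neighbors)
    diffs = [[nb_i - t_i for nb_i, t_i in zip(nb, target)] for nb in neighbors]
    gram = [[sum(p * q for p, q in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))
    eps = reg * trace if trace > 0 else reg  # hypothetical ridge term keeps G invertible
    for i in range(k):
        gram[i][i] += eps
    w = solve(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


def _missing(v):
    return math.isnan(v)


def lwra_impute(data, k=3, reg=1e-3):
    """Fill NaN entries of `data` (a list of equal-length float rows); returns a new list."""
    complete = [row for row in data if not any(_missing(v) for v in row)]
    if len(complete) < k:
        raise ValueError("need at least k complete rows")
    result = [list(row) for row in data]
    incomplete = [i for i, row in enumerate(data) if any(_missing(v) for v in row)]
    # Process incomplete rows in order of completeness (fewest missing values first).
    incomplete.sort(key=lambda i: sum(_missing(v) for v in data[i]))
    for i in incomplete:
        row = data[i]
        obs = [j for j, v in enumerate(row) if not _missing(v)]
        miss = [j for j, v in enumerate(row) if _missing(v)]
        # Assumption: Euclidean distance over this row's observed variables only.
        nbrs = sorted(complete, key=lambda c: sum((c[j] - row[j]) ** 2 for j in obs))[:k]
        w = reconstruction_weights([row[j] for j in obs],
                                   [[c[j] for j in obs] for c in nbrs], reg)
        # Reconstruct each missing variable as the weighted sum of the neighbors' values.
        for j in miss:
            result[i][j] = sum(wi * c[j] for wi, c in zip(w, nbrs))
    return result
```

Because this sketch draws neighbors only from the originally complete rows, the completeness ordering does not change its output; the abstract does not state whether recovered samples are later reused as neighbors. Since the weights sum to one, variables that depend affinely on the observed ones are reproduced up to the small error introduced by the ridge term.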