Data recovery algorithm in chemical process based on locally weighted reconstruction

doi:10.11772/j.issn.1001-9081.2016.01.0282

Journal of Computer Applications, 2016, Vol. 36, Issue (1): 282-286. DOI: 10.11772/j.issn.1001-9081.2016.01.0282


GUO Jinyu, YUAN Tangming, LI Yuan

College of Information Engineering, Shenyang University of Chemical Technology, Shenyang Liaoning 110142, China

Received: 2015-07-01. Revised: 2015-09-09. Online: 2016-01-10. Published: 2016-01-09.
Supported by:
This work is partially supported by the Major Program of the National Natural Science Foundation of China (61490701), the National Natural Science Foundation of China (61174119), the Education Department Research Project of Liaoning Province (L2013155) and the Education Department Project of Key Laboratory of Liaoning Province (LZ2015059).

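Corresponding author: LI Yuan (born 1964), female, from Shenyang, Liaoning; professor, Ph.D. Research interests: fault detection, fault diagnosis, data mining.
About the authors: GUO Jinyu (born 1975), female, from Gaotang, Shandong; associate professor, Ph.D. Research interests: fault detection, fault diagnosis, biometric recognition, data mining. YUAN Tangming (born 1989), male, from Fushun, Liaoning; M.S. candidate. Research interests: fault detection, data mining.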
Abstract

Abstract: To address missing data in chemical processes, a Locally Weighted Recovery Algorithm (LWRA) that preserves the local structure of the data was proposed. First, the missing data points were located and marked with the symbol NaN (Not a Number), and the data set containing missing values was divided into a complete data set and an incomplete data set. Then, for the samples of the incomplete data set, taken in order of their degree of completeness, the k nearest neighbors were found in the complete data set, and the weights of these k nearest neighbors were calculated by minimizing the sum of squared errors. Finally, the missing data points were reconstructed from the k nearest neighbors and their weights. The algorithm was applied to two types of chemical process data with different missing rates and compared with two traditional data recovery algorithms, Expectation Maximization Principal Component Analysis (EM-PCA) and the Mean Algorithm (MA). The results show that the proposed method has the lowest recovery error and that its computation speed is on average 2 times higher than that of EM-PCA. The experimental results demonstrate that the proposed algorithm not only recovers data effectively but also improves data utilization, and that it is suitable for recovering missing data in nonlinear chemical processes.

Key words: data mining, missing data, data recovery, k Nearest Neighbor (kNN) rule, locally weighted reconstruction, chemical process
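The procedure outlined in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function name `lwra_impute`, the parameters `k` and `reg`, the sum-to-one constraint on the weights (borrowed from locally linear embedding), the small regularization term, and the mean fallback for fully missing rows are all assumptions made here for illustration. The sketch also keeps the pool of complete samples fixed, so the completeness ordering is shown only to mirror the description.

```python
import numpy as np

def lwra_impute(X, k=5, reg=1e-3):
    """Fill NaN entries of X (samples x variables) by locally weighted
    reconstruction from complete samples. Hypothetical sketch only."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)                        # locate missing points
    complete = X[~missing.any(axis=1)]           # rows without any NaN
    if len(complete) == 0:
        raise ValueError("at least one complete sample is required")
    k = min(k, len(complete))
    out = X.copy()

    # Incomplete rows, most complete (fewest NaN) first.
    idx = np.where(missing.any(axis=1))[0]
    order = idx[np.argsort(missing[idx].sum(axis=1), kind="stable")]

    for i in order:
        obs = ~missing[i]
        if not obs.any():
            # Assumption: a row with nothing observed falls back to the mean.
            out[i] = complete.mean(axis=0)
            continue
        # k nearest complete samples, measured on the observed variables only.
        dist = np.linalg.norm(complete[:, obs] - X[i, obs], axis=1)
        nn = complete[np.argsort(dist)[:k]]
        # Weights minimizing the squared reconstruction error of the observed
        # part, subject to sum(w) = 1 (assumed constraint, as in LLE).
        Z = nn[:, obs] - X[i, obs]
        G = Z @ Z.T
        G += (reg * np.trace(G) + 1e-12) * np.eye(k)  # keep G invertible
        w = np.linalg.solve(G, np.ones(k))
        w /= w.sum()
        # Reconstruct the missing variables from the weighted neighbors.
        out[i, ~obs] = w @ nn[:, ~obs]
    return out

# Toy data with z = x + 2y; the last sample is missing z.
X_demo = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 1.0],
                   [0.0, 1.0, 2.0],
                   [1.0, 1.0, 3.0],
                   [0.5, 0.5, np.nan]])
recovered = lwra_impute(X_demo, k=4)
print(recovered[4])
```

In the toy data the incomplete sample sits at the center of four complete samples, so by symmetry each neighbor receives the same weight and the recovered value is the average of their third components. On real process data, `k` and `reg` would need tuning, and variables would normally be standardized before distances are computed.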

CLC Number: TP274

GUO Jinyu, YUAN Tangming, LI Yuan. Data recovery algorithm in chemical process based on locally weighted reconstruction[J]. Journal of Computer Applications, 2016, 36(1): 282-286.
