Journal of Computer Applications, 2024, Vol. 44, Issue (6): 1743-1750. DOI: 10.11772/j.issn.1001-9081.2023060824
Special Issue: The 38th CCF National Conference of Computer Applications (CCF NCCA 2023)
Hongtao SONG, Jiangsheng YU, Qilong HAN
Received: 2023-07-04
Revised: 2023-08-09
Accepted: 2023-08-09
Online: 2023-08-28
Published: 2024-06-10
Contact (corresponding author): Qilong HAN
About author: SONG Hongtao, born in 1980 in Changli, Hebei, Ph. D., associate professor, CCF member. His research interests include industrial data quality analysis and intelligent processing.
Hongtao SONG, Jiangsheng YU, Qilong HAN. Industrial multivariate time series data quality assessment method[J]. Journal of Computer Applications, 2024, 44(6): 1743-1750.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023060824
| Record | Attribute | | | |
|---|---|---|---|---|
| | … | | | |
| | y11 | y12 | … | y1n |
| | y21 | y22 | … | y2n |
| | ︙ | ︙ | ︙ | ︙ |
| | ym1 | ym2 | … | ymn |
Tab. 1 Multivariate time series data structure
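| Dimension | Dimension score | Sub-dimension | Sub-dimension score | Sub-dimension weight |
|---|---|---|---|---|
| Completeness | 1 | Attribute completeness | 1 | |
| | | Record completeness | 1 | 0.6 |
| | | Value completeness | 1 | 0.4 |
| Normativeness | 1 | Type normativeness | 1 | 0.2 |
| | | Precision normativeness | 1 | 0.8 |
| Consistency | 1 | Order consistency | 1 | 0.7 |
| | | Logical consistency | 1 | 0.3 |
| Timeliness | 1 | Timeliness | 1 | 1.0 |
| Uniqueness | 1 | Attribute uniqueness | 1 | 0.5 |
| | | Record uniqueness | 1 | 0.5 |
| Accuracy | 1 | Range accuracy | 1 | 0.5 |
| | | Value accuracy | 1 | 0.5 |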
Tab. 2 Assessment results for simulation dataset
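| Dimension | Sub-dimension | Description |
|---|---|---|
| Completeness | Attribute completeness | Delete all data values of attribute A5 |
| | Record completeness | Delete 2 records |
| | Value completeness | Delete 5 data values, 2 of which are in the same record |
| Normativeness | Type normativeness | Change 2 data values to character type |
| | Precision normativeness | No modification |
| Consistency | Order consistency | Move 2 records to earlier positions |
| | Logical consistency | No modification |
| Timeliness | Timeliness | Increase the timestamp of 1 record by 1 s |
| Uniqueness | Attribute uniqueness | No modification |
| | Record uniqueness | Modify 3 records so that each duplicates an adjacent record |
| Accuracy | Range accuracy | Modify 10 data values so that they fall outside the specified range |
| | Value accuracy | Modify 3 data values so that they stay within the specified range but are inaccurate |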
Tab. 3 Error handling of original dataset
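| Dimension | Dimension score | Sub-dimension | Sub-dimension score | Sub-dimension weight |
|---|---|---|---|---|
| Completeness | 0.6989 | Attribute completeness | 0.8000 | |
| | | Record completeness | 0.9400 | 0.6 |
| | | Value completeness | 0.7740 | 0.4 |
| Normativeness | 0.9931 | Type normativeness | 0.9931 | 0.2 |
| | | Precision normativeness | 0.9931 | 0.8 |
| Consistency | 0.9704 | Order consistency | 0.9796 | 0.7 |
| | | Logical consistency | 0.9490 | 0.3 |
| Timeliness | 0.8980 | Timeliness | 0.8980 | 1.0 |
| Uniqueness | 0.9847 | Attribute uniqueness | 1.0000 | 0.5 |
| | | Record uniqueness | 0.9694 | 0.5 |
| Accuracy | 0.7028 | Range accuracy | 0.7066 | 0.5 |
| | | Value accuracy | 0.6990 | 0.5 |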
Tab. 4 Assessment results for error simulation dataset
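The dimension scores in Tab. 4 appear consistent with a simple aggregation of the sub-dimension scores: a weighted sum for most dimensions, with attribute completeness (which has no listed weight) acting as a multiplier on the weighted sum of the other two completeness sub-dimensions. The sketch below checks that reading against the tabulated values. The aggregation rule, the function names, and the English dictionary keys are assumptions inferred from the numbers, not a statement of the authors' formulas.

```python
# Sub-dimension scores and weights copied from Tab. 4.
# ASSUMPTION: the aggregation rule below is inferred from the table values;
# it is not taken from the paper's text.
SUB_SCORES = {
    "completeness": {"attribute": 0.8000, "record": 0.9400, "value": 0.7740},
    "normativeness": {"type": 0.9931, "precision": 0.9931},
    "consistency": {"order": 0.9796, "logical": 0.9490},
    "timeliness": {"timeliness": 0.8980},
    "uniqueness": {"attribute": 1.0000, "record": 0.9694},
    "accuracy": {"range": 0.7066, "value": 0.6990},
}

SUB_WEIGHTS = {
    # Attribute completeness has no weight in Tab. 4, so it is left out here.
    "completeness": {"record": 0.6, "value": 0.4},
    "normativeness": {"type": 0.2, "precision": 0.8},
    "consistency": {"order": 0.7, "logical": 0.3},
    "timeliness": {"timeliness": 1.0},
    "uniqueness": {"attribute": 0.5, "record": 0.5},
    "accuracy": {"range": 0.5, "value": 0.5},
}


def weighted_sum(scores, weights):
    """Sum of score * weight over the sub-dimensions that carry a weight."""
    return sum(scores[name] * weight for name, weight in weights.items())


def dimension_score(dimension, scores, weights):
    """Aggregate sub-dimension scores into one dimension score (assumed rule)."""
    total = weighted_sum(scores, weights)
    if dimension == "completeness":
        # ASSUMPTION: attribute completeness scales the weighted sum.
        return scores["attribute"] * total
    return total


dimension_scores = {
    dim: dimension_score(dim, SUB_SCORES[dim], SUB_WEIGHTS[dim])
    for dim in SUB_SCORES
}

for dim, score in dimension_scores.items():
    print(f"{dim}: {score:.4f}")
```

Under this assumed rule, each computed score agrees with the corresponding dimension score in Tab. 4 to within the rounding of the four-decimal inputs.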
| Overall DQ | Dimension | Dimension score | Dimension weight |
|---|---|---|---|
| 0.8747 | Completeness | 0.6989 | 1/6 |
| | Normativeness | 0.9931 | 1/6 |
| | Consistency | 0.9704 | 1/6 |
| | Timeliness | 0.8980 | 1/6 |
| | Uniqueness | 0.9847 | 1/6 |
| | Accuracy | 0.7028 | 1/6 |
Tab. 5 Overall DQ score for error simulation dataset
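With all six dimension weights equal to 1/6, the overall DQ score in Tab. 5 can be read as a plain weighted average of the dimension scores. The following minimal sketch illustrates that reading only; the function name `overall_dq` and the dictionary keys are hypothetical, and the paper's actual weighting procedure (the reference list cites AHP and TOPSIS work) may involve more than this final averaging step.

```python
# Dimension scores and weights copied from Tab. 5.
# ASSUMPTION: the overall score is a normalized weighted sum; this is an
# illustrative reading of the table, not the authors' stated procedure.
DIMENSION_SCORES = {
    "completeness": 0.6989,
    "normativeness": 0.9931,
    "consistency": 0.9704,
    "timeliness": 0.8980,
    "uniqueness": 0.9847,
    "accuracy": 0.7028,
}

DIMENSION_WEIGHTS = {name: 1 / 6 for name in DIMENSION_SCORES}


def overall_dq(scores, weights):
    """Weighted average of dimension scores, normalized by the weight total."""
    total_weight = sum(weights.values())
    weighted = sum(scores[name] * weights[name] for name in scores)
    return weighted / total_weight


overall = overall_dq(DIMENSION_SCORES, DIMENSION_WEIGHTS)
print(f"overall DQ (assumed weighted average): {overall:.4f}")
```

Dividing by the weight total keeps the result in the same 0 to 1 range as the inputs even if unequal weights that do not sum exactly to 1 are supplied. With the Tab. 5 values, the result agrees with the reported overall score to within rounding of the inputs.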
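| Dimension | Dimension score | Sub-dimension | Sub-dimension score | Sub-dimension weight |
|---|---|---|---|---|
| Completeness | 0.4801 | Attribute completeness | 1.0000 | |
| | | Record completeness | 0.4801 | 0.6 |
| | | Value completeness | 0.4801 | 0.4 |
| Normativeness | 0.8452 | Type normativeness | 1.0000 | 0.2 |
| | | Precision normativeness | 0.8065 | 0.8 |
| Consistency | 0.8195 | Order consistency | 0.8195 | 1.0 |
| | | Logical consistency | - | 0.0 |
| Timeliness | 0.3395 | Timeliness | 0.3395 | 1.0 |
| Uniqueness | 1.0000 | Attribute uniqueness | 1.0000 | 0.5 |
| | | Record uniqueness | 1.0000 | 0.5 |
| Accuracy | 0.9323 | Range accuracy | 0.9347 | 0.5 |
| | | Value accuracy | 0.9298 | 0.5 |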
Tab. 6 Assessment results for Intel Lab dataset
| Overall DQ | Dimension | Dimension score | Dimension weight |
|---|---|---|---|
| 0.7361 | Completeness | 0.4801 | 1/6 |
| | Normativeness | 0.8452 | 1/6 |
| | Consistency | 0.8195 | 1/6 |
| | Timeliness | 0.3395 | 1/6 |
| | Uniqueness | 1.0000 | 1/6 |
| | Accuracy | 0.9323 | 1/6 |
Tab. 7 Overall DQ score for Intel Lab dataset
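| Dimension | Sub-dimension | Sub-dimension assessment implemented by each method | | | | | |
|---|---|---|---|---|---|---|---|
| | | Ref. [ ] | Ref. [ ] | Ref. [ ] | Ref. [ ] | Ref. [ ] | Proposed method |
| Completeness | Attribute completeness | × | × | × | √ | × | √ |
| | Record completeness | × | × | √ | √ | × | √ |
| | Value completeness | × | √ | √ | √ | × | √ |
| Normativeness | Type normativeness | × | × | × | × | × | √ |
| | Precision normativeness | × | × | × | × | × | √ |
| Consistency | Order consistency | × | × | × | × | × | √ |
| | Logical consistency | × | × | × | × | × | √ |
| Timeliness | Timeliness | × | × | × | √ | × | √ |
| Uniqueness | Attribute uniqueness | × | × | × | × | × | √ |
| | Record uniqueness | × | × | × | √ | × | √ |
| Accuracy | Range accuracy | × | × | × | √ | × | √ |
| | Value accuracy | × | × | √ | × | × | √ |
| Considers correlation between dimensions | | × | × | × | × | × | √ |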
Tab. 8 Comparison of existing DQA methods and proposed method
| 1 | State Council. Guiding opinions of the State Council on deepening "Internet + Advanced Manufacturing" and developing the industrial Internet [EB/OL]. (2017-11-27) [2023-04-20]. |
| 2 | REDMAN T C. Bad data costs the US $3 trillion per year [J/OL]. Harvard Business Review (2016-09-22) [2023-05-30]. |
| 3 | GUALO F, RODRÍGUEZ M, VERDUGO J, et al. Data quality certification using ISO/IEC 25012: industrial experiences [J]. Journal of Systems and Software, 2021, 176: 110938. | 
| 4 | DE AQUINO G R C, DE FARIAS C M, PIRMEZ L. Data quality assessment and enhancement on social and sensor data [C/OL] // Proceedings of the Poster Track of the Workshop on Big Social Data and Urban Computing Co-located with BiDU 2018 at VLDB 2018. (2018) [2023-05-30]. |
| 5 | ALWAN A A, CIUPALA M A, BRIMICOMBE A J, et al. Data quality challenges in large-scale cyber-physical systems: a systematic review [J]. Information Systems, 2022, 105: 101951. | 
| 6 | ZHANG L, JEONG D, LEE S. Data quality management in the internet of things [J]. Sensors, 2021, 21(17): 5834. | 
| 7 | WANG R Y, STRONG D M. Beyond accuracy: what data quality means to data consumers [J]. Journal of Management Information Systems, 1996, 12(4): 5-33. | 
| 8 | ORR K. Data quality and systems theory [J]. Communications of the ACM, 1998, 41(2): 66-71. | 
| 9 | KARKOUCH A, MOUSANNIF H, MOATASSIME H AL, et al. Data quality in internet of things: a state-of-the-art survey [J]. Journal of Network and Computer Applications, 2016, 73: 57-81. | 
| 10 | GABR M I, HELMY Y M, ELZANFALY D S. Data quality dimensions, metrics, and improvement techniques [J]. Future Computing and Informatics Journal, 2021, 6(1): 25-44. | 
| 11 | LIU C, NITSCHKE P, WILLIAMS S P, et al. Data quality and the internet of things [J]. Computing, 2020, 102(2): 573-599. | 
| 12 | WU H, LIN A, CLARKE K C, et al. A comprehensive quality assessment framework for linear features from volunteered geographic information [J]. International Journal of Geographical Information Science, 2021, 35(9): 1826-1847. | 
| 13 | AL-MASRI E A M, BAI Y. A service-oriented approach for assessing the quality of data for the internet of things [C]// Proceedings of the 2019 IEEE International Conference on Service-Oriented System Engineering. Piscataway: IEEE, 2019: 9-97. | 
| 14 | DE AQUINO G R C, DE FARIAS C M, PIRMEZ L. Hygieia: data quality assessment for smart sensor network [C]// Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. New York: ACM, 2019: 889-891. | 
| 15 | ENGEMANN K. Measuring data quality for ongoing improvement: a data quality assessment framework [J]. Benchmarking: An International Journal, 2014, 21(3): 481-482. | 
| 16 | KIRCHEN I, SCHÜTZ D, FOLMER J, et al. Metrics for the evaluation of data quality of signal data in industrial processes [C]// Proceedings of the 2017 IEEE 15th International Conference on Industrial Informatics. Piscataway: IEEE, 2017: 819-826. | 
| 17 | FIZZA K, JAYARAMAN P P, BANERJEE A, et al. Evaluating sensor data quality in internet of things smart agriculture applications [J]. IEEE Micro, 2022, 42(1): 51-60. | 
| 18 | G-O MERITXELL, SIERRA B, FERREIRO S. On the evaluation, management and improvement of data quality in streaming time series [J]. IEEE Access, 2022, 10: 81458-81475. | 
| 19 | LI W, XU S, PENG X. Research on comprehensive evaluation of data source quality in big data environment [J]. International Journal of Computational Intelligence Systems, 2021, 14(1): 1831-1841. | 
| 20 | KWAK S G, KIM J H. Central limit theorem: the cornerstone of modern statistics [J]. Korean Journal of Anesthesiology, 2017, 70(2): 144-156. | 
| 21 | NEVIL S. How to calculate z-score and its meaning [EB/OL]. (2017-03-31) [2023-04-20]. |
| 22 | VAIDYA O S, KUMAR S. Analytic hierarchy process: an overview of applications [J]. European Journal of Operational Research, 2006, 169(1): 1-29. |
| 23 | JIANG Y, FANG M, LIU Z, et al. Comprehensive evaluation of power quality based on an improved TOPSIS method considering the correlation between indices [J]. Applied Sciences, 2019, 9(17): 3603. | 
| 24 | LIU Y, XU Q, LIU Y, et al. Comprehensive evaluation of power quality based on improved TOPSIS method and combination weights [C]// Proceedings of the 2022 IEEE 5th International Electrical and Energy Conference. Piscataway: IEEE, 2022: 2609-2614. | 
| 25 | SAMUEL M. Intel lab data [DB/OL]. (2004-06-06) [2023-04-20]. |