Journal of Computer Applications, 2024, Vol. 44, Issue (6): 1743-1750. DOI: 10.11772/j.issn.1001-9081.2023060824
Special Issue: The 38th CCF National Conference of Computer Applications (CCF NCCA 2023)
Hongtao SONG, Jiangsheng YU, Qilong HAN
Received: 2023-07-04
Revised: 2023-08-09
Accepted: 2023-08-09
Online: 2023-08-28
Published: 2024-06-10
Contact: Qilong HAN
About author: SONG Hongtao, born in 1980 in Changli, Hebei, Ph. D., associate professor, CCF member. His research interests include industrial data quality analysis and intelligent processing.
Hongtao SONG, Jiangsheng YU, Qilong HAN. Industrial multivariate time series data quality assessment method[J]. Journal of Computer Applications, 2024, 44(6): 1743-1750.
URL: https://www.joca.cn/EN/10.11772/j.issn.1001-9081.2023060824
| Record | Attribute | | | |
|---|---|---|---|---|
| | | | … | |
| | $y_{11}$ | $y_{12}$ | … | $y_{1n}$ |
| | $y_{21}$ | $y_{22}$ | … | $y_{2n}$ |
| | ⋮ | ⋮ | ⋮ | ⋮ |
| | $y_{m1}$ | $y_{m2}$ | … | $y_{mn}$ |
Tab. 1 Multivariate time series data structure
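| Dimension | Dimension score | Sub-dimension | Sub-dimension score | Sub-dimension weight |
|---|---|---|---|---|
| Completeness | 1 | Attribute completeness | 1 | |
| | | Record completeness | 1 | 0.6 |
| | | Value completeness | 1 | 0.4 |
| Normativeness | 1 | Type normativeness | 1 | 0.2 |
| | | Precision normativeness | 1 | 0.8 |
| Consistency | 1 | Order consistency | 1 | 0.7 |
| | | Logical consistency | 1 | 0.3 |
| Timeliness | 1 | Timeliness | 1 | 1.0 |
| Uniqueness | 1 | Attribute uniqueness | 1 | 0.5 |
| | | Record uniqueness | 1 | 0.5 |
| Accuracy | 1 | Range accuracy | 1 | 0.5 |
| | | Value accuracy | 1 | 0.5 |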
Tab. 2 Assessment results for simulation dataset
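| Dimension | Sub-dimension | Description |
|---|---|---|
| Completeness | Attribute completeness | Delete all data values of attribute A5 |
| | Record completeness | Delete 2 records |
| | Value completeness | Delete 5 data values, 2 of which are in the same record |
| Normativeness | Type normativeness | Change 2 data values to character type |
| | Precision normativeness | No modification |
| Consistency | Order consistency | Move 2 records to earlier positions |
| | Logical consistency | No modification |
| Timeliness | Timeliness | Increase the timestamp of 1 record by 1 s |
| Uniqueness | Attribute uniqueness | No modification |
| | Record uniqueness | Modify 3 records so that each duplicates its adjacent record |
| Accuracy | Range accuracy | Modify 10 data values so that they fall outside the specified range |
| | Value accuracy | Modify 3 data values so that they stay within the specified range but are inaccurate |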
Tab. 3 Error handling of original dataset
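| Dimension | Dimension score | Sub-dimension | Sub-dimension score | Sub-dimension weight |
|---|---|---|---|---|
| Completeness | 0.6989 | Attribute completeness | 0.8000 | |
| | | Record completeness | 0.9400 | 0.6 |
| | | Value completeness | 0.7740 | 0.4 |
| Normativeness | 0.9931 | Type normativeness | 0.9931 | 0.2 |
| | | Precision normativeness | 0.9931 | 0.8 |
| Consistency | 0.9704 | Order consistency | 0.9796 | 0.7 |
| | | Logical consistency | 0.9490 | 0.3 |
| Timeliness | 0.8980 | Timeliness | 0.8980 | 1.0 |
| Uniqueness | 0.9847 | Attribute uniqueness | 1.0000 | 0.5 |
| | | Record uniqueness | 0.9694 | 0.5 |
| Accuracy | 0.7028 | Range accuracy | 0.7066 | 0.5 |
| | | Value accuracy | 0.6990 | 0.5 |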
Tab. 4 Assessment results for error simulation dataset
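The dimension scores in Tab. 4 can be reproduced from the sub-dimension scores and weights. The sketch below is one reading that matches the reported numbers, not a formula quoted from the paper: each dimension is taken as the weighted sum of its sub-dimensions, and attribute completeness, which has no weight listed, is assumed to act as a multiplier on the completeness score. The function names `weighted_score` and `completeness` are illustrative.

```python
def weighted_score(parts):
    """Weighted sum of (score, weight) pairs."""
    return sum(score * weight for score, weight in parts)


def completeness(attribute, record, value, w_record=0.6, w_value=0.4):
    """Assumed form: attribute completeness scales the weighted sum
    of record and value completeness (inferred from Tab. 4)."""
    return attribute * weighted_score([(record, w_record), (value, w_value)])


# Sub-dimension scores and weights copied from Tab. 4.
completeness_score = completeness(0.8000, 0.9400, 0.7740)
normativeness_score = weighted_score([(0.9931, 0.2), (0.9931, 0.8)])
consistency_score = weighted_score([(0.9796, 0.7), (0.9490, 0.3)])
uniqueness_score = weighted_score([(1.0000, 0.5), (0.9694, 0.5)])
accuracy_score = weighted_score([(0.7066, 0.5), (0.6990, 0.5)])

for name, score in [
    ("completeness", completeness_score),
    ("normativeness", normativeness_score),
    ("consistency", consistency_score),
    ("uniqueness", uniqueness_score),
    ("accuracy", accuracy_score),
]:
    print(f"{name}: {score:.4f}")
```

Under these assumptions each computed value agrees with the corresponding dimension score in Tab. 4 to four decimal places.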
| Overall DQ | Dimension | Dimension score | Dimension weight |
|---|---|---|---|
| 0.8747 | Completeness | 0.6989 | 1/6 |
| | Normativeness | 0.9931 | 1/6 |
| | Consistency | 0.9704 | 1/6 |
| | Timeliness | 0.8980 | 1/6 |
| | Uniqueness | 0.9847 | 1/6 |
| | Accuracy | 0.7028 | 1/6 |
Tab. 5 Overall DQ score for error simulation dataset
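The overall DQ score in Tab. 5 is consistent with an equal-weight sum of the six dimension scores. The sketch below only checks that arithmetic. It assumes a plain weighted sum with the listed weights of 1/6; whether the method's full aggregation involves further steps is not shown on this page. The names `dimension_scores` and `overall_dq` are illustrative.

```python
from fractions import Fraction

# Dimension scores copied from Tab. 5.
dimension_scores = {
    "completeness": 0.6989,
    "normativeness": 0.9931,
    "consistency": 0.9704,
    "timeliness": 0.8980,
    "uniqueness": 0.9847,
    "accuracy": 0.7028,
}

# Equal dimension weights of 1/6, as listed in Tab. 5.
weights = {name: Fraction(1, 6) for name in dimension_scores}
assert sum(weights.values()) == 1

# Assumed aggregation: weighted sum of dimension scores.
overall_dq = sum(float(weights[name]) * score
                 for name, score in dimension_scores.items())
print(f"overall DQ: {overall_dq:.5f}")
```

The result is about 0.8747, the value reported in the table. The small remaining difference comes from the dimension scores themselves being rounded to four decimals.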
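| Dimension | Dimension score | Sub-dimension | Sub-dimension score | Sub-dimension weight |
|---|---|---|---|---|
| Completeness | 0.4801 | Attribute completeness | 1.0000 | |
| | | Record completeness | 0.4801 | 0.6 |
| | | Value completeness | 0.4801 | 0.4 |
| Normativeness | 0.8452 | Type normativeness | 1.0000 | 0.2 |
| | | Precision normativeness | 0.8065 | 0.8 |
| Consistency | 0.8195 | Order consistency | 0.8195 | 1.0 |
| | | Logical consistency | — | 0.0 |
| Timeliness | 0.3395 | Timeliness | 0.3395 | 1.0 |
| Uniqueness | 1.0000 | Attribute uniqueness | 1.0000 | 0.5 |
| | | Record uniqueness | 1.0000 | 0.5 |
| Accuracy | 0.9323 | Range accuracy | 0.9347 | 0.5 |
| | | Value accuracy | 0.9298 | 0.5 |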
Tab. 6 Assessment results for Intel Lab dataset
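In Tab. 6 logical consistency has no score (shown as a dash) and weight 0.0, while order consistency carries weight 1.0 instead of the 0.7 used for the simulation dataset. One way to obtain such weights, sketched below, is to drop sub-dimensions that cannot be assessed and renormalize the remaining weights. This renormalization is an assumption made for illustration; the page only reports the resulting weights. The helper `aggregate` is hypothetical.

```python
from typing import Optional, Sequence, Tuple


def aggregate(parts: Sequence[Tuple[Optional[float], float]]) -> float:
    """Weighted mean over the sub-dimensions that have a score.

    Each item is (score, weight); a score of None marks a sub-dimension
    that could not be assessed. Weights of the remaining sub-dimensions
    are renormalized to sum to 1 (assumed behavior).
    """
    available = [(s, w) for s, w in parts if s is not None]
    total_weight = sum(w for _, w in available)
    if total_weight == 0:
        raise ValueError("no assessable sub-dimension")
    return sum(s * w for s, w in available) / total_weight


# Consistency: logical consistency is not assessed for this dataset,
# so order consistency effectively receives weight 0.7 / 0.7 = 1.0.
consistency_score = aggregate([(0.8195, 0.7), (None, 0.3)])

# Normativeness: both sub-dimensions are available, weights unchanged.
normativeness_score = aggregate([(1.0000, 0.2), (0.8065, 0.8)])

print(f"consistency: {consistency_score:.4f}")
print(f"normativeness: {normativeness_score:.4f}")
```

With the scores from Tab. 6 this gives 0.8195 for consistency and 0.8452 for normativeness, matching the reported dimension scores.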
| Overall DQ | Dimension | Dimension score | Dimension weight |
|---|---|---|---|
| 0.7361 | Completeness | 0.4801 | 1/6 |
| | Normativeness | 0.8452 | 1/6 |
| | Consistency | 0.8195 | 1/6 |
| | Timeliness | 0.3395 | 1/6 |
| | Uniqueness | 1.0000 | 1/6 |
| | Accuracy | 0.9323 | 1/6 |
Tab. 7 Overall DQ score for Intel Lab dataset
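| Dimension | Sub-dimension | Sub-dimension assessment coverage by method | | | | | |
|---|---|---|---|---|---|---|---|
| | | Ref. [ | Ref. [ | Ref. [ | Ref. [ | Ref. [ | Proposed method |
| Completeness | Attribute completeness | × | × | × | √ | × | √ |
| | Record completeness | × | × | √ | √ | × | √ |
| | Value completeness | × | √ | √ | √ | × | √ |
| Normativeness | Type normativeness | × | × | × | × | × | √ |
| | Precision normativeness | × | × | × | × | × | √ |
| Consistency | Order consistency | × | × | × | × | × | √ |
| | Logical consistency | × | × | × | × | × | √ |
| Timeliness | Timeliness | × | × | × | √ | × | √ |
| Uniqueness | Attribute uniqueness | × | × | × | × | × | √ |
| | Record uniqueness | × | × | × | √ | × | √ |
| Accuracy | Range accuracy | × | × | × | √ | × | √ |
| | Value accuracy | × | × | √ | × | × | √ |
| Considers correlation between dimensions | | × | × | × | × | × | √ |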
Tab. 8 Comparison of existing DQA methods and proposed method
[1] The State Council. Guiding opinions of the State Council on deepening "Internet+ Advanced Manufacturing" and developing industrial Internet [EB/OL]. (2017-11-27) [2023-04-20].
[2] REDMAN T C. Bad data costs the US $3 trillion per year [J/OL]. Harvard Business Review (2016-09-22) [2023-05-30].
[3] GUALO F, RODRÍGUEZ M, VERDUGO J, et al. Data quality certification using ISO/IEC 25012: industrial experiences [J]. Journal of Systems and Software, 2021, 176: 110938.
[4] DE AQUINO G R C, DE FARIAS C M, PIRMEZ L. Data quality assessment and enhancement on social and sensor data [C/OL]// Proceedings of the Poster Track of the Workshop on Big Social Data and Urban Computing Co-located with BiDU 2018 at VLDB 2018. (2018) [2023-05-30].
[5] ALWAN A A, CIUPALA M A, BRIMICOMBE A J, et al. Data quality challenges in large-scale cyber-physical systems: a systematic review [J]. Information Systems, 2022, 105: 101951.
[6] ZHANG L, JEONG D, LEE S. Data quality management in the internet of things [J]. Sensors, 2021, 21(17): 5834.
[7] WANG R Y, STRONG D M. Beyond accuracy: what data quality means to data consumers [J]. Journal of Management Information Systems, 1996, 12(4): 5-33.
[8] ORR K. Data quality and systems theory [J]. Communications of the ACM, 1998, 41(2): 66-71.
[9] KARKOUCH A, MOUSANNIF H, MOATASSIME H AL, et al. Data quality in internet of things: a state-of-the-art survey [J]. Journal of Network and Computer Applications, 2016, 73: 57-81.
[10] GABR M I, HELMY Y M, ELZANFALY D S. Data quality dimensions, metrics, and improvement techniques [J]. Future Computing and Informatics Journal, 2021, 6(1): 25-44.
[11] LIU C, NITSCHKE P, WILLIAMS S P, et al. Data quality and the internet of things [J]. Computing, 2020, 102(2): 573-599.
[12] WU H, LIN A, CLARKE K C, et al. A comprehensive quality assessment framework for linear features from volunteered geographic information [J]. International Journal of Geographical Information Science, 2021, 35(9): 1826-1847.
[13] AL-MASRI E A M, BAI Y. A service-oriented approach for assessing the quality of data for the internet of things [C]// Proceedings of the 2019 IEEE International Conference on Service-Oriented System Engineering. Piscataway: IEEE, 2019: 9-97.
[14] DE AQUINO G R C, DE FARIAS C M, PIRMEZ L. Hygieia: data quality assessment for smart sensor network [C]// Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing. New York: ACM, 2019: 889-891.
[15] ENGEMANN K. Measuring data quality for ongoing improvement: a data quality assessment framework [J]. Benchmarking: An International Journal, 2014, 21(3): 481-482.
[16] KIRCHEN I, SCHÜTZ D, FOLMER J, et al. Metrics for the evaluation of data quality of signal data in industrial processes [C]// Proceedings of the 2017 IEEE 15th International Conference on Industrial Informatics. Piscataway: IEEE, 2017: 819-826.
[17] FIZZA K, JAYARAMAN P P, BANERJEE A, et al. Evaluating sensor data quality in internet of things smart agriculture applications [J]. IEEE Micro, 2022, 42(1): 51-60.
[18] GÓMEZ-OMELLA M, SIERRA B, FERREIRO S. On the evaluation, management and improvement of data quality in streaming time series [J]. IEEE Access, 2022, 10: 81458-81475.
[19] LI W, XU S, PENG X. Research on comprehensive evaluation of data source quality in big data environment [J]. International Journal of Computational Intelligence Systems, 2021, 14(1): 1831-1841.
[20] KWAK S G, KIM J H. Central limit theorem: the cornerstone of modern statistics [J]. Korean Journal of Anesthesiology, 2017, 70(2): 144-156.
[21] NEVIL S. How to calculate z-score and its meaning [EB/OL]. (2017-03-31) [2023-04-20].
[22] VAIDYA O S, KUMAR S. Analytic hierarchy process: an overview of applications [J]. European Journal of Operational Research, 2006, 169(1): 1-29.
[23] JIANG Y, FANG M, LIU Z, et al. Comprehensive evaluation of power quality based on an improved TOPSIS method considering the correlation between indices [J]. Applied Sciences, 2019, 9(17): 3603.
[24] LIU Y, XU Q, LIU Y, et al. Comprehensive evaluation of power quality based on improved TOPSIS method and combination weights [C]// Proceedings of the 2022 IEEE 5th International Electrical and Energy Conference. Piscataway: IEEE, 2022: 2609-2614.
[25] SAMUEL M. Intel lab data [DB/OL]. (2004-06-06) [2023-04-20].