Journal of Computer Applications ›› 2024, Vol. 44 ›› Issue (6): 1743-1750. DOI: 10.11772/j.issn.1001-9081.2023060824

• 38th CCF China Conference on Computer Applications (CCF NCCA 2023) •

Industrial multivariate time series data quality assessment method

Hongtao SONG, Jiangsheng YU, Qilong HAN

  1. College of Computer Science and Technology, Harbin Engineering University, Harbin, Heilongjiang 150001, China
  • Received: 2023-07-04 Revised: 2023-08-09 Accepted: 2023-08-09 Online: 2023-08-28 Published: 2024-06-10
  • Contact: Qilong HAN
  • About author: SONG Hongtao, born in 1980 in Changli, Hebei, Ph. D., associate professor, CCF member. His research interests include industrial data quality analysis and intelligent processing.
    YU Jiangsheng, born in 1996 in Fengzhen, Inner Mongolia, M. S. candidate. His research interests include industrial data quality assessment.
  • Supported by:
    National Key Research and Development Program of China (2020YFB1710200)

Abstract:

Existing Data Quality Assessment (DQA) methods usually analyze only the basic concept of a specific Data Quality Dimension (DQD), ignoring how fine-grained sub-dimensions, which carry key information about Data Quality (DQ), affect the assessment results. To address this problem, an Industrial Multivariate Time Series Data Quality Assessment (IMTSDQA) method is proposed. First, the DQDs to be assessed, such as completeness, normativeness, consistency, uniqueness, and accuracy, are divided into fine-grained sub-dimensions, and the correlations among sub-dimensions within the same DQD or across different DQDs are considered to determine how each sub-dimension is measured. Second, weights are assigned to the sub-dimensions: attribute, record, and numerical completeness; type and precision normativeness; sequential and logical consistency; attribute and record uniqueness; and range and numerical accuracy. This mines the deeper information in each DQD and yields assessment results that reflect DQ in detail. Experimental results show that, compared with existing methods based on qualitative framework analysis or on models built from the basic definitions of DQDs, IMTSDQA assesses DQ in more detail and more comprehensively, and its per-DQD results reflect DQ problems more objectively and accurately.
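The abstract names the sub-dimensions but gives neither their formulas nor their weights, so the following Python sketch is only an illustration of the general idea, not the IMTSDQA method itself. It scores the three completeness sub-dimensions with simple ratio metrics and combines them by a weighted sum. The metric definitions, the sample records, the attribute names (`temp`, `pressure`), and the weights are all hypothetical assumptions.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[float]]


def attribute_completeness(records: List[Record], expected: List[str]) -> float:
    """Share of expected attributes that appear as a key in at least one record."""
    present = set()
    for r in records:
        present.update(r.keys())
    return sum(1 for a in expected if a in present) / len(expected)


def record_completeness(records: List[Record], expected: List[str]) -> float:
    """Share of records in which every expected attribute has a value."""
    full = sum(1 for r in records if all(r.get(a) is not None for a in expected))
    return full / len(records)


def numerical_completeness(records: List[Record], expected: List[str]) -> float:
    """Share of (record, attribute) cells that hold a value."""
    filled = sum(1 for r in records for a in expected if r.get(a) is not None)
    return filled / (len(records) * len(expected))


def weighted_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of sub-dimension scores, with weights normalised to sum to 1."""
    total = sum(weights[k] for k in scores)
    return sum(scores[k] * weights[k] for k in scores) / total


# Hypothetical sensor readings; None marks a missing value.
expected = ["temp", "pressure"]
records: List[Record] = [
    {"temp": 20.5, "pressure": 1.0},
    {"temp": None, "pressure": 1.1},
    {"temp": 21.0, "pressure": 1.2},
    {"temp": 21.3, "pressure": None},
]

sub_scores = {
    "attribute": attribute_completeness(records, expected),
    "record": record_completeness(records, expected),
    "numerical": numerical_completeness(records, expected),
}
# Hypothetical weights; the paper derives its own weight assignment.
weights = {"attribute": 0.2, "record": 0.3, "numerical": 0.5}

completeness = weighted_score(sub_scores, weights)
print(sub_scores, completeness)
```

Keeping each sub-dimension as its own function means a coarse score such as overall completeness can be traced back to the sub-dimension that lowered it, which is the kind of detail the fine-grained division is meant to expose. The sketch assumes a non-empty record list and attribute list.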

Key words: Data Quality (DQ), multivariate time series data, Data Quality Dimension (DQD), Data Quality Assessment (DQA), correlation

CLC Number: