Journal of Computer Applications (《计算机应用》), the sole official website


Heuristic repair method for inconsistent data violating conditional functional dependency with built-in predicates

  1, GUAN Zhongqing (管中庆)1, DAI Chaofan (戴超凡)2*, CAO Junbin (曹俊彬)1

  1. Aviation Maintenance NCO School, Air Force Engineering University, Xinyang, Henan 464000; 2. College of Systems Engineering, National University of Defense Technology, Changsha 410073
  • Received: 2025-10-27 Revised: 2026-01-05 Accepted: 2026-01-08 Online: 2026-01-26 Published: 2026-01-26
  • Corresponding author: DAI Chaofan
  • Funding:
    National Natural Science Foundation of China

Abstract: Inconsistent data that violates integrity constraints is a common data quality problem in the databases of aviation maintenance information systems. Functional Dependencies (FDs) and Conditional Functional Dependencies (CFDs) express only equality constraints, whereas Conditional Functional Dependencies with built-in Predicates (CFDps) also allow predicates such as “greater than”, “less than”, and “equal to”. Because a predicate can in theory be satisfied by infinitely many candidate values, recovering the initial true value of an erroneous cell is harder under CFDps than under FDs or CFDs. To address this difficulty, a heuristic repair method based on maximum possibility, Heuristic-MPR (Heuristic-Maximum Possibility Repair), is proposed. The method has four stages. First, the conflicting tuples and conflicting attributes in the dataset are identified according to the CFDps. Second, a probability model of attribute error rates is established, and the attribute with the highest error probability is selected as the candidate conflicting attribute to be repaired first. Third, drawing on ideas from machine learning, the repair considers the correlations between the candidate conflicting attribute and the other attributes in the dataset, computes the repair possibility of each repair scheme, and takes the scheme with the maximum possibility as the repair value. Finally, the extent to which the repaired result satisfies the dependency rules is verified to decide whether a different candidate conflicting attribute must be selected. Experimental results on real data show that Heuristic-MPR repairs inconsistent data of various attribute types well: the average repair accuracy, counting repairs that exactly match the initial true value, is 82.22%, and the average repair deviation rate, for repairs that do not exactly match the initial true value, is 1.27%.

Key words: data quality, conditional functional dependency with built-in predicates (CFDps), inconsistent data, attribute error probability model, repair possibility
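
The first stage described in the abstract, finding conflicting tuples and attributes under a CFDp, can be illustrated with a minimal Python sketch. It is not the authors' implementation: the rule, the attribute names (`type`, `flight_hours`, `inspections`), the sample records, and the helper functions `matches` and `find_conflicts` are all hypothetical, and the check is simplified to single-tuple patterns in which every cell is a comparison against a constant.

```python
import operator

# Comparison operators allowed in a pattern cell (assumed predicate set).
OPS = {"=": operator.eq, "!=": operator.ne, "<": operator.lt,
       "<=": operator.le, ">": operator.gt, ">=": operator.ge}

def matches(record, pattern):
    """True if the record satisfies every (attribute, op, constant) predicate."""
    return all(OPS[op](record[attr], const) for attr, op, const in pattern)

def find_conflicts(records, lhs, rhs):
    """Return (row index, involved attributes) for rows that match the
    left-hand pattern but fail the right-hand pattern."""
    conflicts = []
    for i, rec in enumerate(records):
        if matches(rec, lhs) and not matches(rec, rhs):
            attrs = sorted({a for a, _, _ in lhs} | {a for a, _, _ in rhs})
            conflicts.append((i, attrs))
    return conflicts

# Hypothetical rule: type = "A" and flight_hours > 500  =>  inspections >= 2
LHS = [("type", "=", "A"), ("flight_hours", ">", 500)]
RHS = [("inspections", ">=", 2)]

# Hypothetical maintenance records, invented for illustration only.
records = [
    {"type": "A", "flight_hours": 620, "inspections": 3},  # satisfies the rule
    {"type": "A", "flight_hours": 710, "inspections": 1},  # violates the rule
    {"type": "B", "flight_hours": 900, "inspections": 0},  # rule does not apply
]

conflicts = find_conflicts(records, LHS, RHS)
print(conflicts)  # [(1, ['flight_hours', 'inspections', 'type'])]
```

In this toy case, row 1 could be made consistent by setting `inspections` to any value of at least 2, or by changing `type` or `flight_hours` so that the left-hand pattern no longer applies. That open-ended set of candidates is the repair ambiguity the abstract attributes to predicate constraints.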

CLC number: