Journal of Computer Applications (《计算机应用》): sole official website


Hybrid Causal Model Learning Algorithm Based on Relational Data

YAN Lin (1, 2), QIAN Yuhua (3), LIU Saixiong (4), LI Jue (1)

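    1. Institute of Big Data Science and Industry, Shanxi University
    2. Shanxi Key Laboratory of Evolutionary Science Intelligence
    3. Shanxi University
    4. Institute of Big Data Science and Industry, Shanxi University; Shanxi Key Laboratory of Evolutionary Science Intelligence, Shanxi University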
  • Received: 2025-12-08 Revised: 2026-01-31 Accepted: 2026-02-03 Online: 2026-02-10 Published: 2026-02-10
  • Corresponding author: QIAN Yuhua
  • Supported by:
    Key Program of the National Natural Science Foundation of China

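Abstract: Real-world relationships involve interactions among multiple entity types, and the Relational Causal Model (RCM) gives an intuitive representation of such relationships. Learning causal relationships within relational causal models is important for business decision-making in complex scenarios. Most existing algorithms rely on oracle relational conditional independence tests to establish and determine causal relationships, so they cannot learn causal relationships from relational data. The algorithms that do learn causal relationships from relational data are constraint-based and are limited by finite sample sizes, which keeps their recall and F1 score relatively low. To address these problems, a hybrid algorithm combining constraints and scoring (RCSH) was proposed. RCSH first obtains undirected dependencies with a heuristic algorithm and builds an undirected relational causal model, then orients this model with the Relational Bivariate Orientation (RBO) rule. With the search space restricted in this way, a greedy hill-climbing algorithm is applied, which alleviates the low sensitivity of existing algorithms to long relational paths and multi-attribute dependencies when samples are limited. Experimental results on synthetic datasets show that, compared with the Robust Relational Causal Discovery (RRCD) algorithm, RCSH improves recall by about 12.8% and F1 score by about 3.31%, and its performance rises steadily as the data size grows. The applicability and effectiveness of RCSH were also validated on real-world datasets.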

Keywords: relational causal model, structure learning, causal discovery, relational data, hybrid algorithm
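The abstract describes RCSH as a constraint phase (heuristic undirected skeleton, then RBO orientation) followed by greedy hill climbing inside the restricted search space. The paper's own implementation is not reproduced here. The sketch below only illustrates the generic "restrict, then score" idea in a simplified propositional (single-table) setting, under these hypothetical assumptions: variables are discrete columns of one table, the skeleton is given as input instead of being learned by the heuristic step, `init_edges` stands in for orientations an earlier step might supply, and the score is standard discrete BIC with a small `1e-9` improvement tolerance. Relational paths, aggregation, and the RBO rule itself are not modeled, and the names `local_bic`, `has_path`, and `restricted_hill_climb` are illustrative only.

```python
import math
from collections import Counter
from itertools import permutations


def local_bic(data, child, parents):
    """Discrete BIC score of `child` given `parents` (data: list of dicts)."""
    n = len(data)
    parents = sorted(parents)
    joint = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    marg = Counter(tuple(row[p] for p in parents) for row in data)
    loglik = sum(c * math.log(c / marg[cfg]) for (cfg, _), c in joint.items())
    r = len({row[child] for row in data})  # number of child states
    q = 1
    for p in parents:
        q *= len({row[p] for row in data})  # number of parent configurations
    return loglik - 0.5 * math.log(n) * (r - 1) * q


def has_path(edges, src, dst):
    """True if a directed path src -> ... -> dst exists in `edges`."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(w for (u, w) in edges if u == node)
    return False


def restricted_hill_climb(data, variables, skeleton, init_edges=()):
    """Greedy add/delete/reverse search; every edge must lie in `skeleton`."""
    allowed = {frozenset(pair) for pair in skeleton}
    edges = set(init_edges)
    cache = {}

    def parents(v):
        return {u for (u, w) in edges if w == v}

    def score(v, pa):
        key = (v, frozenset(pa))
        if key not in cache:
            cache[key] = local_bic(data, v, pa)
        return cache[key]

    while True:
        best_delta, best_move = 1e-9, None
        for u, v in permutations(variables, 2):
            if frozenset((u, v)) not in allowed:
                continue  # pair excluded by the constraint phase: never scored
            pa_v = parents(v)
            if (u, v) in edges:
                drop = score(v, pa_v - {u}) - score(v, pa_v)
                candidates = [(drop, "delete")]
                if not has_path(edges - {(u, v)}, u, v):  # v -> u stays acyclic
                    pa_u = parents(u)
                    gain = score(u, pa_u | {v}) - score(u, pa_u)
                    candidates.append((drop + gain, "reverse"))
            elif (v, u) not in edges and not has_path(edges, v, u):
                candidates = [(score(v, pa_v | {u}) - score(v, pa_v), "add")]
            else:
                continue  # reverse direction already present, or adding would create a cycle
            for delta, kind in candidates:
                if delta > best_delta:
                    best_delta, best_move = delta, (kind, u, v)
        if best_move is None:
            return edges  # no move improves the score: local optimum
        kind, u, v = best_move
        edges.discard((u, v))
        if kind == "add":
            edges.add((u, v))
        elif kind == "reverse":
            edges.add((v, u))


if __name__ == "__main__":
    # Toy data: B copies A; C is independent of both.
    rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(200)]
    print(restricted_hill_climb(rows, ["A", "B", "C"], [("A", "B"), ("B", "C")]))
    # {('A', 'B')}
```

Restricting candidate moves to skeleton pairs is what shrinks the search: with the skeleton A-B, B-C, the pair A-C is never scored at all. Because BIC gives A->B and B->A the same score here, the direction returned depends on iteration order; passing pre-oriented `init_edges` is how this sketch would accept orientations decided by an earlier step.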