Journal of Computer Applications, 2021, Vol. 41, Issue (10): 2793-2798. DOI: 10.11772/j.issn.1001-9081.2020122066

Special Issue: Artificial Intelligence

• Artificial intelligence •

Causal inference method based on confounder hidden compact representation model

CAI Ruichu1, BAI Yiming1, QIAO Jie1, HAO Zhifeng2   

  1. School of Computers, Guangdong University of Technology, Guangzhou Guangdong 510006, China;
    2. School of Mathematics and Big Data, Foshan University, Foshan Guangdong 528000, China
  • Received: 2020-12-31 Revised: 2021-04-06 Online: 2021-10-10 Published: 2021-07-14
  • Supported by:
    This work is partially supported by the National Natural Science Foundation of China (61876043, 61976052), the Natural Science Foundation of Guangdong Province (2014A030306004, 2014A030308008), and the Science and Technology Program of Guangzhou (201902010058).

基于混淆因子隐压缩表示模型的因果推断方法

蔡瑞初1, 白一鸣1, 乔杰1, 郝志峰2   

  1. 广东工业大学 计算机学院, 广州 510006;
    2. 佛山科学技术学院 数学与大数据学院, 广东 佛山 528000
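  • Corresponding author: CAI Ruichu
  • About the authors: CAI Ruichu (born 1983), male, from Wenzhou, Zhejiang, professor, Ph.D., CCF senior member; main research interests: machine learning and data mining. BAI Yiming (born 1994), male, from Beihai, Guangxi, M.S. candidate; main research interests: machine learning and data mining. QIAO Jie (born 1993), male, from Guangzhou, Guangdong, Ph.D. candidate; main research interests: machine learning. HAO Zhifeng (born 1968), male, from Suzhou, Jiangsu, professor, Ph.D.; main research interests: machine learning and artificial intelligence.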

Abstract: Causal inference methods can be used to discover causal relationships from observational data. When the underlying causal structure contains confounders, however, these methods may return wrong causal relationships. To solve this problem, a causal inference method based on the Confounder Hidden Compact Representation (CHCR) model was proposed. Firstly, candidate models containing intermediate hidden variables that compactly represent the cause variables were constructed based on the CHCR model. Secondly, the Bayesian Information Criterion (BIC) was used to score the candidate models and select the best model, i.e. the one with the highest score. Finally, the real causal relationship between the variables was determined from how the cause variables are compressed in the best model. Theoretical analysis shows that the proposed method can identify causal structures with confounders that the classical constraint-based methods cannot identify correctly, and that BIC scoring can also improve the performance of the proposed method in some cases, such as small sample sizes. Experimental results show that, as the number of samples varies, the proposed method is significantly more accurate than classical methods such as the Really Fast Causal Inference (RFCI) algorithm, and that it remains applicable when the variables have different numbers of possible values. When different types of causal structures are mixed, the proposed method is more accurate than classical methods such as the Max-Min Hill-Climbing (MMHC) algorithm. Moreover, the proposed method obtains the correct causal relationships on the Abalone dataset.
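
The second step of the abstract, scoring candidate models with BIC and keeping the highest-scoring one, can be illustrated with a small sketch. This is not the authors' CHCR algorithm: it models no confounder, and the candidate mappings, the synthetic counts, and the function names `conditional_log_likelihood` and `bic_score` are all hypothetical choices made for illustration. It assumes discrete variables, a deterministic mapping from the cause X to a compact hidden value, and the "higher is better" BIC form (log-likelihood minus half the parameter count times log n).

```python
import math
from collections import Counter

def conditional_log_likelihood(xs, ys, mapping):
    """Maximum-likelihood log P(y | mapping[x]), summed over all samples."""
    joint = Counter((mapping[x], y) for x, y in zip(xs, ys))
    group_totals = Counter(mapping[x] for x in xs)
    return sum(c * math.log(c / group_totals[g]) for (g, _), c in joint.items())

def bic_score(xs, ys, mapping):
    """BIC in 'higher is better' form: log-likelihood - (k / 2) * log(n)."""
    n = len(xs)
    num_groups = len(set(mapping.values()))
    num_y_values = len(set(ys))
    num_params = num_groups * (num_y_values - 1)  # free parameters of P(y | group)
    return conditional_log_likelihood(xs, ys, mapping) - 0.5 * num_params * math.log(n)

# Synthetic counts (assumption): Y depends on X only through the group x // 2.
counts = {0: (80, 20), 1: (80, 20), 2: (30, 70), 3: (30, 70)}
xs, ys = [], []
for x, (n0, n1) in counts.items():
    xs += [x] * (n0 + n1)
    ys += [0] * n0 + [1] * n1

# Hypothetical candidate compressions of X into a hidden value.
candidates = {
    "full": {0: 0, 1: 1, 2: 2, 3: 3},     # no compression
    "compact": {0: 0, 1: 0, 2: 1, 3: 1},  # matches how the data were built
    "single": {0: 0, 1: 0, 2: 0, 3: 0},   # over-compressed
}
scores = {name: bic_score(xs, ys, m) for name, m in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

In this toy setting the uncompressed and the compact candidates fit the data equally well, so the BIC penalty on the number of parameters favours the compact one, while the over-compressed candidate loses too much likelihood to compete.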

Key words: causal relationship, causal discovery, causal inference, confounder, hidden compact representation

CLC Number: