Journal of Computer Applications: the sole official website


Time series causal inference method based on adaptive threshold learning

ZHAO Qinzhuang, TAN Hongye

  1. School of Computer and Information Technology, Shanxi University
  • Received: 2023-09-18 Revised: 2024-03-13 Online: 2024-04-16 Published: 2024-04-16
  • Corresponding author: ZHAO Qinzhuang
  • About the authors: ZHAO Qinzhuang, born in 1998 in Yuncheng, Shanxi, Ph.D. candidate, CCF member. His research interests include causal inference. TAN Hongye, born in 1971 in Lingshan, Guangxi, Ph.D., professor, CCF member. Her research interests include natural language processing.
  • Supported by:
    National Natural Science Foundation of China (62076155)

Abstract: Time-series data exhibit recency: variable values generally depend on recent historical information. Existing methods do not fully account for this characteristic. They apply a single uniform threshold when inferring causal relationships at different delays through hypothesis testing, which makes weaker causal relationships difficult to infer. To address this problem, a time-series causal inference method based on adaptive threshold learning was proposed. First, data characteristics were extracted. Then, based on the properties the data exhibit at each delay, the combination of thresholds used in hypothesis testing was learned automatically. Finally, this threshold combination was applied in the hypothesis testing of the PC (Peter-Clark) algorithm, the PCMCI (Peter-Clark and Momentary Conditional Independence) algorithm, and the VAR-LINGAM (Vector Autoregressive Linear non-Gaussian Acyclic Model) algorithm to obtain a more accurate causal structure. Experiments were conducted on simulated datasets. On dataset a, the adaptive PC, adaptive PCMCI, and adaptive VAR-LINGAM algorithms using the proposed method improved the F1 score by approximately 1.31, 1, and 0.03 percentage points respectively; on dataset b, by approximately 0.53, 1.16, and 2.36 percentage points respectively; on dataset c, by approximately 0.22, 3.56, and 0.98 percentage points respectively.
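
The following is a minimal sketch of the idea of delay-specific thresholds in hypothesis testing, not the authors' implementation. The abstract does not say how the thresholds are learned, so the per-delay values in `adaptive` are hand-picked for illustration. The p-values, the ground-truth links, and the helper names `select_links` and `f1_score` are likewise hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

# A candidate causal link: (cause, effect, delay in time steps).
Link = Tuple[str, str, int]


def select_links(p_values: Dict[Link, float],
                 alpha_by_lag: Dict[int, float]) -> List[Link]:
    """Keep a link when its p-value is below the threshold for its delay."""
    return sorted(
        link for link, p in p_values.items() if p < alpha_by_lag[link[2]]
    )


def f1_score(predicted: Iterable[Link], truth: Iterable[Link]) -> float:
    """F1 score of a predicted link set against the true link set."""
    predicted, truth = set(predicted), set(truth)
    true_positives = len(predicted & truth)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(truth)
    return 2 * precision * recall / (precision + recall)


# Hypothetical p-values from independence tests (illustrative only).
p_values = {
    ("X", "Y", 1): 0.001,  # strong link at a short delay
    ("X", "Y", 2): 0.030,  # weaker link at a longer delay
    ("Y", "Z", 1): 0.040,  # spurious link at a short delay
    ("Z", "X", 3): 0.080,  # weak true link at the longest delay
}
truth = [("X", "Y", 1), ("X", "Y", 2), ("Z", "X", 3)]

# One threshold for every delay versus one threshold per delay.
# The adaptive values are assumptions, not learned from data.
uniform = {1: 0.05, 2: 0.05, 3: 0.05}
adaptive = {1: 0.01, 2: 0.05, 3: 0.10}

f1_uniform = f1_score(select_links(p_values, uniform), truth)
f1_adaptive = f1_score(select_links(p_values, adaptive), truth)
print(f"uniform F1:  {f1_uniform:.3f}")
print(f"adaptive F1: {f1_adaptive:.3f}")
```

In this toy setting the uniform threshold accepts the spurious short-delay link and rejects the weak long-delay link, while the per-delay thresholds do the reverse. Real gains depend on how the thresholds are learned and on the data.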

Key words: causal inference, time series, hypothesis test, parameter optimization, adaptive

CLC number: