Causal query in observational data with hidden variables

From arXiv: 8 pages, 7 figures. The paper has been accepted by ECAI 2020. We have updated the proof of Theorem 1 and removed Theorem 2 from the conference version.

This paper discusses the problem of causal query in observational data with hidden variables, with the aim of seeking the change of an outcome when "manipulating" a variable, given a set of plausible confounding variables which affect the manipulated variable and the outcome. Such an "experiment on data" to estimate the causal effect of the manipulated variable is useful for validating an experiment design using historical data or for exploring confounders when studying a new relationship. However, existing data-driven methods for causal effect estimation face some major challenges, including poor scalability with high-dimensional data, low estimation accuracy due to heuristics used by the global causal structure learning algorithms, and the assumption of causal sufficiency when hidden variables are inevitable in data. In this paper, we develop a theorem for using local search to find a superset of the adjustment (or confounding) variables for causal effect estimation from observational data under a realistic pretreatment assumption. The theorem ensures that the unbiased estimate of the causal effect is included in the set of causal effects estimated by the superset of adjustment variables. Based on the developed theorem, we propose a data-driven algorithm for causal query. Experiments show that the proposed algorithm is faster and produces better causal effect estimation than an existing data-driven causal effect estimation method with hidden variables. The causal effects estimated by the proposed algorithm are as accurate as those by the state-of-the-art methods using domain knowledge.

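Summary: The paper studies causal queries on observational data that contain hidden variables: how an outcome changes when a variable is "manipulated", given a set of plausible confounders that affect both the manipulated variable and the outcome. Existing data-driven estimators scale poorly to high-dimensional data, lose accuracy to the heuristics used by global causal structure learning, and assume causal sufficiency even though hidden variables are unavoidable in practice. The authors prove a theorem that, under a pretreatment assumption, lets a local search find a superset of the adjustment (confounding) variables, such that the unbiased estimate is guaranteed to be among the causal effects estimated from that superset. The resulting data-driven algorithm is reported to be faster and more accurate than an existing data-driven method that handles hidden variables, and as accurate as state-of-the-art methods that rely on domain knowledge.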
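
The claim that the unbiased estimate sits somewhere inside a set of candidate estimates can be illustrated with a small simulation. The sketch below is not the paper's algorithm. It assumes a linear model, uses ordinary least squares as the estimator, and simply enumerates every subset of a hand-picked candidate set instead of running a local structure search. All names (`ols_effect`, `effects_over_subsets`, `z`, `m`, `u1`, `u2`) are hypothetical. The variable `m` is an observed pretreatment variable driven by two hidden variables, so adjusting for it is expected to introduce bias, while adjusting for the confounder `z` alone should not.

```python
from itertools import combinations

import numpy as np


def ols_effect(t, y, covariates):
    """Return the coefficient of t in an OLS fit of y on [1, t, covariates]."""
    columns = [np.ones_like(t), t] + list(covariates)
    design = np.column_stack(columns)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


def effects_over_subsets(t, y, candidates):
    """Estimate the effect of t on y once per subset of the candidate covariates."""
    names = sorted(candidates)
    results = {}
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            results[subset] = ols_effect(t, y, [candidates[n] for n in subset])
    return results


# Simulated data (an assumption made for illustration, not data from the paper).
rng = np.random.default_rng(0)
n = 20000
true_effect = 2.0

z = rng.normal(size=n)             # observed confounder of T and Y
u1 = rng.normal(size=n)            # hidden cause of T
u2 = rng.normal(size=n)            # hidden cause of Y
m = u1 + u2 + rng.normal(size=n)   # observed pretreatment collider of u1 and u2
t = z + u1 + rng.normal(size=n)    # manipulated variable
y = true_effect * t + 1.5 * z + u2 + rng.normal(size=n)  # outcome

# One estimate per candidate adjustment set drawn from the superset {m, z}.
estimates = effects_over_subsets(t, y, {"m": m, "z": z})
for subset, estimate in estimates.items():
    print(subset, round(estimate, 3))
```

In this simulation, only the subset containing `z` alone should give an estimate close to `true_effect`. The other subsets either leave the confounding path open or condition on the collider `m`. This is the situation the summary describes: a superset of adjustment variables produces a set of estimates, one of which is unbiased.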