Individual treatment effect (ITE) estimation aims to predict how an individual's outcome changes when the treatment changes. A fundamental challenge with observational data is that we must infer outcome differences under alternative treatments, yet we observe each individual's outcome under only a single treatment. Existing approaches address this limitation either by training with inferred pseudo-outcomes or by creating matched instance pairs. However, recent work has largely overlooked the potential impact of post-treatment variables on the outcome. This oversight prevents existing methods from fully capturing outcome variability, which increases the variance of counterfactual predictions. This paper introduces Pseudo-outcome Imputation with Post-treatment Variables for Counterfactual Regression (PIPCFR), an approach that incorporates post-treatment variables to improve pseudo-outcome imputation. We analyze the challenges inherent in using post-treatment variables and establish a novel theoretical bound on ITE risk that explicitly connects post-treatment variables to ITE estimation accuracy. Unlike existing methods that ignore these variables or impose restrictive assumptions, PIPCFR learns representations that preserve informative components while mitigating bias. Empirical evaluations on both real-world and simulated datasets demonstrate that PIPCFR achieves significantly lower ITE errors than existing methods.
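To make the pseudo-outcome idea concrete, the following is a minimal sketch of generic pseudo-outcome imputation on simulated data. It is not the PIPCFR method: it uses no post-treatment variables and no learned representations. The data-generating process, the per-arm linear models, and all function names (`simulate`, `impute_ite`, `pehe`) are assumptions made purely for illustration. Each individual keeps the observed factual outcome, and the unobserved counterfactual outcome is replaced by a model prediction.

```python
import numpy as np


def simulate(n=2000, seed=0):
    """Toy observational data (hypothetical setup, not from the paper)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=(n, 3))
    # Treatment assignment depends on x[:, 0], which induces confounding.
    propensity = 1.0 / (1.0 + np.exp(-x[:, 0]))
    t = rng.binomial(1, propensity)
    y0 = x @ np.array([1.0, -0.5, 0.2]) + rng.normal(scale=0.1, size=n)
    y1 = y0 + 1.0 + 0.5 * x[:, 1]  # true ITE is 1 + 0.5 * x[:, 1]
    # Only one potential outcome is observed per individual.
    y = np.where(t == 1, y1, y0)
    return x, t, y, y1 - y0


def fit_linear(x, y):
    """Ordinary least squares with an intercept term."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_linear(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef


def impute_ite(x, t, y):
    """Keep the factual outcome; impute the counterfactual with a per-arm model."""
    mu0 = predict_linear(fit_linear(x[t == 0], y[t == 0]), x)
    mu1 = predict_linear(fit_linear(x[t == 1], y[t == 1]), x)
    y1_hat = np.where(t == 1, y, mu1)  # pseudo-outcome for control units
    y0_hat = np.where(t == 0, y, mu0)  # pseudo-outcome for treated units
    return y1_hat - y0_hat


def pehe(ite_hat, ite_true):
    """Root mean squared error of ITE estimates (computable only in simulation)."""
    return float(np.sqrt(np.mean((ite_hat - ite_true) ** 2)))


if __name__ == "__main__":
    x, t, y, ite_true = simulate()
    print("PEHE of imputed ITE:", pehe(impute_ite(x, t, y), ite_true))
```

Because the true ITE is available only in simulation, the error metric here cannot be computed on real observational data, which is the limitation the abstract describes.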