We consider the problem of evaluating the performance of a decision policy using past observational data. The outcome of a policy is measured in terms of a loss (equivalently, a disutility or negative reward), and the problem is to draw valid inferences about the out-of-sample loss of a specified policy when the past data were observed under a possibly unknown policy. Using a sample-splitting method, we show that it is possible to draw such inferences with finite-sample coverage guarantees that cover the entire loss distribution. Importantly, the method takes into account misspecification of the model of the past policy, including unmeasured confounding. The evaluation method can be used to certify the performance of a policy using observational data under an explicitly specified range of credible model assumptions.