The model-X conditional randomization test is a generic framework for conditional independence testing, unlocking new possibilities to discover features that are conditionally associated with a response of interest while controlling the type-I error rate. An appealing advantage of this test is that it can work with any machine learning model to design powerful test statistics. In turn, the common practice in the model-X literature is to form a test statistic using machine learning models trained to maximize predictive accuracy, in the hope of attaining a powerful test. However, the ideal goal here is to drive the model (during training) to maximize the power of the test, not merely its predictive accuracy. In this paper, we bridge this gap by introducing, for the first time, novel model-fitting schemes that are designed to explicitly improve the power of model-X tests. This is done via a new cost function that aims to maximize the test statistic used to measure violations of conditional independence. Using synthetic and real data sets, we demonstrate that combining our proposed loss function with various base predictive models (lasso, elastic net, and deep neural networks) consistently increases the number of correct discoveries while keeping the type-I error rate under control.
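To make the testing framework concrete, the following is a minimal sketch of the model-X conditional randomization test p-value computation. The function names (`crt_pvalue`, `test_statistic`, `sample_x_given_z`) are illustrative placeholders rather than identifiers from the paper, and the sketch assumes, per the model-X setting, that the conditional distribution of X given Z is known and can be sampled from.

```python
import numpy as np

def crt_pvalue(x, y, z, test_statistic, sample_x_given_z, K=100, seed=None):
    """Model-X CRT p-value for H0: Y is independent of X given Z.

    test_statistic(x, y, z) -> float, where larger values indicate
    stronger evidence against conditional independence (e.g., derived
    from a fitted machine learning model).
    sample_x_given_z(z, rng) -> a fresh draw of X from its (assumed
    known) conditional distribution given Z -- the model-X assumption.
    """
    rng = np.random.default_rng(seed)
    t_obs = test_statistic(x, y, z)
    # Recompute the statistic on K dummy copies of X drawn from P(X | Z);
    # under H0 these are exchangeable with the observed statistic.
    t_null = np.array(
        [test_statistic(sample_x_given_z(z, rng), y, z) for _ in range(K)]
    )
    # Finite-sample-valid p-value, valid for ANY choice of statistic.
    return (1.0 + np.sum(t_null >= t_obs)) / (K + 1.0)
```

Because validity holds for any test statistic, the statistic can come from an arbitrary machine learning model; only the test's power, not its type-I error control, depends on how that model is trained.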
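The abstract does not spell out the proposed cost function, so the sketch below is only a hedged illustration of the general idea rather than the paper's actual loss: augment a standard predictive loss with a differentiable surrogate that rewards the test statistic on the real feature `x` for exceeding its value on a dummy copy `x_tilde` drawn from P(X | Z). The function name `power_oriented_loss` and the weight `lam` are hypothetical.

```python
import torch

def power_oriented_loss(model, x, x_tilde, z, y, lam=1.0):
    """Illustrative only -- NOT the paper's exact cost function.

    Combines predictive accuracy with a hinge-style term that pushes
    the model's fit on the real feature x above its fit on the dummy
    x_tilde ~ P(X | Z), so that training directly targets the gap the
    conditional randomization test thresholds, not just accuracy.
    """
    mse = torch.nn.functional.mse_loss
    pred_real = model(torch.cat([x, z], dim=1))
    pred_dummy = model(torch.cat([x_tilde, z], dim=1))
    predictive = mse(pred_real, y)  # the usual training objective
    # Hinge term: penalize batches where the dummy copy predicts y
    # as well as (or better than) the real feature.
    margin = torch.relu(mse(pred_real, y) - mse(pred_dummy, y))
    return predictive + lam * margin
```

The design choice here mirrors the abstract's motivation: the second term is driven by the same quantity the test uses to measure violations of conditional independence, so minimizing the loss moves the model toward a more powerful test rather than toward predictive accuracy alone.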