An improper estimator with optimal excess risk in misspecified density estimation and logistic regression

We introduce a procedure for conditional density estimation under logarithmic loss, which we call SMP (Sample Minmax Predictor). This estimator minimizes a new general excess risk bound for statistical learning. On standard examples, this bound scales as $d/n$, with $d$ the model dimension and $n$ the sample size, and critically remains valid under model misspecification. Being an improper (out-of-model) procedure, SMP improves over within-model estimators such as the maximum likelihood estimator, whose excess risk degrades under misspecification. Compared to approaches reducing to the sequential problem, our bounds remove suboptimal $\log n$ factors and can handle unbounded classes. For the Gaussian linear model, the predictions and risk bound of SMP are governed by leverage scores of covariates, nearly matching the optimal risk in the well-specified case without conditions on the noise variance or approximation error of the linear model. For logistic regression, SMP provides a non-Bayesian approach to calibration of probabilistic predictions relying on virtual samples, and can be computed by solving two logistic regressions. It achieves a non-asymptotic excess risk of $O((d + B^2R^2)/n)$, where $R$ bounds the norm of features and $B$ that of the comparison parameter; by contrast, no within-model estimator can achieve a better rate than $\min({B R}/{\sqrt{n}}, {d e^{BR}}/{n})$ in general. This provides a more practical alternative to Bayesian approaches, which require approximate posterior sampling, thereby partly addressing a question raised by Foster et al. (2018).
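To make the "virtual samples" and "two logistic regressions" description concrete, here is a minimal sketch of one plausible reading of the procedure for binary labels in $\{-1, +1\}$. The abstract does not state the formula, so the following is an assumption: for a query point $x$, fit one penalized logistic regression on the sample augmented with the virtual pair $(x, +1)$ and another on the sample augmented with $(x, -1)$, then normalize the two resulting in-model probabilities so that they sum to one. The function names `fit_ridge_logistic` and `smp_predict`, the ridge penalty `lam`, and the plain gradient-descent solver are all hypothetical choices made for illustration, not the authors' implementation.

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_ridge_logistic(X, y, lam=0.1, steps=2000):
    """Gradient descent on mean logistic loss + lam * ||theta||^2 (labels in {-1, +1})."""
    n, d = len(X), len(X[0])
    # Step size 1/L, where L upper-bounds the curvature of the objective.
    max_sq_norm = max(sum(v * v for v in x) for x in X)
    step = 1.0 / (max_sq_norm / 4.0 + 2.0 * lam)
    theta = [0.0] * d
    for _ in range(steps):
        grad = [2.0 * lam * t for t in theta]
        for xi, yi in zip(X, y):
            margin = yi * sum(t * v for t, v in zip(theta, xi))
            coef = -yi * sigmoid(-margin) / n
            for j in range(d):
                grad[j] += coef * xi[j]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta


def smp_predict(X, y, x_new, lam=0.1):
    """Assumed SMP-style probability of label +1 at x_new (illustrative only)."""
    scores = {}
    for label in (+1, -1):
        # Refit with the query point added as a virtual sample carrying `label`.
        theta = fit_ridge_logistic(X + [x_new], y + [label], lam)
        dot = sum(t * v for t, v in zip(theta, x_new))
        # In-model probability that the refit assigns to its own virtual label.
        scores[label] = sigmoid(label * dot)
    # Normalize the two probabilities into a valid conditional distribution.
    return scores[+1] / (scores[+1] + scores[-1])


# Toy data: first coordinate is an intercept feature.
X_demo = [[1.0, 1.0], [1.0, 2.0], [1.0, -1.0], [1.0, -2.0]]
y_demo = [1, 1, -1, -1]
p_demo = smp_predict(X_demo, y_demo, [1.0, 3.0])
print("P(y = +1 | x) under this sketch:", p_demo)
```

Because each of the two fits sees the query point with a different label, the resulting predictor is not itself a logistic model, which is what "improper" refers to above. The cost per prediction in this sketch is two model fits, so it suits settings where calibrated probabilities at a few query points matter more than throughput.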
