Machine learning (ML) methods are highly flexible, but their ability to approximate the true data-generating process is fundamentally constrained by finite samples. We characterize a universal lower bound, the Limits-to-Learning Gap (LLG), quantifying the unavoidable discrepancy between a model's empirical fit and the population benchmark. Recovering the true population $R^2$, therefore, requires correcting observed predictive performance by this bound. Using a broad set of variables, including excess returns, yields, credit spreads, and valuation ratios, we find that the implied LLGs are large. This indicates that standard ML approaches can substantially understate true predictability in financial data. We also derive LLG-based refinements to the classic Hansen and Jagannathan (1991) bounds, analyze implications for parameter learning in general-equilibrium settings, and show that the LLG provides a natural mechanism for generating excess volatility.
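The gap between empirical fit and the population benchmark can be illustrated with a minimal Monte Carlo sketch. This is not the paper's estimator or its LLG formula; the data-generating process, sample sizes, and parameter values below are all illustrative assumptions, chosen only to show how a model fit on a finite sample understates the true population $R^2$.

```python
# Minimal illustrative sketch (assumptions, not the paper's method): with a
# known linear data-generating process, the out-of-sample R^2 of a model fit
# on a finite sample falls short of the population R^2. The shortfall is the
# kind of discrepancy that the Limits-to-Learning Gap (LLG) bounds.
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 200, 100_000, 50   # small training sample, many predictors
beta = rng.normal(scale=0.05, size=p)   # weak true coefficients (assumed)
sigma_eps = 1.0                         # noise standard deviation (assumed)

signal_var = beta @ beta                             # Var(x'beta) with x ~ N(0, I)
pop_r2 = signal_var / (signal_var + sigma_eps**2)    # true population R^2

def simulate(n):
    """Draw n observations from the assumed linear DGP."""
    X = rng.normal(size=(n, p))
    y = X @ beta + rng.normal(scale=sigma_eps, size=n)
    return X, y

X_tr, y_tr = simulate(n_train)
X_te, y_te = simulate(n_test)

# OLS fit on the finite training sample
beta_hat, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)

# Out-of-sample R^2 on a large test sample approximates the model's
# achievable population fit given the finite training data
resid = y_te - X_te @ beta_hat
oos_r2 = 1.0 - resid.var() / y_te.var()

print(f"population R^2   : {pop_r2:.3f}")
print(f"out-of-sample R^2: {oos_r2:.3f}")
print(f"implied gap      : {pop_r2 - oos_r2:.3f}")
```

In this sketch the implied gap is positive: naively reading the out-of-sample $R^2$ as the population $R^2$ understates predictability, which is the sense in which observed performance must be corrected upward by a bound such as the LLG.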