Many current applications in data science need rich model classes to adequately represent the statistics that may be driving the observations. However, rich model classes may be too complex to admit estimators whose rate of convergence to the truth can be bounded uniformly over the entire collection of probability distributions comprising the model class. That is, it may be impossible to guarantee uniform consistency of such estimators as the sample size grows. In such cases, it is conventional to settle for pointwise consistent estimators, whose convergence-rate guarantees hold only in a model-dependent way. This viewpoint has a serious drawback: estimator performance is a function of the unknown model being estimated, and is therefore itself unknown. Even if an estimator is consistent, how well it is doing at any given time may remain unclear, regardless of the sample size. Departing from the classical uniform/pointwise consistency dichotomy that leads to this impasse, we explore a new analysis framework by studying rich model classes that may admit only pointwise consistency guarantees, yet for which all the information about the unknown model that is needed to gauge estimator accuracy can be inferred from the sample at hand. We expect this data-derived estimation framework to be broadly applicable to estimation problems, providing a methodology for handling much richer model classes. In this paper, we analyze the lossless compression problem in detail within this data-derived framework.
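For concreteness, the two notions of consistency contrasted above can be written in standard notation. The symbols here are assumptions introduced only for illustration and are not taken from the paper. $\mathcal{P}$ denotes the model class. $\theta(p)$ is the quantity being estimated when the data are drawn from $p \in \mathcal{P}$. $\hat{\theta}_n$ is an estimator computed from $n$ samples, and $d$ is a loss. Uniform consistency requires, for every $\epsilon > 0$,

$$\lim_{n\to\infty} \sup_{p\in\mathcal{P}} \Pr_p\!\left( d\big(\hat{\theta}_n, \theta(p)\big) > \epsilon \right) = 0,$$

whereas pointwise consistency only requires, for every $\epsilon > 0$ and every $p \in \mathcal{P}$,

$$\lim_{n\to\infty} \Pr_p\!\left( d\big(\hat{\theta}_n, \theta(p)\big) > \epsilon \right) = 0.$$

The only difference is the order of the limit and the supremum. Under pointwise consistency alone, the sample size needed to reach accuracy $\epsilon$ may depend on the unknown $p$. This is why, without further information about $p$, the estimator's accuracy at a given $n$ cannot be gauged.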
翻译:数据科学的许多当前应用需要丰富的模型类别,以充分代表可能导致观测的统计。但丰富的模型类别可能过于复杂,无法接受与正由模型类别组成的整个概率分布收集一致的、可统一约束于由模型类别组成的整个概率分布收集的、与真理趋同率相趋同的估算者,也就是说,由于样本规模的增加,可能无法保证估算者的统一一致性。在这种情况下,确定具有一致性的估算者,保证性能能够以依赖模型的方式约束趋同率,即,点性一致的估算者。但是,这种观点可能过于复杂,因为估计性能表现是正在估计的模型类别中未知的模型模型的函数,因此是未知的。即使估算者一致,在任何特定时间都可能无法保证这种估算的一致性,无论观察的样本大小如何,都是正常的。从典型的统一/点性一致性的对等分辨,通过研究可能只承认点一致性的模型类别,从而探索新的分析框架。在这种精确度的模型中,我们需要从这一未知的精确度的精确度的模型到模型的精确度的模型的估算方法。