Convolutional analysis operator learning (CAOL) enables the unsupervised training of (hierarchical) convolutional sparsifying operators or autoencoders from large datasets. One can use many training images for CAOL, but a precise understanding of the impact of doing so has remained an open question. This paper presents a series of results that lend insight into the impact of dataset size on the filter update in CAOL. The first result is a general deterministic bound on errors in the estimated filters, which then leads to two specific bounds under particular random models. The first of these shows that the expected filter estimation error decreases as the number of training samples increases, and the second provides high-probability analogues. The bounds depend on properties of the training data, and we investigate their empirical values with real data. Taken together, these results provide evidence for the potential benefit of using more training data in CAOL.
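To make the central claim concrete, the following is a minimal NumPy sketch, not the paper's method or experiments: it simulates a Procrustes-type filter update of the kind used in CAOL's filter step (under the tight-frame constraint D D^T = (1/R) I) with oracle sparse codes, and prints how the filter estimation error shrinks as the number of training samples N grows. All sizes and parameters here are hypothetical toy choices.

```python
import numpy as np

rng = np.random.default_rng(0)
R = K = 8                     # patch length R and number of filters K (toy sizes)

# Ground-truth filters satisfying the tight-frame constraint D D^T = (1/R) I
Q, _ = np.linalg.qr(rng.standard_normal((R, K)))
D_true = Q / np.sqrt(R)

def filter_update(X, Z):
    """Closed-form filter update for fixed codes Z: minimize
    ||D^T X - Z||_F^2 subject to D D^T = (1/R) I, an
    orthogonal-Procrustes problem solved via an SVD of X Z^T."""
    U, _, Vt = np.linalg.svd(X @ Z.T)
    return (U @ Vt) / np.sqrt(X.shape[0])

for N in (100, 1_000, 10_000, 100_000):
    # Synthetic patches: sparse codes pushed through the true filters, plus noise
    Z_true = rng.standard_normal((K, N)) * (rng.random((K, N)) < 0.2)
    X = np.sqrt(R) * Q @ Z_true + 0.05 * rng.standard_normal((R, N))
    Z = D_true.T @ X          # oracle codes, to isolate the filter update
    D_hat = filter_update(X, Z)
    print(N, np.linalg.norm(D_hat - D_true))
```

In this toy setting the printed error decays roughly like 1/sqrt(N), which is qualitatively consistent with the abstract's claim that more training samples reduce the expected filter estimation error.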