In this manuscript, we propose a purely data-driven statistical regularization method for extracting information from big data contaminated by randomly distributed noise. Because the variance of the noise may be large, the method can be regarded as a general data preprocessing method for ill-posed problems. It overcomes difficulties that traditional regularization methods are unable to resolve, and it has an advantage in computational efficiency. We prove the unique solvability of the method and give a number of conditions that characterize the solution. We discuss the strategy for choosing the regularization parameter and establish a rigorous upper bound on the confidence interval of the error in the $L^2$ norm. Numerical examples are provided to illustrate the appropriateness and effectiveness of the method.