Online Debiased Lasso

From arXiv. Ruijian Han and Lan Luo contributed equally to this work. Co-corresponding authors: Yuanyuan Lin (Email: ylin@sta.cuhk.edu.hk) and Jian Huang (Email: jian-huang@uiowa.edu).

We propose an online debiased lasso (ODL) method for statistical inference in high-dimensional linear models with streaming data. The proposed ODL consists of an efficient computational algorithm for streaming data and approximately normal estimators for the regression coefficients. Its implementation only requires the availability of the current data batch in the data stream and sufficient statistics of the historical data at each stage of the analysis. A new dynamic procedure is developed to select and update the tuning parameters upon the arrival of each new data batch so that we can adjust the amount of regularization adaptively along the data stream. The asymptotic normality of the ODL estimator is established under the conditions similar to those in an offline setting and mild conditions on the size of data batches in the stream, which provides theoretical justification for the proposed online statistical inference procedure. We conduct extensive numerical experiments to evaluate the performance of ODL. These experiments demonstrate the effectiveness of our algorithm and support the theoretical results. An air quality dataset is analyzed to illustrate the application of the proposed method.

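In short: ODL performs statistical inference for high-dimensional linear models on streaming data. At each stage it needs only the current data batch and the sufficient statistics of the historical data, and it reselects the tuning parameters whenever a new batch arrives, so the amount of regularization adapts along the stream. The resulting estimators of the regression coefficients are asymptotically normal under conditions similar to the offline setting plus mild conditions on the batch sizes. Numerical experiments and an analysis of an air quality dataset support the method.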
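To make the "current batch plus sufficient statistics" idea concrete, here is a minimal sketch of a lasso fit that never revisits raw historical data. It is not the authors' ODL algorithm: the class name `StreamingLasso`, the coordinate-descent solver, and the tuning rule `lam = c * sqrt(log(p) / n)` are all assumptions made for illustration, and the debiasing step that yields the approximately normal estimators is omitted. The only state carried between batches is the running sample size, the Gram matrix `X^T X`, the vector `X^T y`, and the previous estimate used as a warm start.

```python
import numpy as np


class StreamingLasso:
    """Lasso fitted from running sufficient statistics (hypothetical helper)."""

    def __init__(self, p):
        self.n = 0                     # cumulative sample size
        self.gram = np.zeros((p, p))   # running X^T X
        self.xty = np.zeros(p)         # running X^T y
        self.beta = np.zeros(p)        # current estimate, reused as warm start

    def update(self, X, y, c=1.0, sweeps=200):
        """Absorb one batch, then re-solve the lasso on the cumulative statistics."""
        # Only the current batch is touched; history lives in gram and xty.
        self.n += X.shape[0]
        self.gram += X.T @ X
        self.xty += X.T @ y

        p = self.beta.size
        # Assumed tuning rule: the penalty shrinks as the cumulative n grows.
        lam = c * np.sqrt(np.log(p) / self.n)

        S = self.gram / self.n         # normalized Gram matrix
        u = self.xty / self.n          # normalized cross-moment
        beta = self.beta.copy()

        # Coordinate descent on (1/2n)||y - X b||^2 + lam * ||b||_1,
        # written purely in terms of S and u.
        for _ in range(sweeps):
            for j in range(p):
                if S[j, j] == 0.0:
                    continue
                # Correlation of feature j with the residual that excludes feature j.
                rho = u[j] - S[j] @ beta + S[j, j] * beta[j]
                # Soft-thresholding update.
                beta[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / S[j, j]

        self.beta = beta
        return beta
```

Calling `update(X_b, y_b)` once per incoming batch returns the lasso estimate on all data seen so far. Storing the full Gram matrix costs O(p^2) memory, which is the price this sketch pays for discarding raw history.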