Massive data pose substantial memory and computational challenges for analysis. These challenges can be addressed by analyzing a subsample drawn from the full data as a surrogate. Functional data are typically recorded as multiple measurements over their domains, so they demand even more memory and computation time when the sample size is large. The computational burden grows further when statistical inference relies on bootstrap samples. To the best of our knowledge, this article is the first attempt to study subsampling for the functional linear model. We propose an optimal subsampling method based on the functional L-optimality criterion. For discrete or categorical responses, we further extend the proposed functional L-optimality subsampling (FLoS) method to the functional generalized linear model. We establish the asymptotic properties of the estimators obtained by the FLoS method. The finite-sample performance of the FLoS method is investigated through extensive simulation studies. The method is further demonstrated on two large-scale datasets: global climate data and kidney transplant data. In these analyses, the FLoS method performs much better than uniform subsampling and closely approximates the full-data results while dramatically reducing computation time and memory use.