Best Principal Submatrix Selection for the Maximum Entropy Sampling Problem: Scalable Algorithms and Performance Guarantees

This paper studies the classic maximum entropy sampling problem (MESP), which aims to select the most informative principal submatrix of a prespecified size from a covariance matrix. MESP has been widely applied in many areas, including healthcare, power systems, manufacturing, and data science. By investigating its Lagrangian dual and primal characterization, we derive a novel convex integer program for MESP and show that its continuous relaxation yields a near-optimal solution. These results motivate us to study an efficient sampling algorithm and derive its approximation bound for MESP, which improves the best-known bound in the literature. We then provide an efficient deterministic implementation of the sampling algorithm with the same approximation bound. By developing new mathematical tools for singular matrices and analyzing the Lagrangian dual of the proposed convex integer program, we investigate the widely used local search algorithm and prove the first known approximation bound for it on MESP. The proof techniques further inspire an efficient implementation of the local search algorithm. Our numerical experiments demonstrate that these approximation algorithms can efficiently solve medium-sized and large-scale instances to near-optimality. Our proposed algorithms are coded and released as open-source software. Finally, we extend the analyses to the A-Optimal MESP (A-MESP), where the objective is to minimize the trace of the inverse of the selected principal submatrix.
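To make the problem statement concrete, here is a minimal sketch of the MESP objective and a basic swap-based local search. It is not the authors' released implementation. It assumes the usual formulation in which the informativeness of a subset S of size s is measured by the log-determinant of the principal submatrix C[S, S], and it assumes C is positive definite. The function names `log_det_objective` and `local_search`, the first-improvement swap rule, the arbitrary starting subset, and the tolerance `tol` are all illustrative choices.

```python
import numpy as np


def log_det_objective(C, S):
    """Log-determinant of the principal submatrix of C indexed by S.

    Returns -inf when the submatrix is not positive definite
    (assumed convention for this sketch).
    """
    idx = sorted(S)
    sign, logdet = np.linalg.slogdet(C[np.ix_(idx, idx)])
    return logdet if sign > 0 else -np.inf


def local_search(C, s, tol=1e-9):
    """First-improvement swap local search (illustrative, unoptimized).

    Starts from the first s indices and repeatedly swaps one selected
    index for one unselected index whenever the objective improves by
    more than tol. Stops at a subset that no single swap improves.
    """
    n = C.shape[0]
    S = set(range(s))  # arbitrary starting subset (assumption)
    best = log_det_objective(C, S)
    improved = True
    while improved:
        improved = False
        for i in sorted(S):
            for j in range(n):
                if j in S:
                    continue
                T = (S - {i}) | {j}
                val = log_det_objective(C, T)
                if val > best + tol:
                    S, best = T, val
                    improved = True
                    break
            if improved:
                break
    return sorted(S), best


# Small synthetic positive definite covariance matrix for illustration.
rng = np.random.default_rng(0)
A = rng.standard_normal((6, 6))
C = A @ A.T + np.eye(6)
subset, value = local_search(C, 3)
print(subset, value)
```

Each candidate swap here recomputes a full determinant, which is the simplest correct choice but also the slowest. The efficient implementation mentioned in the abstract presumably avoids this cost, and this sketch does not attempt to reproduce it.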
