We study community detection in the \emph{symmetric $k$-stochastic block model}, where $n$ nodes are evenly partitioned into $k$ clusters with intra- and inter-cluster connection probabilities $p$ and $q$, respectively. Our main result is a polynomial-time algorithm that achieves the minimax-optimal misclassification rate \begin{equation*} \exp \Bigl(-\bigl(1 \pm o(1)\bigr) \tfrac{C}{k}\Bigr), \quad \text{where } C = (\sqrt{pn} - \sqrt{qn})^2, \end{equation*} whenever $C \ge K\,k^2\,\log k$ for some universal constant $K$, matching the Kesten--Stigum (KS) threshold up to a $\log k$ factor. Notably, this rate holds even when an adversary corrupts an $\eta \le \exp\bigl(-(1 \pm o(1)) \tfrac{C}{k}\bigr)$ fraction of the nodes. To the best of our knowledge, the minimax rate was previously attainable only via computationally inefficient procedures [ZZ15] or via polynomial-time algorithms that require strictly stronger assumptions such as $C \ge K k^3$ [GMZZ17]. In the node-robust setting, the best known algorithm requires the substantially stronger condition $C \ge K k^{102}$ [LM22]. Our results close this gap by providing the first polynomial-time algorithm that achieves the minimax rate near the KS threshold in both settings. Our work has two key technical contributions: (1) we robustify majority voting via the Sum-of-Squares framework, and (2) we develop a novel graph bisection algorithm via robust majority voting, which allows us to reduce the misclassification rate of the initial estimate near the KS threshold to $1/\mathrm{poly}(k)$.
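To make the quantities above concrete, the following Python sketch (standard library only) samples a small symmetric $k$-SBM, computes $C = (\sqrt{pn} - \sqrt{qn})^2$ together with the leading-order rate $\exp(-C/k)$ (dropping the $(1 \pm o(1))$ factor), and runs one round of plain majority voting starting from a deliberately noisy labelling. It is an illustrative, non-robust baseline under assumed parameters ($n = 600$, $k = 3$, $p = 0.1$, $q = 0.01$, 20% initial label noise); it is not the Sum-of-Squares-robustified voting or the bisection algorithm described above, and the helper names (`snr`, `sample_sbm`, `majority_vote`, `run_demo`) are hypothetical.

```python
import math
import random


def snr(p, q, n):
    """C = (sqrt(p n) - sqrt(q n))^2, as defined in the abstract."""
    return (math.sqrt(p * n) - math.sqrt(q * n)) ** 2


def sample_sbm(n, k, p, q, rng):
    """Sample a symmetric k-SBM: node i is in cluster i % k (balanced when k
    divides n); each pair is joined with probability p inside a cluster and
    q across clusters."""
    labels = [i % k for i in range(n)]
    adj = [[] for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            prob = p if labels[u] == labels[v] else q
            if rng.random() < prob:
                adj[u].append(v)
                adj[v].append(u)
    return labels, adj


def majority_vote(adj, init, k):
    """One round of plain (non-robust) majority voting: relabel each node with
    the cluster holding the most of its neighbours under `init`.
    Ties go to the smallest cluster index."""
    refined = []
    for nbrs in adj:
        counts = [0] * k
        for v in nbrs:
            counts[init[v]] += 1
        refined.append(max(range(k), key=counts.__getitem__))
    return refined


def error_rate(truth, est):
    # Estimated labels stay aligned with the true ones here, so no search
    # over label permutations is needed.
    return sum(t != e for t, e in zip(truth, est)) / len(truth)


def run_demo(n=600, k=3, p=0.1, q=0.01, noise=0.2, seed=0):
    rng = random.Random(seed)
    truth, adj = sample_sbm(n, k, p, q, rng)
    # Noisy initial estimate: move a `noise` fraction of nodes to a wrong cluster.
    init = list(truth)
    for i in rng.sample(range(n), int(noise * n)):
        init[i] = (truth[i] + rng.randrange(1, k)) % k
    refined = majority_vote(adj, init, k)
    return error_rate(truth, init), error_rate(truth, refined), snr(p, q, n)


if __name__ == "__main__":
    k = 3
    before, after, C = run_demo(k=k)
    print(f"C = {C:.2f}, leading-order rate exp(-C/k) = {math.exp(-C / k):.2e}")
    print(f"error before voting: {before:.3f}, after one round: {after:.3f}")
```

Plain voting weighs every neighbour equally, so nodes controlled by an adversary can shift the counts at will; this is the weakness that the robustified voting step in the node-corrupted setting has to overcome.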