Amodal completion, the task of inferring the invisible parts of occluded objects, faces significant challenges in maintaining semantic consistency and structural integrity. Prior progressive approaches are inherently limited by inference instability and error accumulation. To address these limitations, we present a Collaborative Multi-Agent Reasoning Framework that explicitly decouples Semantic Planning from Visual Synthesis. By employing specialized agents for upfront reasoning, our method produces a structured, explicit plan before any pixels are generated, enabling visually and semantically coherent single-pass synthesis. The framework incorporates two key mechanisms: (1) a self-correcting Verification Agent that uses Chain-of-Thought reasoning to correct the visible-region segmentation and identify residual occluders, operating strictly within the Semantic Planning phase; and (2) a Diverse Hypothesis Generator that addresses the ambiguity of invisible regions by proposing diverse, plausible semantic interpretations, going beyond the limited pixel-level variation of standard random-seed sampling. Furthermore, because traditional metrics struggle to assess inferred invisible content, we introduce the MAC-Score (MLLM Amodal Completion Score), a novel human-aligned evaluation metric. Validated against human judgment and ground truth, MAC-Score establishes a robust standard for assessing structural completeness and semantic consistency with the visible context. Extensive experiments demonstrate that our framework significantly outperforms state-of-the-art methods across multiple datasets. Our project is available at: https://fanhongxing.github.io/remac-page.