Complex image restoration aims to recover high-quality images from inputs affected by multiple degradations such as blur, noise, rain, and compression artifacts. Recent restoration agents, powered by vision-language models and large language models, offer promising restoration capabilities but suffer from significant efficiency bottlenecks due to reflection, rollback, and iterative tool search. Moreover, their performance depends heavily on degradation recognition models that require extensive annotations for training, limiting their applicability in label-free environments. To address these limitations, we propose a policy optimization-based restoration framework that learns a lightweight agent to determine tool-calling sequences. The agent operates in a sequential decision process, selecting the most appropriate restoration operation at each step to maximize final image quality. To enable training in label-free environments, we introduce a novel reward mechanism driven by multimodal large language models, which act as human-aligned evaluators and provide perceptual feedback for policy improvement. Once trained, our agent executes a deterministic restoration plan without redundant tool invocations, significantly accelerating inference while maintaining high restoration quality. Extensive experiments show that, despite using no supervision, our method matches state-of-the-art performance on full-reference metrics and surpasses existing approaches on no-reference metrics across diverse degradation scenarios.
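To make the sequential decision process concrete, below is a minimal REINFORCE-style sketch of the kind of loop the abstract describes, not the paper's actual implementation. All names (`ToolPolicy`, `TOOLS`, `apply_tool`, `mllm_reward`, `extract_feat`) are hypothetical assumptions for illustration: the restoration tools and the MLLM evaluator are stubbed, and the feature extractor is a placeholder.

```python
# Hypothetical sketch: a lightweight policy picks a restoration tool at each
# step; after the final step, a stubbed MLLM-style scorer returns a perceptual
# reward used as the policy-gradient signal (no ground-truth labels needed).

import torch
import torch.nn as nn

TOOLS = ["deblur", "denoise", "derain", "dejpeg", "stop"]  # placeholder toolbox

class ToolPolicy(nn.Module):
    """Lightweight agent: maps an image feature to a distribution over tools."""
    def __init__(self, feat_dim=64, n_tools=len(TOOLS)):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(feat_dim, 128), nn.ReLU(),
                                 nn.Linear(128, n_tools))

    def forward(self, feat):
        return torch.distributions.Categorical(logits=self.net(feat))

def apply_tool(image, tool_name):
    """Placeholder: in practice this calls a pretrained restoration model."""
    return image  # identity stub

def mllm_reward(image):
    """Placeholder for the MLLM evaluator: a real system would query a
    multimodal LLM with the restored image for a perceptual quality score."""
    return torch.rand(1)  # stub: random score in [0, 1]

def extract_feat(image):
    """Placeholder feature extractor (e.g., pooled CNN features)."""
    return image.flatten()[:64]

policy = ToolPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-4)

for episode in range(100):                    # label-free training loop
    image = torch.rand(3, 8, 8)               # stand-in degraded input
    log_probs = []
    for step in range(4):                     # bounded tool-calling sequence
        dist = policy(extract_feat(image))
        action = dist.sample()
        log_probs.append(dist.log_prob(action))
        tool = TOOLS[action.item()]
        if tool == "stop":
            break
        image = apply_tool(image, tool)
    reward = mllm_reward(image)               # perceptual feedback, no labels
    loss = -(torch.stack(log_probs).sum() * reward.detach())  # REINFORCE
    opt.zero_grad(); loss.backward(); opt.step()
```

Under these assumptions, the stubs would be replaced by pretrained restoration models and an MLLM queried for perceptual scores; the policy itself stays lightweight, so inference reduces to one forward pass per step with no reflection or rollback.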