Current image fusion methods struggle with the composite degradations encountered in real-world imaging scenarios and lack the flexibility to accommodate user-specific requirements. To address these challenges, we propose ControlFusion, a controllable image fusion framework with language-vision prompts that adaptively neutralizes composite degradations. On the one hand, we develop a degraded imaging model that integrates physical imaging mechanisms, including Retinex theory and the atmospheric scattering principle, to simulate composite degradations, which makes it possible to address complex real-world degradations at the data level. On the other hand, we devise a prompt-modulated restoration and fusion network that dynamically enhances features with degradation prompts, enabling our method to handle composite degradations of varying severity. Specifically, because users differ in how they perceive image quality, we incorporate a text encoder that embeds user-specified degradation types and severity levels as degradation prompts. We also design a spatial-frequency collaborative visual adapter that autonomously perceives degradations in the source images, so the method does not depend entirely on user instructions. Extensive experiments demonstrate that ControlFusion outperforms state-of-the-art fusion methods in fusion quality and degradation handling, particularly on real-world and compound degradations of various levels. The source code is publicly available at https://github.com/Linfeng-Tang/ControlFusion.
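To make the idea of a physically grounded degradation model concrete, the following is a minimal sketch of how haze (atmospheric scattering) and low light (a Retinex-style illumination change) might be composed on a clean image. It is not the degraded imaging model used in ControlFusion: the function names, the max-channel illumination estimate, the order of composition, and the parameters `gamma`, `beta`, and `airlight` are all assumptions chosen for illustration.

```python
import numpy as np


def add_haze(img, depth, beta=1.0, airlight=0.9):
    """Atmospheric scattering: I = J * t + A * (1 - t), with t = exp(-beta * d).

    img:   clean image J, float array in [0, 1], shape (H, W, 3)
    depth: scene depth d, float array, shape (H, W) (assumed to be given)
    """
    t = np.exp(-beta * depth)[..., None]  # transmission map, shape (H, W, 1)
    return np.clip(img * t + airlight * (1.0 - t), 0.0, 1.0)


def add_low_light(img, gamma=2.0):
    """Retinex-style darkening: I = R * L, then dim the illumination L.

    The illumination is crudely estimated as the per-pixel channel maximum
    (an assumption for this sketch), and is dimmed as L ** gamma, gamma >= 1.
    """
    illum = np.maximum(img.max(axis=-1, keepdims=True), 1e-4)
    reflectance = img / illum
    return np.clip(reflectance * illum ** gamma, 0.0, 1.0)


def composite_degrade(img, depth, gamma=2.0, beta=1.0, airlight=0.9):
    """Hypothetical composite degradation: haze first, then low light."""
    return add_low_light(add_haze(img, depth, beta, airlight), gamma)
```

In this sketch, `gamma` and `beta` act as severity levels: setting `gamma=1.0` and `beta=0.0` leaves the image unchanged, while larger values give darker and hazier results. Sampling such parameters over a range is one plausible way to generate training pairs with degradations of varying levels.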