Eliminating geometric distortion in semantically important regions remains a challenging problem in image retargeting. This paper presents Object-IR, a self-supervised architecture that reformulates image retargeting as a learning-based mesh-warping optimization problem, in which the mesh deformation is guided by object appearance consistency and geometry-preserving constraints. Given an input image and a target aspect ratio, we initialize a uniform rigid mesh at the output resolution and use a convolutional neural network to predict the motion of each mesh vertex, yielding the deformed mesh. The retargeted result is generated by warping the input image from the rigid mesh in the input image to the deformed mesh at the output resolution. To mitigate geometric distortion, we design a comprehensive objective function incorporating (a) an object-consistency loss that ensures important semantic objects retain their appearance, (b) a geometry-preserving loss that constrains important mesh cells to simple scale transformations, and (c) a boundary loss that enforces a clean rectangular output. Notably, our self-supervised paradigm eliminates the need for manually annotated retargeting datasets by deriving supervision directly from the geometric and semantic properties of the input. Extensive evaluations on the RetargetMe benchmark demonstrate that Object-IR achieves state-of-the-art performance, outperforming existing methods in both quantitative metrics and subjective visual quality assessments. The framework efficiently processes arbitrary input resolutions (average inference time: 0.009 s at 1024×683 resolution) while maintaining real-time performance on consumer-grade GPUs. The source code will soon be available at https://github.com/tlliao/Object-IR.
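To make the mesh-based formulation concrete, the following is a minimal NumPy sketch of two pieces the abstract describes: initializing a uniform rigid mesh over a target canvas, and a geometry-preserving-style residual that is zero when a deformed mesh is an axis-aligned scaling of the rigid one. This is an illustrative reconstruction under stated assumptions, not the paper's implementation; the function names and the least-squares scale fit are our own choices.

```python
import numpy as np

def uniform_mesh(height, width, rows, cols):
    """(rows+1) x (cols+1) grid of (x, y) vertices covering a height x width canvas."""
    xs = np.linspace(0.0, width, cols + 1)
    ys = np.linspace(0.0, height, rows + 1)
    gx, gy = np.meshgrid(xs, ys)           # each of shape (rows+1, cols+1)
    return np.stack([gx, gy], axis=-1)     # (rows+1, cols+1, 2)

def scale_preserving_residual(rigid, deformed):
    """Residual after fitting one global scale per edge direction.

    Evaluates to zero when the deformed mesh is an axis-aligned scaling of
    the rigid mesh -- the kind of 'simple scale transform' a
    geometry-preserving loss rewards for important cells (assumption:
    applied here to the whole mesh rather than a weighted object region).
    """
    # Horizontal and vertical edge vectors of every cell.
    rh, dh = rigid[:, 1:] - rigid[:, :-1], deformed[:, 1:] - deformed[:, :-1]
    rv, dv = rigid[1:] - rigid[:-1], deformed[1:] - deformed[:-1]
    residual = 0.0
    for r, d in ((rh, dh), (rv, dv)):
        s = (r * d).sum() / (r * r).sum()  # least-squares scale fit
        residual += ((d - s * r) ** 2).mean()
    return residual
```

In the full method, a CNN would predict per-vertex offsets that are added to the rigid mesh to obtain the deformed mesh; the residual above would then be penalized only on mesh cells covering important objects, while the boundary vertices are constrained to the output rectangle.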