Conditional image generation enhances text-to-image synthesis with structural, spatial, or stylistic priors, but current methods face challenges in handling conflicts between sources. These include 1) input-level conflicts, where the conditioning image contradicts the text prompt, and 2) model-bias conflicts, where generative biases disrupt alignment even when the condition matches the text. Addressing these conflicts requires nuanced solutions, which standard supervised fine-tuning struggles to provide. Preference-based optimization techniques such as Direct Preference Optimization (DPO) show promise but are limited by gradient entanglement between text and condition signals and by the lack of disentangled training data for multi-constraint tasks. To overcome these challenges, we propose a bidirectionally decoupled DPO framework (BideDPO). Our method constructs two disentangled preference pairs, one for the condition and one for the text, to reduce gradient entanglement. The influence of each pair is managed with an Adaptive Loss Balancing strategy for balanced optimization. We introduce an automated data pipeline that samples model outputs and generates conflict-aware data. This pipeline is embedded in an iterative optimization strategy that refines both the model and the data. We also construct a DualAlign benchmark to evaluate conflict resolution between text and condition. Experiments show that BideDPO significantly improves text success rates (e.g., +35%) and condition adherence. We further validate our approach on the COCO dataset. Project page: https://limuloo.github.io/BideDPO/.
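The core of the framework described above, two decoupled DPO terms combined under an adaptive balance, can be illustrated with a minimal sketch. The exact form of the paper's Adaptive Loss Balancing is not specified here; the inverse-magnitude weighting below is an assumption for illustration, and the function names (`dpo_loss`, `bidedpo_loss`) are hypothetical.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Standard DPO objective: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

def bidedpo_loss(cond_pair, text_pair, beta=0.1):
    """Sketch of a bidirectionally decoupled DPO loss.

    cond_pair / text_pair: tuples (logp_w, logp_l, ref_logp_w, ref_logp_l)
    for the condition-focused and text-focused preference pairs.
    """
    l_cond = dpo_loss(*cond_pair, beta=beta)
    l_text = dpo_loss(*text_pair, beta=beta)
    # Assumed balancing rule: weight each term by the inverse of its
    # magnitude so that neither objective dominates the update.
    w_cond = 1.0 / (l_cond + 1e-8)
    w_text = 1.0 / (l_text + 1e-8)
    total = w_cond + w_text
    return (w_cond * l_cond + w_text * l_text) / total
```

Keeping the two pairs separate means the gradient of each term depends only on its own win/lose sample, which is the mechanism the abstract credits with reducing gradient entanglement between the text and condition signals.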