We present a training-free framework for style-personalized image generation that operates at inference time using a scale-wise autoregressive model. Our method generates a stylized image guided by a single style reference image while preserving semantic consistency and mitigating content leakage. Through a detailed step-wise analysis of the generation process, we identify a pivotal step at which the dominant singular values of the internal features encode style-related components. Building on this insight, we introduce two lightweight control modules: Principal Feature Blending, which enables precise modulation of style through SVD-based feature reconstruction, and Structural Attention Correction, which stabilizes structural consistency by leveraging content-guided attention correction across fine stages. Extensive experiments demonstrate that, without any additional training, our method achieves style fidelity and prompt fidelity competitive with fine-tuned baselines, while offering faster inference and greater deployment flexibility.
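To make the SVD-based reconstruction behind Principal Feature Blending concrete, the following minimal PyTorch sketch blends the dominant singular values of a style reference feature into a content feature. The function name, the token-by-channel shape convention, the cutoff `k`, and the choice to blend only singular values while keeping the content feature's bases are our illustrative assumptions, not the paper's exact formulation.

```python
import torch

def principal_feature_blend(content_feat: torch.Tensor,
                            style_feat: torch.Tensor,
                            k: int = 8,
                            alpha: float = 0.7) -> torch.Tensor:
    """Blend the top-k singular components of a style feature into a
    content feature via SVD reconstruction (illustrative sketch).

    content_feat, style_feat: (N, D) token-by-channel feature matrices
    taken from the same layer and generation step.
    k: number of dominant singular values treated as style-bearing.
    alpha: blending strength toward the style spectrum.
    """
    # Decompose both features; the dominant singular values are assumed
    # to carry style-related components, per the paper's analysis.
    Uc, Sc, Vch = torch.linalg.svd(content_feat, full_matrices=False)
    _, Ss, _ = torch.linalg.svd(style_feat, full_matrices=False)

    # Convexly blend only the top-k singular values toward the style
    # feature's dominant spectrum; the tail is left untouched.
    S_blend = Sc.clone()
    S_blend[:k] = (1 - alpha) * Sc[:k] + alpha * Ss[:k]

    # Reconstruct with the content feature's own bases (U, Vh) so the
    # semantic layout is preserved while the style spectrum is imported.
    return Uc @ torch.diag(S_blend) @ Vch

# Hypothetical usage with 256 tokens and 1024 channels:
# blended = principal_feature_blend(torch.randn(256, 1024),
#                                   torch.randn(256, 1024))
```

Keeping U and Vh from the content feature is one way to import style strength without disturbing spatial structure; the paper's actual reconstruction may differ.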
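In the same spirit, here is a hedged sketch of what content-guided attention correction at the fine stages could look like: attention weights from the stylized pass are corrected toward weights cached from a content-only reference pass, while values stay stylized. The function name, the cached inputs, and the convex-mixing form are assumptions for illustration, not the method's confirmed implementation.

```python
import torch
import torch.nn.functional as F

def structural_attention_correct(q_stylized: torch.Tensor,
                                 k_content: torch.Tensor,
                                 v_stylized: torch.Tensor,
                                 attn_content: torch.Tensor,
                                 beta: float = 0.5) -> torch.Tensor:
    """Stabilize structure at fine stages by mixing the stylized pass's
    attention with attention cached from a content-only pass (sketch).

    q_stylized:   (N, d) queries from the stylized generation pass
    k_content:    (N, d) keys cached from a content-only pass
    v_stylized:   (N, d) values from the stylized pass
    attn_content: (N, N) attention weights cached from the content pass
    beta: correction strength toward the content attention map
    """
    scale = q_stylized.shape[-1] ** -0.5

    # Attending against content keys anchors token-to-token structure;
    # values remain stylized so appearance is unaffected.
    attn = F.softmax(q_stylized @ k_content.T * scale, dim=-1)

    # Convex correction toward the cached content attention map.
    attn = (1 - beta) * attn + beta * attn_content

    return attn @ v_stylized
```

Mixing at the attention-weight level, rather than overwriting queries or keys outright, is one plausible reading of "attention correction"; the actual module may intervene differently.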