Artistic style includes both structural and appearance elements. Existing neural stylization techniques primarily focus on transferring appearance features such as color and texture, often neglecting the equally crucial aspect of structural stylization. In this paper, we present a comprehensive study on the simultaneous stylization of structure and appearance of 2D images. Specifically, we introduce DiffArtist, which, to the best of our knowledge, is the first stylization method to allow for dual controllability over structure and appearance. Our key insight is to represent structure and appearance as separate diffusion processes to achieve complete disentanglement without requiring any training, thereby endowing users with unprecedented controllability for both components. The evaluation of stylization of both appearance and structure, however, remains challenging as it necessitates semantic understanding. To this end, we further propose a Multimodal LLM-based style evaluator, which better aligns with human preferences than metrics lacking semantic understanding. With this powerful evaluator, we conduct extensive analysis, demonstrating that DiffArtist achieves superior style fidelity, editability, and structure-appearance disentanglement. These merits make DiffArtist a highly versatile solution for creative applications. Project homepage: https://github.com/songrise/Artist.
翻译:暂无翻译