We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement and global and local writing style patterns. The proposed HWT captures the long- and short-range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT includes an encoder-decoder attention mechanism that enables style-content entanglement by gathering the style representation of each query character. To the best of our knowledge, we are the first to introduce a transformer-based generative network for styled handwritten text generation. Our proposed HWT generates realistic styled handwritten text images and significantly outperforms the state of the art, as demonstrated through extensive qualitative, quantitative, and human-based evaluations. The proposed HWT can handle text of arbitrary length and any desired writing style in a few-shot setting. Further, our HWT generalizes well to the challenging scenario where both words and writing style are unseen during training, generating realistic styled handwritten text images.
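The encoder-decoder attention described above can be illustrated with a minimal sketch of generic cross-attention, where each query character embedding attends over a sequence of style feature tokens to gather its style representation. This is a simplified, single-head illustration in NumPy, not the authors' actual implementation; the function name `cross_attention` and all shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, style_feats, d_k):
    # queries:     (T, d) embeddings of the T query characters
    # style_feats: (S, d) encoded style tokens from the few-shot style examples
    # Scaled dot-product attention: each character gathers a weighted
    # combination of the style tokens (its style representation).
    scores = queries @ style_feats.T / np.sqrt(d_k)  # (T, S)
    weights = softmax(scores, axis=-1)               # rows sum to 1
    return weights @ style_feats, weights            # (T, d), (T, S)

rng = np.random.default_rng(0)
d = 8
q = rng.standard_normal((3, d))   # 3 query characters (hypothetical)
s = rng.standard_normal((5, d))   # 5 style feature tokens (hypothetical)
out, weights = cross_attention(q, s, d)
print(out.shape)  # each query character now carries a style-aware vector
```

In the full model, such a mechanism sits inside a transformer decoder layer with multiple heads, learned projection matrices, and residual connections; the sketch only shows how query characters and style features become entangled through attention.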