Generative Adversarial Networks (GANs) have recently been applied to generating synthetic images from text. Despite significant advances, most current state-of-the-art algorithms operate on regular-grid regions; when attention is used, it is mainly applied between individual regular-grid regions and a single word. These approaches are sufficient for generating images that contain a single object in the foreground, such as a "bird" or a "flower". However, natural-language descriptions often involve complex foreground objects, and the background may occupy a variable portion of the generated image. As a result, regular-grid image attention weights do not necessarily concentrate on the intended foreground region(s), which in turn results in an unnatural-looking image. In addition, individual words such as "a", "blue" and "shirt" do not necessarily provide full visual context unless they are considered together. For this reason, we propose a novel method that introduces an additional set of attentions between true-grid regions and word phrases. The true-grid regions are derived from a set of auxiliary bounding boxes, which serve as better indicators of where alignment and attention with the word phrases should be drawn. Word phrases are derived by analysing Part-of-Speech (POS) tagging results. We evaluate this novel network architecture on the Microsoft Common Objects in Context (MSCOCO) dataset, where the model generates $256 \times 256$ images conditioned on a short sentence description. Our proposed approach is capable of generating more realistic images than the current state-of-the-art algorithms.
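To make the two ideas in the abstract concrete, the following is a minimal sketch, not the paper's implementation. It assumes tokens arrive already tagged with Penn Treebank-style POS tags, groups runs of determiners, adjectives and nouns into word phrases, and computes softmax attention weights between one phrase vector and a set of region vectors (for example, one per bounding box). The names `group_phrases` and `attention_weights`, the tag sets, and the dot-product scoring are all illustrative assumptions.

```python
import math

# Assumption: Penn Treebank-style tags; these sets are illustrative only.
PHRASE_TAGS = {"DT", "JJ", "NN", "NNS"}
NOUN_TAGS = {"NN", "NNS"}


def group_phrases(tagged_tokens):
    """Group consecutive determiner/adjective/noun tokens into phrases.

    A run is kept only if it contains at least one noun.
    """
    phrases, current, has_noun = [], [], False
    for word, tag in tagged_tokens:
        if tag in PHRASE_TAGS:
            current.append(word)
            has_noun = has_noun or tag in NOUN_TAGS
        else:
            # A non-phrase tag closes the current run.
            if current and has_noun:
                phrases.append(" ".join(current))
            current, has_noun = [], False
    if current and has_noun:
        phrases.append(" ".join(current))
    return phrases


def attention_weights(phrase_vec, region_vecs):
    """Softmax over dot products between a phrase vector and each region vector."""
    scores = [sum(p * r for p, r in zip(phrase_vec, region))
              for region in region_vecs]
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical pre-tagged caption fragment.
tagged = [("a", "DT"), ("man", "NN"), ("wearing", "VBG"),
          ("a", "DT"), ("blue", "JJ"), ("shirt", "NN")]
phrases = group_phrases(tagged)

# Hypothetical 2-D feature vectors: one phrase, two box-derived regions.
weights = attention_weights([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0]])
```

In this toy input, "a", "blue" and "shirt" are attended to as one phrase rather than three separate words, and the region whose features align with the phrase vector receives the larger weight. A real system would use learned embeddings and a proper chunker; this grouping rule would, for instance, merge two adjacent noun phrases that have no separating token.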