Assessing advertisements, particularly with respect to user preferences and ad quality, is crucial to the marketing industry. Although recent studies have attempted to use deep neural networks for this purpose, they have not exploited image-related auxiliary attributes, such as the embedded text frequently found in ad images. We therefore investigated the influence of these attributes on ad image preferences. First, we analyzed large-scale real-world ad log data and, based on our findings, proposed a novel multi-step modality fusion network (M2FN) that identifies advertising images likely to appeal to user preferences. Our method utilizes auxiliary attributes at multiple steps in the network, including conditional batch normalization-based low-level fusion and attention-based high-level fusion. We verified M2FN on the AVA dataset, which is widely used for aesthetic image assessment, and then demonstrated that M2FN achieves state-of-the-art performance in preference prediction on a real-world ad dataset with rich auxiliary attributes.
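To illustrate the low-level fusion idea the abstract refers to, the sketch below shows conditional batch normalization in plain NumPy: image features are normalized per channel, and the scale/shift parameters are predicted from an auxiliary-attribute embedding (e.g., encoded ad text) instead of being fixed learned constants. This is a minimal, generic illustration of the CBN technique, not the paper's M2FN implementation; all names, shapes, and the linear conditioning projections (`W_gamma`, `W_beta`) are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def conditional_batch_norm(feats, cond, W_gamma, W_beta, eps=1e-5):
    """Normalize `feats` per channel across the batch, then scale/shift
    with parameters predicted from the conditioning vector `cond`.

    feats: (batch, channels) image features
    cond:  (batch, d_cond) auxiliary-attribute embedding
    """
    mean = feats.mean(axis=0, keepdims=True)
    var = feats.var(axis=0, keepdims=True)
    normed = (feats - mean) / np.sqrt(var + eps)
    gamma = 1.0 + cond @ W_gamma   # condition-dependent scale (around 1)
    beta = cond @ W_beta           # condition-dependent shift (around 0)
    return gamma * normed + beta

# Toy dimensions for illustration only
batch, channels, d_cond = 8, 16, 4
feats = rng.normal(size=(batch, channels))
cond = rng.normal(size=(batch, d_cond))
W_gamma = 0.1 * rng.normal(size=(d_cond, channels))
W_beta = 0.1 * rng.normal(size=(d_cond, channels))

out = conditional_batch_norm(feats, cond, W_gamma, W_beta)
print(out.shape)  # (8, 16)
```

In a full network, `W_gamma` and `W_beta` would be trained jointly with the rest of the model, so the attribute embedding modulates how image features are normalized at an early stage; the attention-based high-level fusion then combines the modalities again closer to the output.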