The widely used Fact-based Visual Question Answering (FVQA) dataset contains visually grounded questions whose answers require retrieving information from commonsense knowledge graphs. It has been observed that the original dataset is highly imbalanced and concentrated on a small portion of its associated knowledge graph. We introduce FVQA 2.0, which contains adversarial variants of test questions to address this imbalance. We show that systems trained on the original FVQA training sets can be vulnerable to adversarial samples, and we demonstrate an augmentation scheme that reduces this vulnerability without requiring human annotations.