MENLI: Robust Evaluation Metrics from Natural Language Inference

Tags: evaluation metrics · standard basis · robustness · evaluation · benchmarking

April 11, 2023

Yanran Chen, Steffen Eger

From arXiv. TACL 2023 camera-ready version (GitHub link and Fig. 3 legend fixed).

Recently proposed BERT-based evaluation metrics for text generation perform well on standard benchmarks but are vulnerable to adversarial attacks, e.g., attacks targeting information correctness. We argue that this stems (in part) from the fact that they are models of semantic similarity. In contrast, we develop evaluation metrics based on Natural Language Inference (NLI), which we deem a more appropriate modeling choice. We design a preference-based adversarial attack framework and show that our NLI-based metrics are much more robust to these attacks than recent BERT-based metrics. On standard benchmarks, our NLI-based metrics outperform existing summarization metrics but perform below state-of-the-art (SOTA) MT metrics. However, when combining existing metrics with our NLI metrics, we obtain both higher adversarial robustness (+15% to 30%) and higher-quality metrics as measured on standard benchmarks (+5% to 30%).
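The abstract names three mechanisms without giving formulas: scoring a candidate with an NLI model, testing robustness by checking which of two candidates a metric prefers, and combining an NLI score with an existing metric. The Python sketch below illustrates all three under stated assumptions. It does not run any NLI model; it assumes the entailment, neutral, and contradiction probabilities are already available. The `NLIProbs` container, the scoring modes, the equal-weight average over both directions, the interpolation weight `w`, and the `token_jaccard` toy baseline are hypothetical illustrations, not the authors' released code or their exact formulas.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class NLIProbs:
    """Hypothetical container for the softmax output of a 3-way NLI classifier."""
    entailment: float
    neutral: float
    contradiction: float


def nli_score(p: NLIProbs, mode: str = "e-c") -> float:
    """Collapse NLI probabilities into one scalar; the modes are illustrative."""
    if mode == "e":    # reward entailment only
        return p.entailment
    if mode == "-c":   # penalize contradiction only
        return -p.contradiction
    if mode == "e-c":  # reward entailment, penalize contradiction
        return p.entailment - p.contradiction
    raise ValueError(f"unknown mode: {mode!r}")


def bidirectional(ref_to_hyp: NLIProbs, hyp_to_ref: NLIProbs, mode: str = "e-c") -> float:
    """Average the score over both premise/hypothesis directions (assumed)."""
    return 0.5 * (nli_score(ref_to_hyp, mode) + nli_score(hyp_to_ref, mode))


def combine(nli: float, other: float, w: float = 0.5) -> float:
    """Linearly interpolate an NLI score with another metric's score.

    Assumes both scores were already normalized to comparable ranges.
    """
    if not 0.0 <= w <= 1.0:
        raise ValueError("w must lie in [0, 1]")
    return w * nli + (1.0 - w) * other


def preference_accuracy(
    metric: Callable[[str, str], float],
    triples: Sequence[tuple[str, str, str]],
) -> float:
    """Share of (reference, good, adversarial) triples in which the metric
    scores the good candidate strictly above the adversarial one."""
    if not triples:
        raise ValueError("need at least one triple")
    wins = sum(metric(ref, good) > metric(ref, adv) for ref, good, adv in triples)
    return wins / len(triples)


def token_jaccard(ref: str, hyp: str) -> float:
    """Toy surface-similarity baseline: Jaccard overlap of lowercased tokens."""
    a, b = set(ref.lower().split()), set(hyp.lower().split())
    return len(a & b) / len(a | b) if a | b else 1.0


if __name__ == "__main__":
    triples = [(
        "The meeting is on Monday",
        "The meeting takes place on Monday",  # paraphrase: meaning preserved
        "The meeting is not on Monday",       # negation: meaning flipped
    )]
    # Overlap prefers the negated sentence (5/6 > 4/7), so the toy metric fails.
    print(preference_accuracy(token_jaccard, triples))  # 0.0
```

The toy overlap metric prefers the negated sentence because it shares more tokens with the reference, which is the kind of failure a preference-based attack is meant to expose; an NLI-based score would instead be expected to penalize the contradiction. In this sketch `combine` assumes both inputs share a scale, and `w` is a free parameter rather than a value taken from the paper.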
