MUG: Interactive Multimodal Grounding on User Interfaces


September 29, 2022


Tao Li, Gang Li, Jingjie Zheng, Purple Wang, Yang Li

We present MUG, a novel interactive task for multimodal grounding in which a user and an agent work collaboratively on an interface screen. Prior work modeled multimodal UI grounding in a single round: the user gives a command and the agent responds to it. In a realistic scenario, however, a user command can be ambiguous when the target action is inherently difficult to articulate in natural language. MUG allows multiple rounds of interaction, so that upon seeing the agent's responses the user can give further commands for the agent to refine or even correct its actions. Such interaction is critical for improving grounding performance in real-world use cases. To investigate the problem, we create a new dataset that consists of 77,820 sequences of human user-agent interaction on mobile interfaces, of which 20% involve multiple rounds of interaction. To establish our benchmark, we experiment with a range of modeling variants and evaluation strategies, including both offline and online evaluation; the online strategy consists of both human evaluation and automatic evaluation with simulators. Our experiments show that allowing iterative interaction significantly improves absolute task completion by 18% over the entire test dataset and 31% over the challenging subset. Our results lay the foundation for further investigation of the problem.
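The multi-round setup described in the abstract can be illustrated with a small toy loop. The sketch below is not the authors' model, dataset, or simulator: `Episode`, `toy_user`, `toy_agent`, `run_episode`, and `completion_rate` are all hypothetical names, and the "screen" is reduced to a row of indexed elements. It only shows how an online evaluation with a simulated user might measure task completion as more rounds are allowed.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Episode:
    """Hypothetical task: pick one element out of a row of UI elements."""
    target: int        # index of the element the user actually wants
    num_elements: int  # number of candidate elements on the screen


# Hypothetical interfaces (assumptions, not taken from the paper).
Agent = Callable[[List[str], int], int]      # (command history, screen size) -> element index
User = Callable[[int, Optional[int]], str]   # (target, last prediction) -> next command


def toy_user(target: int, prediction: Optional[int]) -> str:
    """Simulated user: an ambiguous first command, then relative corrections."""
    if prediction is None:
        return "select"
    return "higher" if target > prediction else "lower"


def toy_agent(commands: List[str], num_elements: int) -> int:
    """Toy agent: guess the middle element, then follow each correction by one step."""
    guess = num_elements // 2
    for command in commands[1:]:
        if command == "higher":
            guess += 1
        elif command == "lower":
            guess -= 1
    return max(0, min(num_elements - 1, guess))


def run_episode(agent: Agent, user: User, episode: Episode, max_rounds: int) -> bool:
    """Return True if the agent selects the target within max_rounds rounds."""
    commands: List[str] = []
    prediction: Optional[int] = None
    for _ in range(max_rounds):
        commands.append(user(episode.target, prediction))
        prediction = agent(commands, episode.num_elements)
        if prediction == episode.target:
            return True
    return False


def completion_rate(agent: Agent, user: User, episodes: List[Episode], max_rounds: int) -> float:
    """Fraction of episodes completed within the round budget."""
    completed = sum(run_episode(agent, user, e, max_rounds) for e in episodes)
    return completed / len(episodes)


# Toy data only; these numbers have no relation to the results reported in the paper.
EPISODES = [Episode(target=2, num_elements=5),
            Episode(target=3, num_elements=5),
            Episode(target=0, num_elements=5)]

for rounds in (1, 2, 3):
    print(rounds, completion_rate(toy_agent, toy_user, EPISODES, rounds))
```

In this toy setting the single-round budget completes only the episode whose target happens to match the agent's first guess, while larger budgets let the simulated user steer the agent to the remaining targets, which mirrors the kind of comparison the abstract describes between one-round and iterative grounding.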

