AQ-GT: 一个用于共语手势合成的时间对齐量化GRU-Transformer模型 (AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis) - 专知论文

会员服务 ·

0

GRU · Transformer模型 · 映射 · 智能代理 · 行为数据集 ·

2023 年 5 月 2 日

AQ-GT: a Temporally Aligned and Quantized GRU-Transformer for Co-Speech Gesture Synthesis

翻译：AQ-GT: 一个用于共语手势合成的时间对齐量化GRU-Transformer模型

Hendric Voß,Stefan Kopp

The generation of realistic and contextually relevant co-speech gestures is a challenging yet increasingly important task in the creation of multimodal artificial agents. Prior methods focused on learning a direct correspondence between co-speech gesture representations and produced motions, which created seemingly natural but often unconvincing gestures during human assessment. We present an approach to pre-train partial gesture sequences using a generative adversarial network with a quantization pipeline. The resulting codebook vectors serve as both input and output in our framework, forming the basis for the generation and reconstruction of gestures. By learning the mapping of a latent space representation as opposed to directly mapping it to a vector representation, this framework facilitates the generation of highly realistic and expressive gestures that closely replicate human movement and behavior, while simultaneously avoiding artifacts in the generation process. We evaluate our approach by comparing it with established methods for generating co-speech gestures as well as with existing datasets of human behavior. We also perform an ablation study to assess our findings. The results show that our approach outperforms the current state of the art by a clear margin and is partially indistinguishable from human gesturing. We make our data pipeline and the generation framework publicly available.

翻译：生成逼真且上下文相关的共语手势是一项具有挑战性但愈发重要的任务，这有助于创建多模态的人工智能代理。以往的方法集中于学习共语手势表示与产生动作之间的直接对应关系，这导致看似自然但在人类评估中常常不令人信服的手势。我们提出了一个方法来使用生成对抗网络和量化技术来预先训练部分手势序列。产生的码本向量既作为我们框架的输入也作为输出，从而形成手势的生成和重构的基础。通过学习潜空间表示的映射关系，而不是将其直接映射到向量表示，这个框架有助于生成高度逼真和富有表现力的手势，这些手势紧密地复制了人类的运动和行为，同时避免了生成过程中的人工痕迹。我们通过将其与已有的共语手势生成方法和现有的人类行为数据集进行比较来评估我们的方法。我们还进行了消融研究，以评估我们的发现。结果表明我们的方法在性能上明显优于当前技术水平，并且在某种程度上与人类手势难以区分。我们公开了我们的数据管道和生成框架。

0

相关内容

GRU

循环神经网络的一种门机制

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

[ICCV 2021] 联合视觉语义推理：文本识别的多级解码器

[ICCV 2021] 联合视觉语义推理：文本识别的多级解码器

专知会员服务

19+阅读 · 2021年11月28日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

专知会员服务

29+阅读 · 2020年3月26日

八篇 ICCV 2019 【图神经网络（GNN）+CV】相关论文

八篇 ICCV 2019 【图神经网络（GNN）+CV】相关论文

专知会员服务

30+阅读 · 2020年1月10日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

泡泡机器人SLAM

11+阅读 · 2018年3月31日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

袖状胃切除术后小肠和下丘脑营养感受对肝脏葡萄糖输出的调节

国家自然科学基金

0+阅读 · 2014年12月31日

维生素B12输入体蛋白BtuCD的构象变化和转运机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于β/B2相自润滑的TiAl-RE合金优化设计与管材制备基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于变分收敛技术的平衡问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于空间感知的混合现实徒手自然交互技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

光老化皮肤CatG、MMPS对TGF-β/Smad通路的调控及交互作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

OMEGA-3多不饱和脂肪酸在新生大鼠缺血缺氧性脑损伤中的神经保护作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

超精度视频内容三维重建

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation

Arxiv

0+阅读 · 2023年6月16日

On Data Sampling Strategies for Training Neural Network Speech Separation Models

Arxiv

0+阅读 · 2023年6月16日

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

Arxiv

0+阅读 · 2023年6月16日

Scalable 3D Captioning with Pretrained Models

Arxiv

0+阅读 · 2023年6月16日

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Arxiv

0+阅读 · 2023年6月15日

Emotional Speech-Driven Animation with Content-Emotion Disentanglement

Arxiv

0+阅读 · 2023年6月15日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

Graph Signal Processing -- Part I: Graphs, Graph Spectra, and Spectral Clustering

Arxiv

14+阅读 · 2019年8月12日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

VIP会员

文章信息

相关主题

Transformer模型

行为数据集

相关VIP内容

百篇论文纵览大型语言模型最新研究进展

百篇论文纵览大型语言模型最新研究进展

专知会员服务

70+阅读 · 2023年3月31日

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

【CVPR 2022】多模态视频字幕的端到端生成预训练，End-to-end Generative Pretraining for Multimodal Video Captioning

专知会员服务

27+阅读 · 2022年3月3日

[ICCV 2021] 联合视觉语义推理：文本识别的多级解码器

[ICCV 2021] 联合视觉语义推理：文本识别的多级解码器

专知会员服务

19+阅读 · 2021年11月28日

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

【ACL2020-亚马逊】Transformers多分辨率和多模态语音识别，Multiresolution and Multimodal Speech Recognition with Transformers

专知会员服务

15+阅读 · 2020年5月5日

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

【CVPR2020-Oral-牛津-Facebook】从单个图像进行端到端的视图合成，SynSin-View Synthesis

专知会员服务

29+阅读 · 2020年3月26日

八篇 ICCV 2019 【图神经网络（GNN）+CV】相关论文

八篇 ICCV 2019 【图神经网络（GNN）+CV】相关论文

专知会员服务

30+阅读 · 2020年1月10日

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

【论文推荐】小样本视频合成，Few-shot Video-to-Video Synthesis

专知会员服务

24+阅读 · 2019年12月15日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

49+阅读 · 2019年10月17日

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

Deep Learning Based Detection and Correction of Cardiac MR Motion Artefacts During Reconstruction for High-Quality Segmentation

专知会员服务

59+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

热门VIP内容

开通专知VIP会员享更多权益服务

大语言模型中的检索与结构化增强生成综述

《实现多层防御多轮交战机制的扩展型随机齐射模型》2025年最新83页

【CMU博士论文】交互驱动的人体动作估计与生成

如何避免生成式人工智能在作战中失控失效

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

【论文推荐】最新四篇CVPR2018 视频描述生成相关论文—双向注意力、Transformer、重构网络、层次强化学习

专知

31+阅读 · 2018年6月4日

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

【论文推荐】最新八篇生成对抗网络相关论文—BRE、图像合成、多模态图像生成、非配对多域图、注意力、对抗特征增强、深度对抗性训练

专知

16+阅读 · 2018年5月14日

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

【泡泡一分钟】基于多视图卷积网络的草图三维重建技术(3dv-66)

泡泡机器人SLAM

11+阅读 · 2018年3月31日

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

【论文推荐】最新5篇图像描述生成（Image Caption）相关论文—情感、注意力机制、遥感图像、序列到序列、深度神经结构

专知

66+阅读 · 2018年1月31日

MoCoGAN 分解运动和内容的视频生成

MoCoGAN 分解运动和内容的视频生成

CreateAMind

18+阅读 · 2017年10月21日

相关论文

AvatarBooth: High-Quality and Customizable 3D Human Avatar Generation

Arxiv

0+阅读 · 2023年6月16日

On Data Sampling Strategies for Training Neural Network Speech Separation Models

Arxiv

0+阅读 · 2023年6月16日

EVOPOSE: A Recursive Transformer For 3D Human Pose Estimation With Kinematic Structure Priors

Arxiv

0+阅读 · 2023年6月16日

Scalable 3D Captioning with Pretrained Models

Arxiv

0+阅读 · 2023年6月16日

Diff-TTSG: Denoising probabilistic integrated speech and gesture synthesis

Arxiv

0+阅读 · 2023年6月15日

Emotional Speech-Driven Animation with Content-Emotion Disentanglement

Arxiv

0+阅读 · 2023年6月15日

From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression

Arxiv

10+阅读 · 2021年12月14日

Graph Signal Processing -- Part I: Graphs, Graph Spectra, and Spectral Clustering

Arxiv

14+阅读 · 2019年8月12日

3D Hand Shape and Pose Estimation from a Single RGB Image

3D Hand Shape and Pose Estimation from a Single RGB Image

Arxiv

17+阅读 · 2019年3月3日

End-to-End Dense Video Captioning with Masked Transformer

Arxiv

14+阅读 · 2018年4月3日

相关基金

袖状胃切除术后小肠和下丘脑营养感受对肝脏葡萄糖输出的调节

国家自然科学基金

0+阅读 · 2014年12月31日

维生素B12输入体蛋白BtuCD的构象变化和转运机理研究

国家自然科学基金

0+阅读 · 2014年12月31日

基于β/B2相自润滑的TiAl-RE合金优化设计与管材制备基础研究

国家自然科学基金

0+阅读 · 2013年12月31日

基于变分收敛技术的平衡问题研究

国家自然科学基金

1+阅读 · 2012年12月31日

基于空间感知的混合现实徒手自然交互技术研究

国家自然科学基金

1+阅读 · 2012年12月31日

光老化皮肤CatG、MMPS对TGF-β/Smad通路的调控及交互作用研究

国家自然科学基金

0+阅读 · 2012年12月31日

OMEGA-3多不饱和脂肪酸在新生大鼠缺血缺氧性脑损伤中的神经保护作用及机制研究

国家自然科学基金

0+阅读 · 2011年12月31日

超精度视频内容三维重建

国家自然科学基金

0+阅读 · 2011年12月31日

基于Decorin基因甲基化调控的非小细胞肺癌转移的分子机制

国家自然科学基金

0+阅读 · 2011年12月31日

基于list-mode数据的快速SART真3D PET断层重建算法的研究

国家自然科学基金

0+阅读 · 2011年12月31日

微信扫码咨询专知VIP会员