
[Overview] The 28th ACM International Conference on Multimedia (ACM MM) was held online from October 12 to 16, 2020. All of the field's major awards, including Best Paper, Best Student Paper, Best Demo, and Best Open Source Software, have just been announced.

The ACM International Conference on Multimedia (ACM MM) has been held annually since its first edition in 1993 and has become the premier conference in the multimedia field; it is also a Class A international conference recommended by the China Computer Federation (CCF). Popular topics at the conference include large-scale image and video analysis, social media research, multimodal human-computer interaction, computational vision, and computational imaging.

Best Paper

Title: PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music

Authors: Hongru Liang, Wenqiang Lei, Paul Yaozhu Chan, Zhenglu Yang, Maosong Sun, Tat-Seng Chua

Abstract: Definitive embeddings remain a fundamental challenge of deep learning for symbolic music in computational musicology. Analogous to natural language, music can be modeled as a sequence of tokens, which has motivated most existing solutions to build music embeddings with text embedding models. However, music differs from natural language in two key respects: (1) musical tokens are multi-faceted, comprising pitch, rhythm, and dynamics information; and (2) musical context is two-dimensional, in that each token depends on both a melodic and a harmonic context. In this work, we provide a comprehensive solution by proposing a novel framework, PiRhDy, that seamlessly integrates pitch, rhythm, and dynamics information. PiRhDy adopts a hierarchical strategy that decomposes into two steps: (1) token (i.e., note event) modeling, which represents pitch, rhythm, and dynamics separately and integrates them into a single token embedding; and (2) context modeling, which trains the token embeddings using melodic and harmonic knowledge. We conduct a thorough study of each component and sub-strategy of PiRhDy and further validate the embeddings on three downstream tasks: melody completion, accompaniment suggestion, and genre classification. The results indicate that PiRhDy is a significant advance for neural approaches to symbolic music and show its potential as a pretrained model for a broad range of symbolic music applications.
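The token-modeling step described above can be sketched as follows. This is a minimal, hypothetical illustration: the vocabulary sizes, embedding dimension, and fusion rule (concatenation followed by a linear projection) are assumptions for clarity, not the paper's exact architecture.

```python
# Sketch of PiRhDy-style token modeling: pitch, rhythm, and dynamics are
# embedded separately, then integrated into a single token embedding.
import numpy as np

rng = np.random.default_rng(0)

N_PITCH, N_RHYTHM, N_DYN, DIM = 128, 32, 16, 8   # assumed vocabulary sizes / dim

# One lookup table per facet of a note event.
pitch_emb  = rng.normal(size=(N_PITCH, DIM))
rhythm_emb = rng.normal(size=(N_RHYTHM, DIM))
dyn_emb    = rng.normal(size=(N_DYN, DIM))
proj       = rng.normal(size=(3 * DIM, DIM))      # fuses the three facets

def embed_token(pitch: int, rhythm: int, dynamics: int) -> np.ndarray:
    """Integrate the three facet embeddings into one token embedding."""
    facets = np.concatenate([pitch_emb[pitch], rhythm_emb[rhythm], dyn_emb[dynamics]])
    return facets @ proj

tok = embed_token(pitch=60, rhythm=4, dynamics=10)  # e.g. middle C, quarter note, mf
print(tok.shape)  # (8,)
```

In the full framework these embeddings would then be trained against melodic and harmonic contexts (step 2), which this sketch omits.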

Paper: https://dl.acm.org/doi/abs/10.1145/3394171.3414032

Best Student Paper

Title: Learning from the Past: Meta-Continual Learning with Knowledge Embedding for Jointly Sketch, Cartoon, and Caricature Face Recognition

Authors: Wenbo Zheng, Lan Yan, Feiyue Wang, Chao Gou

Abstract: This paper addresses the challenging task of learning from different modalities by tackling joint face recognition across abstract sketches, cartoons, caricatures, and real photographs. Because abstract faces differ so markedly, building vision models that recognize data from these modalities is extremely difficult. We propose a novel framework, termed meta-continual learning with knowledge embedding, for the joint sketch, cartoon, and caricature face recognition task. Specifically, we first propose a deep relation network to capture and memorize the relations among different samples. Second, we construct a knowledge graph that relates images to labels and serves as guidance for the meta-learner. We then design a knowledge-embedding mechanism to incorporate the knowledge representation into our network. Finally, to mitigate catastrophic forgetting, we use a meta-continual model that updates our ensemble model and improves its prediction accuracy; with it, our network can learn from the past. The final classification is obtained by the network learning to compare the features of samples. Experimental results show that our approach outperforms other state-of-the-art methods.
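The final classification step, in which the network learns to compare sample features, can be sketched as a relation-based (metric-learning) classifier. This is a hypothetical illustration: the toy features and the cosine-similarity relation score stand in for the paper's learned deep relation network.

```python
# Sketch of relation-based classification: label a query face by comparing its
# feature vector to the features of labeled support samples.
import numpy as np

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Relation score between two feature vectors (stand-in for a learned net)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_by_relation(query, support_feats, support_labels):
    """Return the label of the support sample most related to the query."""
    scores = [cosine(query, s) for s in support_feats]
    return support_labels[int(np.argmax(scores))]

# Toy 2-D features for sketch / cartoon / caricature support samples.
support = [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([0.7, 0.7])]
labels = ["sketch", "cartoon", "caricature"]

print(classify_by_relation(np.array([0.9, 0.1]), support, labels))  # sketch
```

In the paper, the comparison function itself is learned, and the knowledge graph and meta-continual updates shape the feature space this comparison operates in.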

Paper: https://dl.acm.org/doi/abs/10.1145/3394171.3413892


Latest Content

The video relation detection problem refers to detecting the relationships between different objects in videos, such as spatial relationships and action relationships. In this paper, we present video relation detection with trajectory-aware multi-modal features to solve this task. Considering the complexity of visual relation detection in videos, we decompose the task into three sub-tasks: object detection, trajectory proposal, and relation prediction. We use a state-of-the-art object detection method to ensure the accuracy of object trajectory detection, and multi-modal feature representations to help predict the relations between objects. Our method won first place in the video relation detection task of the Video Relation Understanding Grand Challenge at ACM Multimedia 2020 with 11.74% mAP, surpassing other methods by a large margin.
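The three-stage decomposition described above can be sketched as a pipeline. Each stage below is a placeholder under assumed toy data; a real system would plug in a trained detector, a tracker, and a learned relation classifier, and the "near" predicate is purely illustrative.

```python
# Sketch of the pipeline: object detection -> trajectory proposal -> relation
# prediction, over toy per-frame detections.
from itertools import combinations

def detect_objects(frame):
    # Placeholder detector: each toy frame already lists its objects.
    return frame["objects"]

def propose_trajectories(frames):
    # Placeholder tracker: group detections by object id across frames.
    tracks = {}
    for t, frame in enumerate(frames):
        for obj in detect_objects(frame):
            tracks.setdefault(obj["id"], []).append((t, obj["box"]))
    return tracks

def predict_relations(tracks):
    # Placeholder relation head: emit one relation triplet per object pair.
    return [(a, "near", b) for a, b in combinations(sorted(tracks), 2)]

frames = [
    {"objects": [{"id": "person1", "box": (0, 0, 10, 10)},
                 {"id": "dog1", "box": (12, 0, 20, 10)}]},
    {"objects": [{"id": "person1", "box": (1, 0, 11, 10)},
                 {"id": "dog1", "box": (12, 0, 20, 10)}]},
]
tracks = propose_trajectories(frames)
print(predict_relations(tracks))  # [('dog1', 'near', 'person1')]
```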
