Existing multilingual speech NLP works focus on a relatively small subset of languages, and thus current linguistic understanding of languages predominantly stems from classical approaches. In this work, we propose a method to analyze language similarity using deep learning. Namely, we train a model on the Wilderness dataset and investigate how its latent space compares with classical language family findings. Our approach provides a new direction for cross-lingual data augmentation in any speech-based NLP task.
翻译:现有的多语种语言国家语言方案工作的重点是相对较少的一组语言,因此,目前语言对语言的理解主要来自古典方法。 在这项工作中,我们提出一种方法,利用深层学习分析语言相似性。也就是说,我们培训了野性数据集模型,并调查其潜在空间如何与传统语言家庭调查结果相比。 我们的方法为任何基于语言的国家语言方案任务中的跨语言数据增加提供了新的方向。