Phoneme Papers - 专知 (Zhuanzhi)

Building Robust and Scalable Multilingual ASR for Indian Languages

Arxiv

0+ reads · November 19

Enhancing Quranic Learning: A Multimodal Deep Learning Approach for Arabic Phoneme Recognition

Arxiv

0+ reads · November 21

VSpeechLM: A Visual Speech Language Model for Visual Text-to-Speech Task

Arxiv

0+ reads · November 27

Why Isn't Relational Learning Taking Over the World?

Arxiv

0+ reads · November 5

MEGConformer: Conformer-Based MEG Decoder for Robust Speech and Phoneme Classification

Arxiv

0+ reads · December 1

Seeing isn't Hearing: Benchmarking Vision Language Models at Interpreting Spectrograms

Arxiv

0+ reads · November 17

Why Isn't Relational Learning Taking Over the World?

Arxiv

0+ reads · October 30

M-CIF: Multi-Scale Alignment For CIF-Based Non-Autoregressive ASR

Arxiv

0+ reads · October 25

Are These Even Words? Quantifying the Gibberishness of Generative Speech Models

Arxiv

0+ reads · October 24

PASE: Phoneme-Aware Speech Encoder to Improve Lip Sync Accuracy for Talking Head Synthesis

Arxiv

0+ reads · October 15

I Have No Mouth, and I Must Rhyme: Uncovering Internal Phonetic Representations in LLaMA 3.2

Arxiv

0+ reads · October 15

FAC-FACodec: Controllable Zero-Shot Foreign Accent Conversion with Factorized Speech Codec

Arxiv

0+ reads · October 12

Dual Data Scaling for Robust Two-Stage User-Defined Keyword Spotting

Arxiv

0+ reads · October 12

Phonikud: Hebrew Grapheme-to-Phoneme Conversion for Real-Time Text-to-Speech

Arxiv

0+ reads · October 10

ControlAudio: Tackling Text-Guided, Timing-Indicated and Intelligible Audio Generation via Progressive Diffusion Modeling

Arxiv

0+ reads · October 10
