In this survey, we systematically analyze techniques used to adapt large multimodal models (LMMs) to low-resource (LR) languages, examining approaches ranging from visual enhancement and data creation to cross-modal transfer and fusion strategies. Through a comprehensive analysis of 117 studies across 96 LR languages, we identify key patterns in how researchers tackle the challenges of limited data and computational resources. We categorize works into resource-oriented and method-oriented contributions, further dividing each into relevant sub-categories. We compare method-oriented contributions in terms of performance and efficiency, discussing the benefits and limitations of representative studies. We find that visual information often serves as a crucial bridge for improving model performance in LR settings, though significant challenges remain in areas such as hallucination mitigation and computational efficiency. In summary, we provide researchers with a clear understanding of current approaches and remaining challenges in making LMMs more accessible to speakers of LR (understudied) languages. We complement our survey with an open-source repository available at: https://github.com/marianlupascu/LMM4LRL-Survey.