Scaling Multimodal Search and Recommendation with Small Language Models via Upside-Down Reinforcement Learning

In this work, we investigate how small language models (SLMs) can be scaled to support multimodal search and recommendation use cases while remaining efficient enough for real-time, resource-constrained deployments. We present a framework that combines upside-down reinforcement learning with synthetic data distillation from a large language model (Llama-3) to train a 100M-parameter GPT-2 model for multitask prompt generation. Despite being up to 80 times smaller than state-of-the-art large language models (LLMs), our SLM achieves relevance and diversity scores within 6% of competitive baselines such as Llama-3 8B, Qwen3 8B, and Ministral 8B. These results demonstrate that SLMs can effectively handle multimodal search and recommendation tasks, while dramatically reducing inference latency and memory overhead. Our study highlights the potential of lightweight models as practical engines for scalable multimodal discovery, bridging the gap between cutting-edge research and real-world multimodal applications such as media recommendations and creative content generation.
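As an illustration of the upside-down reinforcement learning idea mentioned above, the sketch below shows one way a desired outcome can be turned into an input that the model is conditioned on, so that training reduces to ordinary supervised next-token prediction. The abstract does not specify the control signals or the text format used, so the control tokens (`<relevance=…>`, `<length=…>`), the `Sample` fields, and the bucketing scheme are all hypothetical choices made for this example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One teacher-generated example (all field names are hypothetical)."""
    context: str      # e.g. a user query or an item description
    output: str       # prompt text produced by the teacher LLM
    relevance: float  # score in [0, 1] assigned to the output


def bucket(score: float, n_buckets: int = 5) -> int:
    """Map a score in [0, 1] to a bucket index in [0, n_buckets - 1]."""
    clipped = min(max(score, 0.0), 1.0)
    return min(int(clipped * n_buckets), n_buckets - 1)


def to_inference_prefix(context: str, relevance_bucket: int, length: int) -> str:
    """Build the conditioning prefix: the desired outcome comes first."""
    return (
        f"<relevance={relevance_bucket}> <length={length}> "
        f"<context> {context} <output>"
    )


def to_training_text(sample: Sample) -> str:
    """Prefix the observed outcome as a command, then append the output.

    Training on such strings with a standard language-modeling loss
    teaches the model to produce outputs that match the command.
    """
    observed_length = len(sample.output.split())
    prefix = to_inference_prefix(
        sample.context, bucket(sample.relevance), observed_length
    )
    return f"{prefix} {sample.output}"


if __name__ == "__main__":
    example = Sample(
        context="sunset beach",
        output="a golden sunset over a quiet beach",
        relevance=0.9,
    )
    print(to_training_text(example))
    # At inference time, the caller asks for the outcome it wants:
    print(to_inference_prefix("sunset beach", relevance_bucket=4, length=12))
```

At inference time, the same prefix is built with the outcome the caller wants (for example, the highest relevance bucket and a target length), and the model completes the text after `<output>`. Discretizing the score into buckets keeps the set of control tokens small; this is a design choice of the sketch, not something stated in the abstract.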
