Hardware-in-the-Loop (HIL) testing is essential for automotive validation but suffers from fragmented and underutilized test artifacts. This paper presents HIL-GPT, a retrieval-augmented generation (RAG) system integrating domain-adapted large language models (LLMs) with semantic retrieval. HIL-GPT leverages embedding fine-tuning using a domain-specific dataset constructed via heuristic mining and LLM-assisted synthesis, combined with vector indexing for scalable, traceable test case and requirement retrieval. Experiments show that fine-tuned compact models, such as \texttt{bge-base-en-v1.5}, achieve a superior trade-off between accuracy, latency, and cost compared to larger models, challenging the notion that bigger is always better. An A/B user study further confirms that RAG-enhanced assistants improve perceived helpfulness, truthfulness, and satisfaction over general-purpose LLMs. These findings provide insights for deploying efficient, domain-aligned LLM-based assistants in industrial HIL environments.