Large language models (LLMs) exhibit cultural bias from overrepresented viewpoints in training data, yet cultural alignment remains challenging due to limited cultural knowledge and a lack of exploration into effective learning approaches. We introduce a cost-efficient, cognitively grounded method: fine-tuning LLMs on native speakers' word-association norms, building on cognitive psychology findings that such associations capture cultural knowledge. Using word-association datasets from native speakers in the US (English) and China (Mandarin), we train Llama-3.1-8B and Qwen-2.5-7B via supervised fine-tuning and preference optimization. We evaluate the models' cultural alignment with a two-tier framework spanning lexical associations and cultural value alignment measured against the World Values Survey. Results show significant improvements in lexical alignment (16-20% for English and 43-165% for Mandarin on Precision@5) and high-level shifts in cultural values. On a subset of 50 questions where US and Chinese respondents diverge most, fine-tuned Qwen nearly doubles its alignment with Chinese responses (from 13 to 25 questions). Remarkably, our fine-tuned 7-8B models match or exceed vanilla 70B baselines, demonstrating that a few million culture-grounded associations can achieve value alignment without expensive retraining. Our work highlights both the promise of cognitively grounded approaches and the need for further research on improving cultural alignment in AI models.
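To make the training setup concrete, the sketch below shows one plausible way word-association norms could be converted into supervised fine-tuning examples and preference pairs. This is a minimal illustration under assumed conventions: the norm entries, prompt templates, and field names are invented for this example and are not the paper's actual data format or pipeline.

```python
# Minimal sketch (illustrative, not the paper's pipeline): converting
# (cue, association, count) norm entries into SFT examples and
# DPO-style preference pairs. All data below is hypothetical.
from collections import defaultdict

norms = [
    ("breakfast", "eggs", 120),
    ("breakfast", "coffee", 95),
    ("breakfast", "toast", 60),
    ("breakfast", "congee", 4),
]

by_cue = defaultdict(list)
for cue, assoc, count in norms:
    by_cue[cue].append((count, assoc))

sft_examples, preference_pairs = [], []
for cue, assocs in by_cue.items():
    # Rank associations by how often native speakers produced them.
    ranked = [a for _, a in sorted(assocs, reverse=True)]
    # SFT target: the most frequent human associations for the cue.
    sft_examples.append({
        "prompt": f"List words you associate with '{cue}'.",
        "response": ", ".join(ranked[:3]),
    })
    # Preference pair: a dominant human association is "chosen", a rare
    # one "rejected" -- one simple way such pairs might be constructed.
    preference_pairs.append({
        "prompt": f"What word do you associate with '{cue}'?",
        "chosen": ranked[0],
        "rejected": ranked[-1],
    })

print(sft_examples[0])
print(preference_pairs[0])
```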
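The lexical-alignment metric reported above, Precision@5, has a standard form: the fraction of a model's top-5 associations to a cue word that native speakers also produced. The sketch below shows one plausible implementation; the cue word and association lists are invented for illustration.

```python
# Minimal sketch of Precision@k for lexical alignment: the fraction of
# a model's top-k associations to a cue that native speakers also gave.
def precision_at_k(model_assocs, human_assocs, k=5):
    """model_assocs: the model's ranked associations for one cue word.
    human_assocs: set of associations given by native speakers."""
    return sum(1 for w in model_assocs[:k] if w in human_assocs) / k

# Hypothetical example for the cue "dragon":
model_out = ["fire", "luck", "emperor", "snake", "myth"]
human_set = {"luck", "emperor", "zodiac", "power", "myth"}
print(precision_at_k(model_out, human_set))  # -> 0.6
```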