In Silico Development of Psychometric Scales: Feasibility of Representative Population Data Simulation with LLMs

Developing and validating psychometric scales requires large samples, multiple testing phases, and substantial resources. Recent advances in Large Language Models (LLMs) enable the generation of synthetic participant data by prompting models to answer items while impersonating individuals of specific demographic profiles, potentially allowing in silico piloting before real data collection. Across four preregistered studies (N ≈ 300 each), we tested whether LLM-simulated datasets can reproduce the latent structures and measurement properties of human responses. In Studies 1-2, we compared LLM-generated data with real datasets for two validated scales; in Studies 3-4, we created new scales using exploratory factor analysis (EFA) on simulated data and then examined whether these structures generalized to newly collected human samples. Simulated datasets replicated the intended factor structures in three of four studies and showed consistent configural and metric invariance, with scalar invariance achieved for the two newly developed scales. However, correlation-based tests revealed substantial differences between real and synthetic datasets, and notable discrepancies appeared in score distributions and variances. Thus, while LLMs capture group-level latent structures, they do not approximate individual-level data properties. Simulated datasets also showed full internal invariance across gender. Overall, LLM-generated data appear useful for early-stage, group-level psychometric prototyping, but not as substitutes for individual-level validation. We discuss methodological limitations, risks of bias and data pollution, and ethical considerations related to in silico psychometric simulations.
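The contrast the abstract draws, between matching latent structure and mismatching variances, can be illustrated with a toy sketch. Everything below is an assumption made for illustration and is not the authors' pipeline: the data are randomly generated rather than human or LLM responses, the loadings and the 0.5 scale factor are arbitrary, a first principal component stands in for a proper EFA, and Tucker's congruence coefficient is used as one common way to compare loading patterns.

```python
import numpy as np


def simulate_one_factor(n, loadings, scale, rng):
    """Toy generator: item = loading * factor + unique noise, then rescaled."""
    factor = rng.normal(size=(n, 1))
    noise = rng.normal(size=(n, len(loadings)))
    unique_sd = np.sqrt(1.0 - loadings**2)
    return scale * (factor * loadings + noise * unique_sd)


def first_component_loadings(data):
    """Loadings of the first principal component of the correlation matrix.

    This is a simple stand-in for EFA, not an EFA itself.
    """
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)  # eigenvalues in ascending order
    loadings = eigvecs[:, -1] * np.sqrt(eigvals[-1])
    # Eigenvector sign is arbitrary; orient it so loadings are mostly positive.
    return loadings if loadings.sum() >= 0 else -loadings


def tucker_congruence(a, b):
    """Tucker's congruence coefficient between two loading vectors."""
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))


rng = np.random.default_rng(0)
true_loadings = np.array([0.8, 0.7, 0.6, 0.7, 0.5])  # arbitrary toy values

# "Human" and "synthetic" samples share a structure but differ in spread.
human = simulate_one_factor(300, true_loadings, scale=1.0, rng=rng)
synthetic = simulate_one_factor(300, true_loadings, scale=0.5, rng=rng)

phi = tucker_congruence(
    first_component_loadings(human),
    first_component_loadings(synthetic),
)
variance_ratio = (
    synthetic.var(axis=0, ddof=1).mean() / human.var(axis=0, ddof=1).mean()
)

print(f"loading congruence: {phi:.3f}")
print(f"synthetic/human item variance ratio: {variance_ratio:.3f}")
```

In this toy setup the loading patterns agree closely while the item variances do not, which mirrors the kind of dissociation the abstract reports; it says nothing about the size of the effects in the actual studies.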

