Polychoric correlation is often an important building block in the analysis of rating data, particularly for structural equation models. However, the commonly employed maximum likelihood (ML) estimator is highly susceptible to misspecification of the polychoric correlation model, for instance through violations of the latent normality assumption. We propose a novel estimator that is designed to be robust against partial misspecification of the polychoric model, that is, situations in which the model is misspecified for an unknown fraction of observations, such as careless respondents. To this end, the estimator minimizes a robust loss function based on the divergence between the observed frequencies and the theoretical frequencies implied by the polychoric model. In contrast to the existing literature, our estimator makes no assumptions about the type or degree of model misspecification. It furthermore generalizes ML estimation, is consistent and asymptotically normally distributed, and comes at no additional computational cost. We demonstrate the robustness and practical usefulness of our estimator in simulation studies and in an empirical application to a Big Five administration. In the latter, the polychoric correlation estimates of our estimator and of ML differ substantially; further inspection suggests that this difference is likely due to the presence of careless respondents, whom the estimator helps identify.
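To make the idea of "theoretical frequencies implied by the polychoric model" concrete, the following minimal Python sketch computes the cell probabilities of a contingency table under latent bivariate normality and fits the ML baseline that the abstract refers to. It is an illustration under stated assumptions, not the proposed robust estimator: the function names (`thresholds`, `cell_probs`, `polychoric_ml`), the two-step treatment of thresholds (taken from the marginal proportions), the finite stand-in `BIG` for infinite thresholds, and the example table are all hypothetical choices made for this sketch.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import multivariate_normal, norm

BIG = 8.0  # assumption: finite stand-in for +/- infinity thresholds


def thresholds(margin_counts):
    """Thresholds from cumulative marginal proportions (two-step shortcut).

    Assumes every response category was observed at least once.
    """
    counts = np.asarray(margin_counts, dtype=float)
    cum = np.cumsum(counts)[:-1] / counts.sum()
    return np.concatenate(([-BIG], norm.ppf(cum), [BIG]))


def cell_probs(rho, a, b):
    """Cell probabilities implied by a standard bivariate normal with correlation rho."""
    cov = [[1.0, rho], [rho, 1.0]]
    grid = np.array([[x, y] for x in a for y in b])
    # Bivariate normal CDF evaluated at every pair of thresholds
    F = multivariate_normal.cdf(grid, mean=[0.0, 0.0], cov=cov)
    F = np.asarray(F).reshape(len(a), len(b))
    # Rectangle probabilities by inclusion-exclusion
    p = F[1:, 1:] - F[:-1, 1:] - F[1:, :-1] + F[:-1, :-1]
    return np.clip(p, 1e-12, None)  # guard against log(0)


def polychoric_ml(table):
    """ML estimate of rho for a two-way table of observed counts."""
    table = np.asarray(table, dtype=float)
    a = thresholds(table.sum(axis=1))
    b = thresholds(table.sum(axis=0))

    def neg_loglik(rho):
        # ML compares observed counts with model-implied probabilities
        return -np.sum(table * np.log(cell_probs(rho, a, b)))

    res = minimize_scalar(neg_loglik, bounds=(-0.99, 0.99), method="bounded")
    return res.x


if __name__ == "__main__":
    # Hypothetical 3x3 table of rating counts, for illustration only
    example = [[40, 25, 10], [25, 50, 25], [10, 25, 40]]
    print(polychoric_ml(example))
```

In this sketch, the ML objective weights every cell's log-probability by its observed count, which is one way to see why a group of observations that does not follow the model can pull the estimate. The abstract describes the proposed estimator as replacing this objective with a robust divergence-based loss; the exact form of that loss is not stated here, so it is not reproduced.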