Traditional efforts to measure historical structural oppression struggle with cross-national validity because each country's histories of exclusion, colonization, and social status are unique and locally specified, and such efforts have often relied on structured indices that privilege material resources while overlooking lived, identity-based exclusion. We introduce a novel framework for measuring oppression that leverages Large Language Models (LLMs) to generate context-sensitive scores of lived historical disadvantage across diverse geopolitical settings. Using unstructured, self-identified ethnicity utterances from a multilingual global COVID-19 study, we design rule-guided prompting strategies that encourage models to produce interpretable, theoretically grounded estimates of oppression. We systematically evaluate these strategies across multiple state-of-the-art LLMs. Our results demonstrate that LLMs, when guided by explicit rules, can capture nuanced forms of identity-based historical oppression within nations. This approach provides a complementary measurement tool that highlights dimensions of systemic exclusion, offering a scalable, cross-cultural lens for understanding how oppression manifests in data-driven research and public health contexts. To support reproducible evaluation, we release an open-source benchmark dataset for assessing LLMs on oppression measurement (https://github.com/chattergpt/HSO-Bench).