Understanding human personality is crucial for web applications such as personalized recommendation and mental health assessment. Existing studies on personality detection predominantly adopt a "posts -> user vector -> labels" modeling paradigm, which encodes social media posts into user representations for predicting personality labels (e.g., MBTI labels). While recent advances in large language models (LLMs) have improved text encoding capabilities, these approaches remain constrained by limited supervision signals due to label scarcity and by under-specified semantic mappings between user language and abstract psychological constructs. We address these challenges by proposing ROME, a novel framework that explicitly injects psychological knowledge into personality detection. Inspired by standardized self-assessment tests, ROME leverages the role-play capability of LLMs to simulate user responses to validated psychometric questionnaires. These generated question-level answers transform free-form user posts into interpretable, questionnaire-grounded evidence linking linguistic cues to personality labels, thereby providing rich intermediate supervision that mitigates label scarcity while offering a semantic reasoning chain that guides and simplifies learning of the text-to-personality mapping. A question-conditioned Mixture-of-Experts module then routes jointly over post and question representations, learning to answer questionnaire items under explicit supervision. The predicted answers are summarized into an interpretable answer vector and fused with the user representation for the final prediction within a multi-task learning framework, where question answering serves as a powerful auxiliary task for personality detection. Extensive experiments on two real-world datasets demonstrate that ROME consistently outperforms state-of-the-art baselines, achieving an improvement of 15.41% on the Kaggle dataset.
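To make the described architecture concrete, the following is a minimal PyTorch sketch of a question-conditioned Mixture-of-Experts that routes over post and question representations and predicts per-item questionnaire answers; all names and dimensions here (`d_model`, `n_experts`, `n_options`, `QuestionConditionedMoE`) are hypothetical illustrations, not the paper's actual implementation.

```python
# Sketch, not ROME's real code: a question-conditioned MoE that mixes expert
# outputs per questionnaire item and emits answer logits for auxiliary QA loss.
import torch
import torch.nn as nn
import torch.nn.functional as F


class QuestionConditionedMoE(nn.Module):
    """Routes jointly over a user (post) vector and each question vector,
    then predicts an answer distribution for every questionnaire item."""

    def __init__(self, d_model: int = 256, n_experts: int = 4, n_options: int = 5):
        super().__init__()
        # The router conditions on both the post and question representations.
        self.router = nn.Linear(2 * d_model, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(2 * d_model, d_model), nn.GELU())
            for _ in range(n_experts)
        )
        self.answer_head = nn.Linear(d_model, n_options)  # e.g., Likert options

    def forward(self, user_vec: torch.Tensor, question_vec: torch.Tensor) -> torch.Tensor:
        # user_vec: (batch, d_model); question_vec: (batch, n_questions, d_model)
        b, q, d = question_vec.shape
        u = user_vec.unsqueeze(1).expand(-1, q, -1)       # broadcast user over items
        x = torch.cat([u, question_vec], dim=-1)          # (b, q, 2*d)
        gates = F.softmax(self.router(x), dim=-1)         # (b, q, n_experts)
        expert_out = torch.stack([e(x) for e in self.experts], dim=-2)  # (b, q, E, d)
        mixed = (gates.unsqueeze(-1) * expert_out).sum(dim=-2)          # (b, q, d)
        return self.answer_head(mixed)                    # per-item answer logits


if __name__ == "__main__":
    # Usage sketch: answer logits supervise an auxiliary question-answering
    # loss; their softmax summary acts as the interpretable answer vector
    # that is fused with the user representation for personality prediction.
    moe = QuestionConditionedMoE()
    user = torch.randn(8, 256)                 # user representations
    questions = torch.randn(8, 20, 256)        # 20 questionnaire items
    answer_logits = moe(user, questions)       # (8, 20, 5)
    answer_vec = answer_logits.softmax(-1).flatten(1)  # interpretable answer vector
    print(answer_logits.shape, answer_vec.shape)
```

In this reading, the auxiliary answer supervision gives the model dense, question-level training signal even when personality labels are scarce, which is the multi-task intuition the abstract describes.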