Mental rotation, the ability to compare objects seen from different viewpoints, is a fundamental example of mental simulation and spatial world modelling in humans. Here we propose a mechanistic model of human mental rotation, leveraging advances in deep, equivariant, and neuro-symbolic learning. Our model consists of three stacked components: (1) an equivariant neural encoder that takes images as input and produces 3D spatial representations of objects, (2) a neuro-symbolic object encoder that derives symbolic descriptions of objects from these spatial representations, and (3) a neural decision agent that compares these symbolic descriptions to prescribe rotation simulations in 3D latent space via a recurrent pathway. Our model design is guided by the abundant experimental literature on mental rotation, which we complemented with VR experiments in which participants could at times manipulate the objects being compared, providing additional insights into the cognitive process of mental rotation. Our model closely captures the performance, response times, and behavior of participants in our and others' experiments. Systematic ablations show that each model component is necessary. Our work adds to a recent collection of deep neural models of human spatial reasoning, further demonstrating the value of integrating deep, equivariant, and symbolic representations to model the human mind.
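To make the three-stage pipeline concrete, the following is a minimal toy sketch of the control flow only; it is not the model described above. It rests on several simplifying assumptions: the "encoder" receives 3D points rather than images and merely centers them (a trivially rotation-equivariant map), the "symbolic description" is a single azimuth angle, rotations are restricted to the z-axis, and point order is assumed to correspond across objects. All function names (`encode`, `describe`, `compare`) and parameters are hypothetical.

```python
import numpy as np

def rot_z(theta):
    """Rotation matrix for an angle theta (radians) about the z-axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def encode(points):
    """Toy stand-in for the equivariant encoder (assumption): center the
    points, so rotating the input rotates the latent in the same way."""
    return points - points.mean(axis=0)

def describe(latent):
    """Toy stand-in for the symbolic description (assumption): the azimuth
    of the first point of the chain."""
    return np.arctan2(latent[0, 1], latent[0, 0])

def compare(points_a, points_b, step=np.deg2rad(10), tol=1e-6, max_steps=100):
    """Toy decision loop: compare descriptions, prescribe a bounded rotation
    of latent A, apply it in latent space, and repeat until aligned."""
    za, zb = encode(points_a), encode(points_b)
    steps = 0
    while steps < max_steps:
        # Signed angular mismatch, wrapped to [-pi, pi).
        diff = (describe(zb) - describe(za) + np.pi) % (2 * np.pi) - np.pi
        if abs(diff) < tol:
            break
        # Rotate latent A by at most `step` toward latent B.
        za = za @ rot_z(np.clip(diff, -step, step)).T
        steps += 1
    # After alignment, any residual mismatch means the objects differ.
    same = bool(np.max(np.linalg.norm(za - zb, axis=1)) < tol)
    return same, steps

# A small chiral chain of points, loosely in the spirit of cube-chain stimuli.
chain = np.array([[0, 0, 0], [1, 0, 0], [1, 1, 0], [1, 1, 1]], dtype=float)
rotated_60 = chain @ rot_z(np.deg2rad(60)).T
rotated_120 = chain @ rot_z(np.deg2rad(120)).T
mirrored = chain * np.array([-1.0, 1.0, 1.0])  # mirror image across the yz-plane

same_60, steps_60 = compare(chain, rotated_60)
same_120, steps_120 = compare(chain, rotated_120)
same_mirror, steps_mirror = compare(chain, mirrored)
print(same_60, steps_60, same_120, steps_120, same_mirror, steps_mirror)
```

In this sketch the number of loop iterations grows with the angular disparity between the two views, which is one simple way a recurrent rotation loop could give rise to disparity-dependent response times; the mirrored object is rejected because no z-axis rotation aligns it with the original.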