Finding General Equilibria in Many-Agent Economic Simulations Using Deep Reinforcement Learning

Real economies can be seen as a sequential imperfect-information game with many heterogeneous, interacting strategic agents of various types, such as consumers, firms, and governments. Dynamic general equilibrium (DGE) models are common economic tools for modeling the economic activity, interactions, and outcomes in such systems. However, existing analytical and computational methods struggle to find explicit equilibria when all agents are strategic and interact, while joint learning is unstable and challenging. A key reason, among others, is that the actions of one economic agent may change the reward function of another agent, e.g., a consumer's expendable income changes when firms change prices or governments change taxes. We show that multi-agent deep reinforcement learning (RL) can discover stable solutions that are epsilon-Nash equilibria for a meta-game over agent types, in economic simulations with many agents, through the use of structured learning curricula and efficient GPU-only simulation and training. Conceptually, our approach is more flexible and does not need unrealistic assumptions, e.g., market clearing, that are commonly used for analytical tractability. Our GPU implementation enables training and analyzing economies with a large number of agents within reasonable time frames, e.g., training completes within a day. We demonstrate our approach in real-business-cycle (RBC) models, a representative family of DGE models, with 100 worker-consumers, 10 firms, and a government that taxes and redistributes. We validate the learned meta-game epsilon-Nash equilibria through approximate best-response analyses, show that the RL policies align with economic intuitions, and show that our approach is constructive, e.g., by explicitly learning a spectrum of meta-game epsilon-Nash equilibria in open RBC models.
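
To make the epsilon-Nash criterion concrete, the sketch below computes epsilon for a strategy profile in a toy two-player matrix game: epsilon is the largest payoff gain any single player can obtain by deviating unilaterally while the other player's strategy stays fixed. This is only an illustration under simplifying assumptions. It uses exact best responses in a 2x2 game, whereas the abstract describes approximate best-response analyses over agent types in a much larger simulation. The function name `epsilon_of_profile` and the matching-pennies payoffs are hypothetical and do not come from the paper.

```python
import numpy as np

def epsilon_of_profile(payoff_row, payoff_col, x, y):
    """Largest unilateral-deviation gain for the mixed profile (x, y).

    payoff_row[i, j] and payoff_col[i, j] are the payoffs to the row and
    column player when row plays action i and column plays action j.
    The profile is an epsilon-Nash equilibrium for any epsilon at least
    as large as the returned value.
    """
    u_row = x @ payoff_row @ y            # row player's expected payoff
    u_col = x @ payoff_col @ y            # column player's expected payoff
    best_row = np.max(payoff_row @ y)     # best pure response to y
    best_col = np.max(x @ payoff_col)     # best pure response to x
    return max(best_row - u_row, best_col - u_col)

# Toy example (assumption): matching pennies, a zero-sum 2x2 game.
A = np.array([[1.0, -1.0], [-1.0, 1.0]])
B = -A

uniform = np.array([0.5, 0.5])
pure = np.array([1.0, 0.0])

eps_uniform = epsilon_of_profile(A, B, uniform, uniform)  # no profitable deviation
eps_pure = epsilon_of_profile(A, B, pure, pure)           # column player gains by switching
```

In this toy game the uniform profile has zero deviation gain, while the pure profile leaves the column player a gain of 2. In a learned setting the exact maximization would be replaced by a trained best-response policy, so the measured gain is only a lower bound on the true epsilon.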
