Nature provides a way to connect physics with reinforcement learning, since nature favors the most economical way for an object to propagate. In classical mechanics, nature favors the path determined by the integral of the Lagrangian, called the action $\mathcal{S}$. We consider setting the reward/penalty as a function of $\mathcal{S}$, so that an agent can learn the physical trajectories of particles in various kinds of environments with reinforcement learning. In this work, we verify the idea by applying a Q-learning based algorithm to learn how light propagates in materials with different refractive indices, and show that the agent recovers the minimal-time path equivalent to the solution obtained from Snell's law or Fermat's principle. We also discuss the similarity between our reinforcement learning approach and the path integral formalism.
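As a rough sketch of the idea described above (the grid geometry, reward scale, and hyperparameters here are illustrative choices of ours, not the paper's actual configuration): a tabular Q-learning agent moves a light ray down a grid one row per step, choosing a horizontal shift as its action. The per-step penalty is the optical travel time of the segment, $n \cdot \ell$, so maximizing return minimizes total travel time, i.e., Fermat's principle. A dynamic-programming pass over the same grid gives the exact discrete optimum for comparison.

```python
import math
import random

random.seed(0)

# Toy configuration (our own assumptions, not taken from the paper):
# light travels from grid point (row 0, col 0) to (row H, col W),
# crossing an interface between two media halfway down.
H, W = 10, 14                # number of downward steps; target column
N_TOP, N_BOTTOM = 1.0, 1.5   # refractive indices of the two media
INTERFACE = H // 2           # rows above use N_TOP, rows below N_BOTTOM
ACTIONS = (0, 1, 2, 3)       # candidate horizontal shifts per step

def refractive_index(row):
    return N_TOP if row < INTERFACE else N_BOTTOM

def step_cost(row, dx):
    # travel time of one segment: refractive index times segment length
    return refractive_index(row) * math.sqrt(1.0 + dx * dx)

def terminal_penalty(col):
    # soft constraint forcing the path to end at column W
    return 10.0 * abs(col - W)

# Tabular Q-learning with reward = -(travel time), gamma = 1
Q = {(r, c): [0.0] * len(ACTIONS) for r in range(H) for c in range(W + 1)}
for episode in range(50000):
    eps = max(0.01, 0.3 * (1 - episode / 30000))  # decaying exploration
    col = 0
    for row in range(H):
        qs = Q[(row, col)]
        a = (random.randrange(len(ACTIONS)) if random.random() < eps
             else qs.index(max(qs)))
        nxt = min(W, col + ACTIONS[a])
        reward = -step_cost(row, nxt - col)
        if row == H - 1:
            target = reward - terminal_penalty(nxt)   # episode ends here
        else:
            target = reward + max(Q[(row + 1, nxt)])
        qs[a] += 0.1 * (target - qs[a])
        col = nxt

# Greedy rollout of the learned policy
path, total = [0], 0.0
for row in range(H):
    qs = Q[(row, path[-1])]
    nxt = min(W, path[-1] + ACTIONS[qs.index(max(qs))])
    total += step_cost(row, nxt - path[-1])
    path.append(nxt)
total += terminal_penalty(path[-1])

# Exact discrete optimum via backward dynamic programming, for reference
best = [terminal_penalty(c) for c in range(W + 1)]
for row in range(H - 1, -1, -1):
    best = [min(step_cost(row, min(W, c + a) - c) + best[min(W, c + a)]
                for a in ACTIONS) for c in range(W + 1)]

print("learned path (column per row):", path)
print("learned time %.3f vs optimal %.3f" % (total, best[0]))
```

The Snell's-law behavior shows up in the learned path: the agent makes more horizontal progress in the fast medium (small $n$) than in the slow one, i.e., the ray bends toward the normal when entering the denser material, just as $n_1 \sin\theta_1 = n_2 \sin\theta_2$ predicts.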