Within reinforcement learning, a growing body of research aims to express all of an agent's knowledge of the world through predictions about sensation, behaviour, and time. This work can be seen not only as a collection of architectural proposals, but also as the beginnings of a theory of machine knowledge in reinforcement learning. Recent work has expanded what can be expressed using predictions and developed applications that use predictions to inform decision-making on a variety of synthetic and real-world problems. While this line of work is promising, we suggest that the notion of predictions as knowledge in reinforcement learning is as yet underdeveloped: although some work explicitly refers to predictions as knowledge, the requirements for considering a prediction to be knowledge have yet to be well explored. Specifying the necessary and sufficient conditions of knowledge is important; even if claims about the nature of knowledge are left implicit in technical proposals, the assumptions underlying such claims have consequences for the systems we design. These consequences manifest both in the way we choose to structure predictive knowledge architectures and in how we evaluate them. In this paper, we take a first step towards formalizing predictive knowledge by discussing the relationship of predictive knowledge learning methods to existing theories of knowledge in epistemology. Specifically, we explore the relationships between Generalized Value Functions and the epistemic notions of Justification and Truth.