*《Stabilizing Transformers for Reinforcement Learning》E Parisotto, H. F Song, J W. Rae, R Pascanu, C Gulcehre, S M. Jayakumar, M Jaderberg, R L Kaufman, A Clark, S Noury, M M. Botvinick, N Heess, R Hadsell [DeepMind] (2019)



强化学习 (Reinforcement learning) 是受到行为心理学启发,机器学习中研究个体 (agent) 如何在环境中采取行动以最大化奖赏 (reward) 的领域。 这一问题由于其普遍性,在许多领域中都有研究,例如博弈论,控制论,运筹学,信息论等等。





Deep Learning with Python introduces the field of deep learning using the Python language and the powerful Keras library. Written by Keras creator and Google AI researcher François Chollet, this book builds your understanding through intuitive explanations and practical examples.


Generalization, i.e., the ability to adapt to novel scenarios, is the hallmark of human intelligence. While we have systems that excel at recognizing objects, cleaning floors, playing complex games and occasionally beating humans, they are incredibly specific in that they only perform the tasks they are trained for and are miserable at generalization. Could optimizing towards fixed external goals be hindering the generalization instead of aiding it? In this thesis, we present our initial efforts toward endowing artificial agents with a human-like ability to generalize in diverse scenarios. The main insight is to first allow the agent to learn general-purpose skills in a completely self-supervised manner, without optimizing for any external goal.

To be able to learn on its own, the claim is that an artificial agent must be embodied in the world, develop an understanding of its sensory input (e.g., image stream) and simultaneously learn to map this understanding to its motor outputs (e.g., torques) in an unsupervised manner. All these considerations lead to two fundamental questions: how to learn rich representations of the world similar to what humans learn?; and how to re-use such a representation of past knowledge to incrementally adapt and learn more about the world similar to how humans do? We believe prediction is the key to this answer. We propose generic mechanisms that employ prediction as a supervisory signal in allowing the agents to learn sensory representations as well as motor control. These two abilities equip an embodied agent with a basic set of general-purpose skills which are then later repurposed to perform complex tasks.

We discuss how this framework can be instantiated to develop curiosity-driven agents (virtual as well as real) that can learn to play games, learn to walk, and learn to perform real-world object manipulation without any rewards or supervision. These self-supervised robotic agents, after exploring the environment, can generalize to find their way in office environments, tie knots using rope, rearrange object configuration, and compose their skills in a modular fashion.

Learning to Generalize via Self-Supervised Prediction.pdf