State Representation Learning for Control: An Overview

January 10, 2019 · CreateAMind


What the paper lacks is the latest actionable 3D conv work, but it covers a great deal of content.

TCN v2 + 3D conv for motion information










As an example, infants expect inanimate objects to follow principles of persistence, continuity, cohesion and solidity before appearance-based elements such as color, texture and perceptual goodness. At the same time, these principles help guide later learning, such as objects' rigidity, softness and the properties of liquids.

Later, adults will reconstruct perceptual scenes using internal representations of the objects and their physically relevant properties (mass, elasticity, friction, gravity, collision, etc.) [Lake et al., 2016].


2.2

Using prior knowledge to constrain the state space: A last approach is to handle SRL by using specific constraints or prior knowledge about the functioning, dynamics or physics of the world (besides the constraints of forward and inverse models), such as temporal continuity or causality principles that generally reflect the interaction of the agent with objects or with the environment [Jonschkowski and Brock, 2015]. Priors are defined as objective or loss functions L, applied on a set of states s_{1:n} (Fig. 5), to be minimized (or maximized) under a specific condition c. An example of such a condition is enforcing locality or time proximity within the set of states.

Loss = L_prior(s_{1:n}; θ_φ | c)    (5)

All these approaches are detailed in Section 3.
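As an illustration of Eq. (5), the sketch below implements the temporal-coherence (slowness) prior in the spirit of [Jonschkowski and Brock, 2015], where the condition c is simply time proximity between consecutive states. The tensor layout and the function name are assumptions made here for clarity, not details taken from the surveyed paper.

    import torch

    def slowness_prior_loss(states: torch.Tensor) -> torch.Tensor:
        """Temporal-coherence (slowness) prior: E[ ||s_{t+1} - s_t||^2 ].

        `states` is assumed to be a (T, state_dim) tensor of encoder outputs
        phi(o_1), ..., phi(o_T) for T consecutive observations; the condition c
        here is simply time proximity between consecutive steps.
        """
        deltas = states[1:] - states[:-1]        # Δs_t = s_{t+1} - s_t
        return (deltas ** 2).sum(dim=1).mean()   # mean squared state change

Minimizing this term encourages temporally close observations to be mapped to nearby states, which is exactly the kind of condition c that Eq. (5) refers to.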



2.3 State representation characteristics

Besides the general idea that the state representation has the role of encoding essential information (for a given task) while discarding irrelevant aspects of the original data, let us detail what the characteristics of a good state representation are.

In a reinforcement learning framework, the authors of [Böhmer et al., 2015] define a good state representation as a representation that is:

• Markovian, i.e. it summarizes all the necessary information to be able to choose an action within the policy, by looking only at the current state.

• Able to represent the true value of the current state well enough for policy improvement.

• Able to generalize the learned value-function to unseen states with similar futures.

• Low dimensional for efficient estimation.


3 Learning objectives

In this section, we review which objectives can be used to learn a relevant state representation. A schema detailing the core elements involved in each model's loss function was introduced in Figs. 2–5, which highlight the main approaches described here. This section touches upon machine learning tools used in SRL, such as auto-encoders or siamese networks; a more detailed description of these is given in Section 4.

3.1 Reconstructing the observation

3.2 Learning a forward model

3.3 Learning an inverse model

3.4 Using feature adversarial learning

3.5 Exploiting rewards

3.6 Other objective functions

• Slowness Principle

• Variability

• Proportionality

• Repeatability

• Dynamic verification

• Selectivity
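Two of the objectives listed above, proportionality and repeatability, can be written directly as loss terms over pairs of transitions that share the same action. The sketch below follows the formulations usually attributed to [Jonschkowski and Brock, 2015]; the assumption that action-matched pairs are pre-selected by the caller is ours, made to keep the example short.

    import torch

    def proportionality_loss(delta_s1: torch.Tensor, delta_s2: torch.Tensor) -> torch.Tensor:
        """Proportionality prior: the same action should cause state changes of
        similar magnitude.  `delta_s1`, `delta_s2` are (N, state_dim) batches of
        state changes Δs taken at time-step pairs where the same action was applied.
        """
        return ((delta_s2.norm(dim=1) - delta_s1.norm(dim=1)) ** 2).mean()

    def repeatability_loss(s1: torch.Tensor, s2: torch.Tensor,
                           delta_s1: torch.Tensor, delta_s2: torch.Tensor) -> torch.Tensor:
        """Repeatability prior: the same action taken in similar states should
        produce similar state changes; similarity of the starting states is
        weighted by a Gaussian kernel on their squared distance.
        """
        weights = torch.exp(-((s2 - s1) ** 2).sum(dim=1))
        return (weights * ((delta_s2 - delta_s1) ** 2).sum(dim=1)).mean()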

3.7 Using hybrid objectives

4 Building blocks of State Representation Learning

In this section, we cover various implementation aspects relevant to state representation learning and its evaluation. We refer to specific surrogate models, loss-function specification tools and strategies that help constrain the information bottleneck and generalize when learning low-dimensional state representations.

4.1 Learning tools

We first detail a set of models that, through an auxiliary objective function, help learn a state representation. One or several of these learning tools can be integrated into broader SRL approaches, as previously described.
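As a concrete illustration of the simplest such tool, the sketch below shows a plain auto-encoder (Section 4.1.1) used as a state encoder together with the reconstruction objective of Section 3.1. The layer sizes, dimensions and the MSE loss are illustrative assumptions, not values taken from the surveyed paper.

    import torch
    import torch.nn as nn

    class StateAutoEncoder(nn.Module):
        """Minimal auto-encoder as an SRL building block.

        The encoder compresses an observation o_t into a low-dimensional state
        s_t = phi(o_t); the decoder reconstructs o_t from s_t, so the
        reconstruction loss constrains the information bottleneck.
        """
        def __init__(self, obs_dim: int = 784, state_dim: int = 8):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Linear(obs_dim, 128), nn.ReLU(),
                nn.Linear(128, state_dim),
            )
            self.decoder = nn.Sequential(
                nn.Linear(state_dim, 128), nn.ReLU(),
                nn.Linear(128, obs_dim),
            )

        def forward(self, obs: torch.Tensor):
            state = self.encoder(obs)             # s_t = phi(o_t)
            reconstruction = self.decoder(state)  # o_t ≈ psi(s_t)
            return state, reconstruction

    def reconstruction_loss(obs: torch.Tensor, reconstruction: torch.Tensor) -> torch.Tensor:
        """Reconstruction objective (Section 3.1): mean squared error."""
        return ((reconstruction - obs) ** 2).mean()

In practice the encoder output s_t can also be fed to the other objectives of Section 3 (forward models, inverse models, robotic priors), with the reconstruction term acting as one loss among several in a hybrid objective (Section 3.7).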

4.1.1 Auto-encoders (AE)

4.1.2 Denoising auto-encoders (DAE)

4.1.3 Variational auto-encoders (VAE)

4.1.4 Siamese networks
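For the siamese case, a minimal, hedged sketch is given below: two observations are embedded by the same encoder with shared weights, and a contrastive loss pulls temporally close pairs together while pushing distant pairs at least a margin apart. The hinge-style formulation and the margin value are assumptions chosen for illustration; the encoder can be, for instance, the `StateAutoEncoder(...).encoder` module from the sketch above.

    import torch
    import torch.nn.functional as F

    def contrastive_srl_loss(encoder, obs_a: torch.Tensor, obs_b: torch.Tensor,
                             is_close: torch.Tensor, margin: float = 1.0) -> torch.Tensor:
        """Siamese objective: the same encoder (shared weights) embeds both
        observations.  `is_close` is a (N,) float tensor with 1.0 for temporally
        close pairs and 0.0 otherwise; close pairs are pulled together, distant
        pairs are pushed at least `margin` apart.
        """
        s_a, s_b = encoder(obs_a), encoder(obs_b)
        dist = F.pairwise_distance(s_a, s_b)
        positive = is_close * dist ** 2
        negative = (1.0 - is_close) * torch.clamp(margin - dist, min=0.0) ** 2
        return (positive + negative).mean()

Triplet variants, as in time-contrastive networks, follow the same pattern with an anchor, a positive and a negative observation sharing the encoder.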


