Recent advances in computational perception have significantly improved the ability of autonomous robots to perform state estimation with low entropy. Such advances motivate a reconsideration of robot decision-making under uncertainty. Current approaches to solving sequential decision-making problems model states as inhabiting the extremes of the perceptual entropy spectrum; as such, these methods are either incapable of overcoming perceptual errors or asymptotically inefficient at solving problems with low perceptual entropy. With low-entropy perception in mind, we explore a happy medium that balances computational efficiency against the forms of uncertainty we now observe from modern robot perception. We propose the FastDownward Replanner (FD-Replan) as an efficient task planning method for goal-directed robot reasoning. FD-Replan combines belief-space representation with the fast, goal-directed features of classical planning to efficiently plan for low-entropy goal-directed reasoning tasks. We compare FD-Replan with current classical planning and belief-space planning approaches on low-entropy goal-directed grocery-packing tasks in simulation, where FD-Replan shows promising results with respect to planning time, execution time, and task success rate.
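
The abstract does not include pseudocode, but the determinize-and-replan pattern it describes can be sketched as follows. The `most_likely_state`, `classical_plan`, and `expected_outcome` interfaces are hypothetical placeholders for illustration, not FD-Replan's actual API:

```python
# Minimal sketch of a determinize-and-replan loop in the spirit of FD-Replan.
# `classical_plan`, `most_likely_state`, and the world/goal interfaces are
# hypothetical placeholders, not the paper's actual API.

def replan_loop(world, goal, classical_plan, max_replans=20):
    """Plan on the most likely state; replan when execution diverges."""
    for _ in range(max_replans):
        state = world.most_likely_state()      # low-entropy belief -> MAP state
        if goal.satisfied(state):
            return True
        plan = classical_plan(state, goal)     # fast, goal-directed search
        if plan is None:
            return False                       # goal unreachable from MAP state
        for action in plan:
            observed = world.execute(action)   # returns the observed outcome
            if observed != action.expected_outcome:
                break                          # perception disagrees: replan
    return False
```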

Related Content

A robot (机器人) is any machine that simulates human behavior or thought, or that mimics other living creatures (such as robotic dogs or cats). In the narrow sense, there are many competing taxonomies of and disputes over the definition, and some computer programs are even referred to as robots. In contemporary industry, a robot is an artificial machine that performs tasks automatically in order to replace or assist human work; it is typically an electromechanical device controlled by a computer program or electronic circuitry.

We analyze phase transitions in the conditional entropy of a sequence caused by a change in the conditional variables. Such transitions happen, for example, when training to learn the parameters of a system, since the transition from the training phase to the data phase causes a discontinuous jump in the conditional entropy of the measured system response. For large-scale systems, we present a method of computing a bound on the mutual information obtained with one-shot training, and show that this bound can be calculated using the difference between two derivatives of a conditional entropy. The system model does not require Gaussianity or linearity in the parameters, and does not require worst-case noise approximations or explicit estimation of any unknown parameters. The model applies to a broad range of algorithms and methods in communication, signal processing, and machine learning that employ training as part of their operation.
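
For reference, the standard identities behind the quantities named here (textbook definitions only; the paper's specific derivative-based bound on one-shot training information is not reproduced):

```latex
% Textbook definitions; the paper's derivative-based bound is not shown here.
H(Y \mid X) = -\sum_{x,y} p(x,y)\,\log p(y \mid x)
\qquad
I(X;Y) = H(Y) - H(Y \mid X)
% A training-to-data phase transition appears as a discontinuous jump in
% H(Y | X) when the conditioning variables X change.
```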

Inference and decision making under uncertainty are key processes in every autonomous system and numerous robotic problems. In recent years, the similarities between inference and decision making have triggered much work, from developing unified computational frameworks to pondering the duality between the two. In spite of these efforts, inference and control, as well as inference and belief space planning (BSP), are still treated as two separate processes. In this paper we propose a paradigm shift, a novel approach which deviates from conventional Bayesian inference and exploits the similarities between inference and BSP. We make the key observation that inference can be efficiently updated using predictions made during the decision-making stage, even in light of inconsistent data association between the two. We developed a two-stage process that implements our novel approach and updates inference using calculations from the precursory planning phase. Using autonomous navigation in an unknown environment along with efficient iSAM2 methodologies as a test case, we benchmarked our novel approach against standard Bayesian inference, with both synthetic and real-world data (the KITTI dataset). Results indicate that our approach not only improves running time by at least a factor of two while providing the same estimation accuracy, but also alleviates the computational burden of state dimensionality and loop closures.
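
A heavily simplified sketch of the two-stage idea, caching planning-stage predictions and reusing them when the actual data association matches. The cache layout, factor arithmetic, and `apply_correction`/`full_update` calls are illustrative stand-ins, not the paper's actual iSAM2 integration:

```python
# Heavily simplified sketch: reuse predictions made during planning to
# warm-start inference. All interfaces below are illustrative placeholders.

class PlanningAwareInference:
    def __init__(self, full_update):
        self.full_update = full_update   # standard Bayesian update (e.g., iSAM2)
        self.predicted = {}              # factors precomputed while planning

    def plan_stage(self, timestep, predicted_factors):
        # During decision making, store the factors the chosen action implies.
        self.predicted[timestep] = predicted_factors

    def inference_stage(self, timestep, measured_factors, belief):
        cached = self.predicted.pop(timestep, None)
        if cached is not None and cached.keys() == measured_factors.keys():
            # Data association matches the prediction: apply only the cheap
            # correction between predicted and measured factor values.
            delta = {k: measured_factors[k] - cached[k] for k in cached}
            return belief.apply_correction(delta)
        # Association differs from what planning assumed: fall back to the
        # standard full inference update.
        return self.full_update(belief, measured_factors)
```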

This paper focuses on inverse reinforcement learning for autonomous navigation using distance and semantic category observations. The objective is to infer a cost function that explains demonstrated behavior while relying only on the expert's observations and state-control trajectory. We develop a map encoder that infers semantic category probabilities from the observation sequence, and a cost encoder, defined as a deep neural network over the semantic features. Since the expert cost is not directly observable, the model parameters can only be optimized by differentiating the error between demonstrated controls and a control policy computed from the cost estimate. We propose a new model of expert behavior that enables error minimization using a closed-form subgradient computed only over a subset of promising states via a motion planning algorithm. Our approach allows generalizing the learned behavior to new environments with new spatial configurations of the semantic categories. We analyze the different components of our model in a minigrid environment. We also demonstrate that our approach learns to follow traffic rules in the CARLA autonomous driving simulator by relying on semantic observations of buildings, sidewalks, and road lanes.
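
The paper's exact closed-form subgradient is not spelled out in the abstract; a max-margin-style stand-in on a semantic grid (a common textbook construction with a linear cost, not the paper's deep cost encoder) illustrates how a motion planner can supply the gradient signal:

```python
import heapq
import numpy as np

# Max-margin-style IRL sketch on a semantic grid. The linear cost model and
# margin subgradient are generic illustrative choices; the paper instead uses
# a deep cost encoder with a closed-form subgradient over promising states.

def dijkstra_path(cost, start, goal):
    """Shortest path on a 2D cost grid (4-connected)."""
    h, w = cost.shape
    dist, prev, pq = {start: 0.0}, {}, [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        for du in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            v = (u[0] + du[0], u[1] + du[1])
            if 0 <= v[0] < h and 0 <= v[1] < w and d + cost[v] < dist.get(v, np.inf):
                dist[v], prev[v] = d + cost[v], u
                heapq.heappush(pq, (dist[v], v))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    return path[::-1]

def feature_counts(path, semantics):
    """Sum semantic feature vectors along a path."""
    return sum(semantics[p] for p in path)

def irl_step(w, semantics, demo, start, goal, lr=0.1):
    """One subgradient step: demo path features minus planned path features."""
    cost = semantics @ np.maximum(w, 1e-3)        # keep per-cell costs positive
    plan = dijkstra_path(cost, start, goal)
    grad = feature_counts(demo, semantics) - feature_counts(plan, semantics)
    return w - lr * grad                          # lower cost on demo features
```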

We present Residual Policy Learning (RPL): a simple method for improving nondifferentiable policies using model-free deep reinforcement learning. RPL thrives in complex robotic manipulation tasks where good but imperfect controllers are available. In these tasks, reinforcement learning from scratch remains data-inefficient or intractable, but learning a residual on top of the initial controller can yield substantial improvement. We study RPL in five challenging MuJoCo tasks involving partial observability, sensor noise, model misspecification, and controller miscalibration. By combining learning with control algorithms, RPL can perform long-horizon, sparse-reward tasks for which reinforcement learning alone fails. Moreover, we find that RPL consistently and substantially improves on the initial controllers. We argue that RPL is a promising approach for combining the complementary strengths of deep reinforcement learning and robotic control, pushing the boundaries of what either can achieve independently.
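
The core idea is compact enough to sketch: the executed action is the sum of the fixed controller's output and a learned residual. The linear residual below is only a placeholder for the deep policy the paper trains with model-free RL:

```python
import numpy as np

# Minimal sketch of Residual Policy Learning: act with the sum of a fixed,
# possibly nondifferentiable controller and a learned residual. The linear
# residual stands in for a deep policy trained with model-free RL.

class ResidualPolicy:
    def __init__(self, controller, action_dim, obs_dim, seed=0):
        self.controller = controller                   # fixed initial controller
        rng = np.random.default_rng(seed)
        self.W = 0.01 * rng.standard_normal((action_dim, obs_dim))

    def residual(self, obs):
        return self.W @ obs                            # learned correction term

    def act(self, obs):
        # Final action = hand-designed control + learned residual.
        return self.controller(obs) + self.residual(obs)

# Usage: wrap any good-but-imperfect controller, then train only the residual.
policy = ResidualPolicy(controller=lambda obs: -0.5 * obs[:2],
                        action_dim=2, obs_dim=4)
action = policy.act(np.ones(4))
```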

Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline heavily relies on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the accessibility of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach which enables the driving agent to achieve higher success rates based on only vision inputs in a high-fidelity car simulator. To alleviate the low exploration efficiency in a large continuous action space, which often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose specialized adaptive policies and steering-angle reward designs for different control signals (i.e., follow, straight, turn right, turn left) based on shared representations to improve the model's capability to tackle diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first successful case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
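
A minimal sketch of a command-conditioned actor of the kind the abstract describes, with one specialized head per control signal on top of a shared representation. Layer sizes and head structure are assumptions for illustration, not CIRL's actual network:

```python
import torch
import torch.nn as nn

# Sketch of a command-conditioned actor: one branch per high-level control
# signal (follow / straight / turn left / turn right) over shared features.
# Dimensions are illustrative, not CIRL's actual architecture.

class CommandConditionedActor(nn.Module):
    COMMANDS = ("follow", "straight", "turn_left", "turn_right")

    def __init__(self, feat_dim=512, action_dim=3):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        # One specialized policy head per command.
        self.heads = nn.ModuleDict(
            {c: nn.Linear(256, action_dim) for c in self.COMMANDS})

    def forward(self, features, command):
        h = self.shared(features)
        return torch.tanh(self.heads[command](h))   # bounded continuous action

actor = CommandConditionedActor()
action = actor(torch.randn(1, 512), command="turn_left")
```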

Most policy search algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data", we refer to this challenge as "micro-data reinforcement learning". We show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based policy search), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots (e.g., humanoids), designing generic priors, and optimizing the computing time.
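
As a toy illustration of the second strategy (data-driven surrogates of the expected reward), here is a minimal Bayesian-optimization loop over a one-dimensional policy parameter. The RBF kernel, noise level, UCB acquisition, and synthetic return function are all generic choices, not taken from the survey:

```python
import numpy as np

# Toy Bayesian-optimization policy search: fit a GP surrogate of the expected
# return and query it instead of the real robot. All modeling choices here
# (kernel, noise, UCB weight) are generic illustrative defaults.

def rbf(a, b, ell=0.3):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

def gp_posterior(X, y, Xq, noise=1e-4):
    K = rbf(X, X) + noise * np.eye(len(X))
    Ks = rbf(X, Xq)
    mu = Ks.T @ np.linalg.solve(K, y)
    var = np.diag(rbf(Xq, Xq) - Ks.T @ np.linalg.solve(K, Ks))
    return mu, np.maximum(var, 0.0)

def episode_return(theta):
    # Stand-in for a costly rollout on the physical robot.
    return -np.sin(3 * theta) - theta ** 2 + 0.7 * theta

grid = np.linspace(-1.0, 2.0, 200)
X = np.array([-0.5, 1.5])                  # two initial trials
y = np.array([episode_return(t) for t in X])
for _ in range(10):                        # "a dozen trials" budget
    mu, var = gp_posterior(X, y, grid)
    theta = grid[np.argmax(mu + 2.0 * np.sqrt(var))]   # UCB acquisition
    X = np.append(X, theta)
    y = np.append(y, episode_return(theta))
best_params = X[np.argmax(y)]
```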

Starting with the idea that sentiment analysis models should be able to predict not only positive or negative polarity but also other psychological states of a person, we implement a sentiment analysis model and investigate its relationship to emotional state. We first collect psychological measurements from 64 participants and ask them to write a book report about a story. We then train our sentiment analysis model on crawled movie-review data. Finally, we evaluate the participants' writings using the pretrained model, as a form of transfer learning. The results show that the sentiment analysis model performs well at predicting a score, but the score has no correlation with the participants' self-reported sentiment.
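
The evaluation step reduces to scoring each essay with the pretrained model and correlating the scores against self-reports. A sketch, where `score_text` is a placeholder for the pretrained sentiment model rather than a real API:

```python
from scipy.stats import pearsonr

# Sketch of the transfer evaluation: score each participant's writing with a
# model pretrained on movie reviews, then correlate the predicted scores with
# the participants' self-reported sentiment. `score_text` is a placeholder.

def evaluate_transfer(writings, self_reports, score_text):
    predicted = [score_text(text) for text in writings]
    r, p_value = pearsonr(predicted, self_reports)
    return r, p_value   # a near-zero r would match the paper's finding
```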

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent's task execution and learning, and the learned experience is fed back into the symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark HRL domains.
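
The plan-learn-feedback loop can be sketched as follows. The `symbolic_planner` and `learn_option` interfaces are placeholders; PEORL itself uses answer set programming and R-learning rather than these stand-ins:

```python
# Sketch of a symbolic-planning + HRL loop in the spirit of PEORL. The
# planner and option-learner interfaces are illustrative placeholders.

def plan_and_learn(symbolic_planner, learn_option, env, goal, iterations=10):
    action_costs = {}                       # learned costs fed back to planning
    for _ in range(iterations):
        plan = symbolic_planner(goal, action_costs)
        if plan is None:
            break
        for symbolic_action in plan:
            # Each symbolic action is executed as a learned option (sub-policy);
            # its measured return updates the cost the planner sees next round.
            ret = learn_option(env, symbolic_action)
            action_costs[symbolic_action] = -ret
    return action_costs
```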

Current methods for video analysis often extract frame-level features using pre-trained convolutional neural networks (CNNs). Such features are then aggregated over time, e.g., by simple temporal averaging or by more sophisticated recurrent neural networks such as long short-term memory (LSTM) or gated recurrent units (GRU). In this work we revise existing video representations and study alternative methods for temporal aggregation. We first explore clustering-based aggregation layers and propose a two-stream architecture aggregating audio and visual features. We then introduce a learnable non-linear unit, named Context Gating, aiming to model interdependencies among network activations. Our experimental results show the advantage of both improvements for the task of video classification. In particular, we evaluate our method on the large-scale multi-modal YouTube-8M v2 dataset and outperform all other methods in the YouTube-8M Large-Scale Video Understanding challenge.
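
Context Gating itself is a one-line transformation, y = sigmoid(Wx + b) * x; a minimal sketch (the feature dimension is illustrative):

```python
import torch
import torch.nn as nn

# Context Gating: an elementwise sigmoid gate, computed from the input
# itself, that reweights the input activations: y = sigmoid(W x + b) * x.

class ContextGating(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(dim, dim)

    def forward(self, x):
        return torch.sigmoid(self.gate(x)) * x   # self-gated activations

layer = ContextGating(dim=1024)
y = layer(torch.randn(8, 1024))    # gates a batch of aggregated features
```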

This paper presents a safety-aware learning framework that employs an adaptive model learning method together with barrier certificates for systems with possibly nonstationary agent dynamics. To extract the dynamic structure of the model, we use a sparse optimization technique, and the resulting model is then used in combination with control barrier certificates which constrain feedback controllers only when safety is about to be violated. Under some mild assumptions, solutions to the constrained feedback-controller optimization are guaranteed to be globally optimal, and the monotonic improvement of a feedback controller is thus ensured. In addition, we reformulate the (action-)value function approximation to make any kernel-based nonlinear function estimation method applicable. We then employ a state-of-the-art kernel adaptive filtering technique for the (action-)value function approximation. The resulting framework is verified experimentally on a brushbot, whose dynamics are unknown and highly complex.
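
For the barrier-certificate step, a safety filter with a single affine constraint has a closed form. The single-integrator dynamics, circular barrier, and gain below are assumptions for illustration, not the paper's learned brushbot model:

```python
import numpy as np

# Minimal control-barrier-certificate filter for single-integrator dynamics
# x_dot = u with one barrier h(x) >= 0. The controller is constrained only
# when safety is about to be violated; for a single affine constraint
# grad_h . u >= -alpha * h(x), the QP has the closed-form projection below.
# The circular barrier and gain are illustrative assumptions.

def safety_filter(x, u_nominal, alpha=1.0, radius=1.0):
    h = radius ** 2 - x @ x          # stay inside a disk of the given radius
    grad_h = -2.0 * x
    margin = grad_h @ u_nominal + alpha * h
    if margin >= 0.0:
        return u_nominal             # nominal control is already safe
    # Minimally modify u: least-squares projection onto the constraint.
    return u_nominal - margin * grad_h / (grad_h @ grad_h)

u = safety_filter(x=np.array([0.9, 0.0]), u_nominal=np.array([1.0, 0.0]))
```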

Related Papers

Residual Policy Learning
Tom Silver, Kelsey Allen, Josh Tenenbaum, Leslie Kaelbling · 15 December 2018

CIRL: Controllable Imitative Reinforcement Learning for Vision-based Self-driving
Xiaodan Liang, Tairui Wang, Luona Yang, Eric Xing · 10 July 2018

Konstantinos Chatzilygeroudis, Vassilis Vassiliades, Freek Stulp, Sylvain Calinon, Jean-Baptiste Mouret · 6 July 2018

Hwiyeol Jo, Jeong Ryu · 3 June 2018

Fangkai Yang, Daoming Lyu, Bo Liu, Steven Gustafson · 20 April 2018

Antoine Miech, Ivan Laptev, Josef Sivic · 5 March 2018

Motoya Ohnishi, Li Wang, Gennaro Notomista, Magnus Egerstedt · 29 January 2018