We introduce ROS-X-Habitat, a software interface that bridges the AI Habitat platform for embodied reinforcement learning agents with other robotics resources via ROS. This interface not only offers standardized communication protocols between embodied agents and simulators, but also enables physics-based simulation. With this interface, roboticists are able to train their own Habitat RL agents in another simulation environment or to develop their own robotic algorithms inside Habitat Sim. Through in silico experiments, we demonstrate that ROS-X-Habitat has minimal impact on the navigation performance and simulation speed of Habitat agents; that a standard set of ROS mapping, planning and navigation tools can run in the Habitat simulator; and that a Habitat agent can run in the standard ROS simulator Gazebo.
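
To make the role of such an interface concrete, below is a minimal sketch (not the actual ROS-X-Habitat implementation) of a ROS node that publishes Habitat-Sim RGB observations on a topic and maps incoming velocity commands to the simulator's discrete agent actions; the configuration helper and the command-to-action mapping are simplifying assumptions.

```python
import numpy as np
import rospy
import habitat_sim
from cv_bridge import CvBridge
from sensor_msgs.msg import Image
from geometry_msgs.msg import Twist


def make_sim_config(scene_path):
    # Build a basic Habitat-Sim configuration with a single RGB camera.
    backend = habitat_sim.SimulatorConfiguration()
    backend.scene_id = scene_path
    rgb = habitat_sim.CameraSensorSpec()
    rgb.uuid = "rgb"
    rgb.sensor_type = habitat_sim.SensorType.COLOR
    rgb.resolution = [480, 640]
    agent = habitat_sim.agent.AgentConfiguration()
    agent.sensor_specifications = [rgb]
    return habitat_sim.Configuration(backend, [agent])


class HabitatRosBridge:
    def __init__(self, scene_path):
        self.sim = habitat_sim.Simulator(make_sim_config(scene_path))
        self.cv_bridge = CvBridge()
        self.rgb_pub = rospy.Publisher("~rgb", Image, queue_size=1)
        rospy.Subscriber("~cmd_vel", Twist, self.on_cmd_vel, queue_size=1)
        self.pending_action = None

    def on_cmd_vel(self, msg):
        # Crude mapping from continuous commands to Habitat's default discrete actions.
        if msg.linear.x > 0.0:
            self.pending_action = "move_forward"
        elif msg.angular.z > 0.0:
            self.pending_action = "turn_left"
        elif msg.angular.z < 0.0:
            self.pending_action = "turn_right"

    def spin(self, hz=10):
        rate = rospy.Rate(hz)
        while not rospy.is_shutdown():
            if self.pending_action is not None:
                obs = self.sim.step(self.pending_action)
                self.pending_action = None
            else:
                obs = self.sim.get_sensor_observations()
            rgb = np.ascontiguousarray(obs["rgb"][:, :, :3])  # drop alpha channel
            self.rgb_pub.publish(self.cv_bridge.cv2_to_imgmsg(rgb, encoding="rgb8"))
            rate.sleep()


if __name__ == "__main__":
    rospy.init_node("habitat_bridge")
    HabitatRosBridge(rospy.get_param("~scene")).spin()
```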

Related content

Self-Driven Particles (SDP) describe a category of multi-agent systems common in everyday life, such as flocking birds and traffic flows. In an SDP system, each agent pursues its own goal and constantly changes its cooperative or competitive behavior towards nearby agents. Manually designing controllers for such an SDP system is time-consuming, and the resulting emergent behaviors are often neither realistic nor generalizable. Thus the realistic simulation of SDP systems remains challenging. Reinforcement learning provides an appealing alternative for automating the development of controllers for SDP. However, previous multi-agent reinforcement learning (MARL) methods define agents as teammates or enemies beforehand, which fails to capture the essence of SDP, where the role of each agent varies between cooperative and competitive even within a single episode. To simulate SDP with MARL, a key challenge is to coordinate agents' behaviors while still maximizing individual objectives. Taking traffic simulation as the test bed, in this work we develop a novel MARL method called Coordinated Policy Optimization (CoPO), which incorporates principles from social psychology to learn neural controllers for SDP. Experiments show that the proposed method achieves superior performance compared to MARL baselines on various metrics. Notably, the trained vehicles exhibit complex and diverse social behaviors that improve the performance and safety of the population as a whole. Demo video and source code are available at: https://decisionforce.github.io/CoPO/
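
As a hypothetical illustration of the coordination problem described above (balancing individual objectives against the behaviour of nearby agents), the sketch below blends each agent's own reward with the mean reward of its neighbourhood using a coordination angle. The specific form, the names phi and neighbor_radius, and the use of a fixed angle are assumptions for illustration rather than the actual CoPO formulation.

```python
import numpy as np

def coordinated_rewards(rewards, positions, phi, neighbor_radius=10.0):
    """Blend individual rewards with local neighborhood rewards.

    rewards:   (n,) array of per-agent individual rewards.
    positions: (n, 2) array of agent positions.
    phi:       coordination angle in [0, pi/2]; 0 = purely selfish,
               pi/2 = purely neighborhood-oriented.
    """
    n = len(rewards)
    dists = np.linalg.norm(positions[:, None, :] - positions[None, :, :], axis=-1)
    coordinated = np.empty(n)
    for i in range(n):
        neighbors = dists[i] < neighbor_radius          # includes agent i itself
        neighborhood_reward = rewards[neighbors].mean()
        coordinated[i] = np.cos(phi) * rewards[i] + np.sin(phi) * neighborhood_reward
    return coordinated
```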

In several areas of knowledge, self-efficacy is related to the performance of individuals, including in Software Engineering. However, it is not clear how self-efficacy can be modified by training conducted in industry. Furthermore, we still do not understand how self-efficacy can impact an individual's team and career in industry. This lack of understanding can negatively impact how companies and individuals perceive the importance of self-efficacy in the field. Therefore, we present a research proposal that aims to understand the relationship between self-efficacy and training in Software Engineering. Moreover, we seek to understand the role of self-efficacy in the software development industry. We propose a longitudinal case study with software engineers at Zup Innovation who participate in our bootcamp training. We expect to collect data to support our assumption that self-efficacy can be related to training in Software Engineering. A further assumption is that self-efficacy at the beginning of training is higher than in the middle, and that self-efficacy at the end of training is higher than in the middle. We expect that the study proposed in this article will motivate a discussion about self-efficacy and the importance of training employees in the software development industry.

Humans and other animals often follow the decisions made by others because these are indicative of the quality of possible choices, resulting in 'social response rules': observed relationships between the probability that an agent will make a specific choice and the decisions other individuals have made. The form of social responses can be understood by considering the behaviour of rational agents that seek to maximise their expected utility using both social and private information. Previous derivations of social responses assume that agents observe all others within a group, but real interaction networks are often characterised by sparse connectivity. Here I analyse the observable behaviour of rational agents that attend to the decisions made by a subset of others in the group. This reveals an adaptive strategy in sparsely-connected networks based on highly-simplified social information: the difference in the observed number of agents choosing each option. Where agents employ this strategy, collective outcomes and decision-making efficacy are controlled by the social connectivity at the time of the decision, rather than that to which the agents are accustomed, providing an important caveat for sociality observed in the laboratory and suggesting a basis for the social dynamics of highly-connected online communities.
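
The following is an illustrative sketch of the simplified social strategy described above: an agent's choice probability depends only on the difference between the observed numbers of attended neighbours choosing each option, combined with its private signal. The logistic form and the weights w_social and w_private are assumptions for illustration, not the paper's derived response rule.

```python
import math

def p_choose_A(n_A_observed, n_B_observed, private_signal,
               w_social=0.5, w_private=1.0):
    delta = n_A_observed - n_B_observed           # simplified social information
    evidence = w_social * delta + w_private * private_signal
    return 1.0 / (1.0 + math.exp(-evidence))      # logistic response rule
```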

Image-to-image (i2i) networks struggle to capture local changes because such changes do not affect the global scene structure. For example, when translating from highway scenes to off-road scenes, i2i networks easily focus on global color features but ignore traits that are obvious to humans, such as the absence of lane markings. In this paper, we leverage human knowledge about spatial domain characteristics, which we refer to as 'local domains', and demonstrate its benefit for image-to-image translation. Relying on simple geometrical guidance, we train a patch-based GAN on a small amount of source data and hallucinate a new unseen domain, which subsequently eases transfer learning to the target domain. We experiment on three tasks ranging from unstructured environments to adverse weather. Our comprehensive evaluation setting shows that we are able to generate realistic translations with minimal priors while training on only a few images. Furthermore, we show that all tested proxy tasks are significantly improved when models are trained on our translated images, without ever seeing the target domain during training.
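
The sketch below is an illustration, not the paper's implementation, of the geometrical-guidance idea: sample GAN training patches only from the image region a human marks as belonging to the 'local domain' (for instance, the road surface whose lane markings should change). The function name, patch size, and acceptance threshold are assumptions.

```python
import numpy as np

def sample_local_domain_patches(image, region_mask, patch_size=64,
                                n_patches=32, max_tries=10000, rng=None):
    """image: (H, W, 3) array; region_mask: (H, W) bool array marking the local domain."""
    rng = rng or np.random.default_rng()
    ys, xs = np.nonzero(region_mask)
    patches = []
    for _ in range(max_tries):
        if len(patches) == n_patches:
            break
        i = rng.integers(len(ys))
        y0, x0 = ys[i] - patch_size // 2, xs[i] - patch_size // 2
        if y0 < 0 or x0 < 0 or y0 + patch_size > image.shape[0] or x0 + patch_size > image.shape[1]:
            continue
        # Keep only patches that lie almost entirely inside the marked region.
        if region_mask[y0:y0 + patch_size, x0:x0 + patch_size].mean() > 0.9:
            patches.append(image[y0:y0 + patch_size, x0:x0 + patch_size])
    return np.stack(patches)
```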

Planning in Markov decision processes (MDPs) typically optimises the expected cost. However, optimising the expectation does not consider the risk that for any given run of the MDP, the total cost received may be unacceptably high. An alternative approach is to find a policy which optimises a risk-averse objective such as conditional value at risk (CVaR). In this work, we begin by showing that there can be multiple policies which obtain the optimal CVaR. We formulate the lexicographic optimisation problem of minimising the expected cost subject to the constraint that the CVaR of the total cost is optimal. We present an algorithm for this problem and evaluate our approach on three domains, including a road navigation domain based on real traffic data. Our experimental results demonstrate that our lexicographic approach attains improved expected cost while maintaining the optimal CVaR.
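
For reference, the quantities involved can be written down explicitly. Below is the standard Rockafellar–Uryasev form of CVaR at level α for the total cost C, together with the lexicographic objective described above; the notation C^π for the total cost under policy π is an assumption of this sketch.

```latex
% CVaR of the total cost C at risk level alpha (Rockafellar-Uryasev form),
% and the lexicographic objective: among CVaR-optimal policies, minimise expected cost.
\mathrm{CVaR}_{\alpha}(C) = \min_{z \in \mathbb{R}} \left\{ z + \frac{1}{\alpha}\,\mathbb{E}\!\left[(C - z)^{+}\right] \right\}

\pi^{\star} \in \operatorname*{arg\,min}_{\pi \,:\, \mathrm{CVaR}_{\alpha}(C^{\pi}) \,=\, \min_{\pi'} \mathrm{CVaR}_{\alpha}(C^{\pi'})} \mathbb{E}\!\left[C^{\pi}\right]
```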

Imitation learning aims to extract knowledge from the demonstrations of human experts or artificially created agents in order to replicate their behaviors. Its success has been demonstrated in areas such as video games, autonomous driving, robotic simulations and object manipulation. However, this replication process can be problematic: performance is highly dependent on demonstration quality, and most trained agents only perform well in task-specific environments. In this survey, we provide a systematic review of imitation learning. We first introduce background knowledge, from the development history of the field to preliminaries, followed by different taxonomies within imitation learning and key milestones of the field. We then detail challenges in learning strategies and present research opportunities in learning policies from suboptimal demonstrations, voice instructions and other associated optimization schemes.
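
As a minimal, self-contained illustration of the core idea (replicating demonstrated behavior by supervised learning), the sketch below fits a softmax policy to expert state-action pairs. The linear model and hyper-parameters are assumptions chosen for brevity; real systems use deep networks and the richer algorithms surveyed in the paper.

```python
import numpy as np

def behavior_cloning(states, expert_actions, n_actions, lr=0.1, epochs=200):
    """states: (N, d) array; expert_actions: (N,) int array of expert choices."""
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.01, size=(states.shape[1], n_actions))
    onehot = np.eye(n_actions)[expert_actions]
    for _ in range(epochs):
        logits = states @ W
        logits -= logits.max(axis=1, keepdims=True)
        probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
        grad = states.T @ (probs - onehot) / len(states)   # cross-entropy gradient
        W -= lr * grad
    return W  # policy: pick argmax of states @ W
```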

Fairness has emerged as a critical problem in federated learning (FL). In this work, we identify a cause of unfairness in FL: conflicting gradients with large differences in magnitude. To address this issue, we propose the federated fair averaging (FedFV) algorithm to mitigate potential conflicts among clients before averaging their gradients. We first use cosine similarity to detect gradient conflicts, and then iteratively eliminate such conflicts by modifying both the direction and the magnitude of the gradients. We further provide a theoretical foundation showing that FedFV mitigates conflicting gradients and converges to Pareto stationary solutions. Extensive experiments on a suite of federated datasets confirm that FedFV compares favorably against state-of-the-art methods in terms of fairness, accuracy and efficiency.
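
The sketch below illustrates the conflict-detection-and-mitigation step described in the abstract, in the spirit of gradient projection: when two client gradients have negative cosine similarity, the conflicting component of one is removed before averaging. The pairwise ordering and projection rule here are assumptions, not the exact FedFV procedure.

```python
import numpy as np

def mitigate_conflicts(client_grads):
    """client_grads: list of 1-D numpy arrays, one flattened gradient per client."""
    adjusted = [g.copy() for g in client_grads]
    for i, g_i in enumerate(adjusted):
        for j, g_j in enumerate(client_grads):
            if i == j:
                continue
            cos = g_i @ g_j / (np.linalg.norm(g_i) * np.linalg.norm(g_j) + 1e-12)
            if cos < 0:  # conflicting gradients detected via cosine similarity
                g_i -= (g_i @ g_j) / (g_j @ g_j + 1e-12) * g_j  # remove conflicting component
    return np.mean(adjusted, axis=0)  # average after conflict mitigation
```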

This paper presents an upgraded version of gym-gazebo, the Robot Operating System (ROS) and Gazebo based Reinforcement Learning (RL) toolkit, oriented towards real-world applications and compliant with OpenAI Gym. The paper discusses the new ROS 2 based software architecture and summarizes the results obtained using Proximal Policy Optimization (PPO). Ultimately, this work presents a benchmarking system for robotics that allows different techniques and algorithms to be compared under the same virtual conditions. We have evaluated environments of varying complexity for the Modular Articulated Robotic Arm (MARA), reaching accuracies at the millimeter scale. The converged results show the feasibility and usefulness of the gym-gazebo2 toolkit, and its potential and applicability in industrial use cases using modular robots.
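
In practice, Gym compliance means any Gym-style agent loop can drive the simulated MARA arm, roughly as sketched below. The environment id 'MARA-v0' and the gym_gazebo2 import path are assumptions based on the toolkit's naming; check the repository for the registered ids.

```python
import gym
import gym_gazebo2  # noqa: F401  (registers the MARA environments with Gym)

env = gym.make("MARA-v0")
obs = env.reset()
for _ in range(100):
    action = env.action_space.sample()        # replace with a PPO policy in practice
    obs, reward, done, info = env.step(action)
    if done:
        obs = env.reset()
env.close()
```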

To solve complex real-world problems with reinforcement learning, we cannot rely on manually specified reward functions. Instead, we can have humans communicate an objective to the agent directly. In this work, we combine two approaches to learning from human feedback: expert demonstrations and trajectory preferences. We train a deep neural network to model the reward function and use its predicted reward to train a DQN-based deep reinforcement learning agent on 9 Atari games. Our approach beats the imitation learning baseline in 7 games and achieves strictly superhuman performance on 2 games without using game rewards. Additionally, we investigate the goodness of fit of the reward model, present some reward hacking problems, and study the effects of noise in the human labels.
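
The preference-modelling component can be illustrated with the Bradley-Terry formulation commonly used in this line of work: the reward model is trained so that the predicted probability of the human preferring one trajectory segment over another matches the human label. The reward_model interface below is a hypothetical stand-in.

```python
import numpy as np

def preference_loss(reward_model, segment_1, segment_2, human_prefers_1):
    """segments: lists of (state, action) pairs; human_prefers_1: 1.0 or 0.0."""
    r1 = sum(reward_model(s, a) for s, a in segment_1)
    r2 = sum(reward_model(s, a) for s, a in segment_2)
    # Bradley-Terry probability that segment_1 is preferred (numerically stable form).
    p1 = 1.0 / (1.0 + np.exp(r2 - r1))
    # Cross-entropy between the human label and the model's preference probability.
    return -(human_prefers_1 * np.log(p1) + (1.0 - human_prefers_1) * np.log(1.0 - p1))
```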

Autonomous urban driving navigation with complex multi-agent dynamics is under-explored due to the difficulty of learning an optimal driving policy. The traditional modular pipeline relies heavily on hand-designed rules and a pre-processing perception system, while supervised learning-based models are limited by the availability of extensive human experience. We present a general and principled Controllable Imitative Reinforcement Learning (CIRL) approach that enables the driving agent to achieve higher success rates based only on vision inputs in a high-fidelity car simulator. To alleviate the low exploration efficiency of the large continuous action space, which often prohibits the use of classical RL on challenging real tasks, CIRL explores over a reasonably constrained action space guided by encoded experiences that imitate human demonstrations, building upon Deep Deterministic Policy Gradient (DDPG). Moreover, we propose specialized adaptive policies and steering-angle reward designs for different control signals (i.e. follow, straight, turn right, turn left) based on shared representations, to improve the model's capability in tackling diverse cases. Extensive experiments on the CARLA driving benchmark demonstrate that CIRL substantially outperforms all previous methods in terms of the percentage of successfully completed episodes on a variety of goal-directed driving tasks. We also show its superior generalization capability in unseen environments. To our knowledge, this is the first successful case of a driving policy learned through reinforcement learning in a high-fidelity simulator that performs better than supervised imitation learning.
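
The 'controllable' aspect, conditioning the policy on a high-level command, can be sketched as selecting a command-specific head on top of a shared representation, as below. The linear heads and names are assumptions for illustration, not the CIRL network architecture.

```python
import numpy as np

COMMANDS = ("follow", "straight", "turn_right", "turn_left")

class CommandConditionedPolicy:
    def __init__(self, feature_dim, action_dim, rng=None):
        rng = rng or np.random.default_rng(0)
        # One linear head per command on top of a shared visual representation.
        self.heads = {c: rng.normal(scale=0.01, size=(feature_dim, action_dim))
                      for c in COMMANDS}

    def act(self, shared_features, command):
        # Select the head that matches the current navigation command.
        return np.tanh(shared_features @ self.heads[command])  # actions in [-1, 1]
```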
