Recent advances in 3D sensing have created unique challenges for computer vision. One fundamental challenge is finding a good representation for 3D sensor data. Most popular representations (such as PointNet) are proposed in the context of processing truly 3D data (e.g., points sampled from mesh models), ignoring the fact that 3D sensor data such as a LiDAR sweep is in fact 2.5D. We argue that representing 2.5D data as a collection of (x, y, z) points fundamentally destroys hidden information about free space. In this paper, we demonstrate that such knowledge can be efficiently recovered through 3D raycasting and readily incorporated into batch-based gradient learning. We describe a simple approach to augmenting voxel-based networks with visibility: we add a voxelized visibility map as an additional input stream. In addition, we show that visibility can be combined with two crucial modifications common to state-of-the-art 3D detectors: synthetic data augmentation with virtual objects and temporal aggregation of LiDAR sweeps over multiple time frames. On the NuScenes 3D detection benchmark, we show that adding this visibility stream significantly improves the overall detection accuracy of a state-of-the-art 3D detector.
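The free-space recovery described above can be sketched as follows: march along each LiDAR ray from the sensor origin, marking traversed voxels as observed free space and the endpoint voxel as occupied. This is a minimal pure-Python sketch under assumed parameters; the grid shape, voxel size, sampling scheme, and function name are illustrative, not the paper's implementation.

```python
import math

def visibility_map(points, origin, grid_shape=(32, 32, 16), voxel_size=0.5):
    """Approximate per-voxel visibility by marching along each LiDAR ray.

    States: absent from the map = unknown, 1 = observed free space,
    2 = occupied (ray endpoint). Returned as a sparse dict for brevity.
    """
    def to_voxel(p):
        v = tuple(int(math.floor(c / voxel_size)) for c in p)
        in_grid = all(0 <= v[i] < grid_shape[i] for i in range(3))
        return v if in_grid else None

    vis = {}
    for p in points:
        ray = [p[i] - origin[i] for i in range(3)]
        dist = math.sqrt(sum(c * c for c in ray))
        if dist == 0:
            continue
        # Sample at half-voxel spacing; every voxel crossed before the
        # endpoint is free unless some other ray already terminated in it.
        n = int(dist / (voxel_size * 0.5)) + 1
        for k in range(n):
            t = k / n
            v = to_voxel([origin[i] + t * ray[i] for i in range(3)])
            if v is not None and vis.get(v) != 2:
                vis[v] = 1
        v_end = to_voxel(p)
        if v_end is not None:
            vis[v_end] = 2
    return vis
```

In the full pipeline this map would be voxelized onto the same grid as the detector's input features and concatenated as an extra channel.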
Skin-like tactile sensors provide robots with rich feedback related to the force distribution applied to their soft surface. The complexity of interpreting raw tactile information has driven the use of machine learning algorithms to convert the sensory feedback to the quantities of interest. However, the lack of ground truth sources for the entire contact force distribution has mainly limited these techniques to the sole estimation of the total contact force and the contact center on the sensor's surface. The method presented in this article uses a finite element model to obtain ground truth data for the three-dimensional force distribution. The model is obtained with state-of-the-art material characterization methods and is evaluated in an indentation setup, where it shows high agreement with the measurements retrieved from a commercial force-torque sensor. The proposed technique is applied to a vision-based tactile sensor, which aims to reconstruct the contact force distribution purely from images. Thousands of images are matched to ground truth data and are used to train a neural network architecture, which is suitable for real-time predictions.
Understanding human behavior is key for robots and intelligent systems that share a space with people. Accordingly, research that enables such systems to perceive, track, learn and predict human behavior, as well as to plan and interact with humans, has received increasing attention in recent years. The availability of large human motion datasets that contain relevant levels of difficulty is fundamental to this research. Existing datasets are often limited in terms of information content, annotation quality or variability of human behavior. In this paper, we present THÖR, a new dataset with human motion trajectory and eye gaze data collected in an indoor environment, with accurate ground truth for position, head orientation, gaze direction, social grouping, obstacle map and goal coordinates. THÖR also contains sensor data collected by a 3D lidar and involves a mobile robot navigating the space. We propose a set of metrics to quantitatively analyze motion trajectory datasets, such as the average tracking duration, ground truth noise, curvature and speed variation of the trajectories. In comparison to prior art, our dataset has a larger variety in human motion behavior, is less noisy, and contains annotations at higher frequencies.
The Gaussian process state-space model (GPSSM) is a probabilistic dynamical system that represents unknown transition and/or measurement models as Gaussian processes (GPs). The majority of approaches to learning the GPSSM focus on handling a given, fixed time series. However, in most dynamical systems, the data required for model learning arrives sequentially and accumulates over time. Storing all the data requires large amounts of memory, and using it for model learning can be computationally infeasible. To overcome these challenges, we propose an online inference method, onlineGPSSM, for learning the GPSSM by incorporating stochastic variational inference (VI) and online VI. The proposed method mitigates the computation-time issue without catastrophic forgetting and supports adaptation to changes in the system and/or environment. Furthermore, we propose an application of onlineGPSSM to reinforcement learning (RL) of partially observable dynamical systems by combining onlineGPSSM with Bayesian filtering and trajectory optimization algorithms. Numerical examples are presented to demonstrate the applicability of the proposed method.
We analyze the efficacy of modern neuro-evolutionary strategies for continuous control optimization. Overall, the results collected on a wide variety of qualitatively different benchmark problems indicate that these methods are generally effective and scale well with respect to the number of parameters and the complexity of the problem. We demonstrate the importance of using suitable fitness functions or reward criteria, since functions that are optimal for reinforcement learning algorithms tend to be sub-optimal for evolutionary strategies, and vice versa. Finally, we provide an analysis of the role of hyper-parameters that demonstrates the importance of normalization techniques, especially in complex problems.
Learning-based methods have been used to program robotic tasks in recent years. However, extensive training is usually required not only for the initial task learning but also for generalizing the learned model to the same task in different environments. In this paper, we propose a novel deep reinforcement learning algorithm for efficient task generalization and environment adaptation in robotic task learning. The proposed method efficiently generalizes the previously learned task by model fusion to solve the environment adaptation problem. The proposed Deep Model Fusion (DMF) method reuses and combines previously trained models to improve learning efficiency and results. Besides, we also introduce a Multi-objective Guided Reward (MGR) shaping technique to further improve training efficiency. The proposed method was benchmarked against previous methods in various environments to validate its effectiveness.
Unmanned vehicle technologies are an area of great interest in theory and practice today. These technologies have advanced considerably since the first applications were implemented and are causing rapid change in human life. Autonomous vehicles are a major part of these technologies. The most important task a driver must perform is to follow the lanes on the way to the destination. By using image processing and artificial intelligence techniques, an autonomous vehicle can move successfully without a driver's help. It can travel from the initial point to the specified target by applying pre-defined rules, including rules for proper lane keeping. Many accidents are caused by improper lane keeping and non-compliance with these rules, and the majority of these accidents result in injury or death. In this paper, we present an autonomous vehicle prototype that follows lanes via image processing techniques, which are a major part of autonomous vehicle technology. Autonomous movement capability is provided by image processing algorithms such as Canny edge detection and the Sobel filter. We implemented and tested these algorithms on the vehicle. The vehicle detected and followed the determined lanes and thereby reached its destination successfully.
Self-driving vehicles have expanded dramatically over the last few years. Udacity has released a dataset containing, among other data, a set of images with the steering angle captured during driving. The Udacity challenge aimed to predict the steering angle based on only the provided images. We explore two different models to perform high-quality prediction of steering angles from images using different deep learning techniques, including transfer learning, 3D CNNs, LSTMs and ResNet. If the Udacity challenge were still ongoing, both of our models would have placed in the top ten of all entries.
This paper proposes a new framework named CASNET to learn control policies that generalize over similar robot types with different morphologies. The proposed framework leverages the structural similarities in robots to learn general-purpose system representations. These representations can then be used with the learning algorithm of choice to learn policies that generalize over different robots. The learned policies can be used to design general-purpose robot controllers that are applicable to a wide variety of robots. We demonstrate the effectiveness of the proposed framework by learning control policies for two separate domains: planar manipulation and legged locomotion. The policy learned for planar manipulation is capable of controlling planar manipulators with varying degrees of freedom and link lengths. For legged locomotion, the learned policy generalizes over different morphologies of crawling robots. These policies perform on par with expert policies trained for individual robot models and achieve zero-shot generalization on models unseen during training, establishing that the final performance of the general policy is bottlenecked by the learning algorithm rather than the proposed framework.
Robotic drawing has become increasingly popular as an entertainment and interactive tool. In this paper we present RoboCoDraw, a real-time collaborative robot-based drawing system that draws stylized human face sketches interactively in front of human users, using Generative Adversarial Network (GAN)-based style transfer and Random-Key Genetic Algorithm (RKGA)-based path optimization. The proposed RoboCoDraw system takes a real human face image as input, converts it to a stylized avatar, then draws it with a robotic arm. A core component in this system is the AvatarGAN we propose, which generates a cartoon avatar face image from a real human face. AvatarGAN is trained with unpaired face and avatar images only, and generates avatar images with much closer likeness to the input human faces than the vanilla CycleGAN. After the avatar image is generated, it is fed to a line extraction algorithm and converted to sketches. An RKGA-based path optimization algorithm is applied to find a time-efficient robotic drawing path to be executed by the robotic arm. We demonstrate the capability of RoboCoDraw on various face images using a lightweight, safe UR5 collaborative robot.
We focus on the problem of class-agnostic instance segmentation of LiDAR point clouds. We propose an approach that combines graph-theoretic search with data-driven learning: it searches over a set of candidate segmentations and returns one in which individual segments score well according to a data-driven point-based model of "objectness". We prove that if we score a segmentation by the worst objectness among its individual segments, there is an efficient algorithm that finds the optimal worst-case segmentation among an exponentially large number of candidates. We also present an efficient algorithm for the average case. For evaluation, we repurpose the KITTI 3D detection benchmark as a segmentation benchmark and empirically demonstrate that our algorithms significantly outperform past bottom-up segmentation approaches and top-down object-based algorithms on segmenting point clouds.
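If the candidate segmentations are generated by a segmentation hierarchy (a tree in which each node is a candidate segment covering its children), the worst-case objective admits a simple recursion: for each subtree, either keep the root as a single segment or take the best segmentations of its children, whichever yields the larger minimum score. The node representation and score function below are illustrative assumptions, not the paper's exact formulation.

```python
def best_worst_case(node, score):
    """Return (segments, worst_score) for the segmentation of the subtree
    rooted at `node` that maximizes the minimum per-segment objectness.

    `node` is a (segment_id, children) pair; `score` maps a segment id to
    its objectness. Both are illustrative placeholders.
    """
    node_id, children = node
    own = ([node_id], score(node_id))  # option 1: keep this node whole
    if not children:
        return own
    # option 2: take the best segmentation of each child subtree
    segs, worst = [], float("inf")
    for child in children:
        child_segs, child_worst = best_worst_case(child, score)
        segs += child_segs
        worst = min(worst, child_worst)
    return (segs, worst) if worst > own[1] else own
```

Because each node is visited once, the recursion is linear in the tree size even though the number of candidate segmentations it implicitly compares is exponential.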
In many settings (e.g., robotics), demonstrations provide a natural way to specify sub-tasks; however, most methods for learning from demonstrations either do not provide guarantees that the artifacts learned for the sub-tasks can be safely composed and/or do not explicitly capture history dependencies. Motivated by this deficit, recent works have proposed specializing to task specifications, a class of Boolean non-Markovian rewards which admit well-defined composition and explicitly handle historical dependencies. This work continues this line of research by adapting maximum causal entropy inverse reinforcement learning to estimate the posterior probability of a specification given a multi-set of demonstrations. The key algorithmic insight is to leverage the extensive literature and tooling on reduced ordered binary decision diagrams to efficiently encode a time-unrolled Markov decision process.
Nano drones are uniquely suited for fully autonomous applications due to their agility, low cost, and small size. However, their constrained form factor limits flight time, sensor payload, and compute capability, which poses a significant limitation on the use of source-seeking nano drones in GPS-denied and highly cluttered environments. The primary goal of our work is to demonstrate the effectiveness of deep reinforcement learning for fully autonomous navigation on highly constrained, general-purpose hardware and to present a methodology for future applications. To this end, we present a deep reinforcement learning-based light-seeking policy that executes, in conjunction with the flight control stack, on a commercially available off-the-shelf ultra-low-power microcontroller (MCU). We describe our methodology for training and executing deep reinforcement learning policies for deployment on constrained, general-purpose MCUs. By carefully designing the network input, we feed the agent features relevant to finding the source, while reducing computational cost and enabling inference at up to 100 Hz. We verify our approach with simulation and in-field testing on a Bitcraze CrazyFlie, achieving a 94% success rate in a highly cluttered and randomized test environment. The policy demonstrates efficient light seeking by reaching the goal in simulation in 65% fewer steps and with 60% shorter paths, compared to a baseline 'roomba' algorithm.
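The kind of policy that fits on an ultra-low-power MCU is typically a small fully connected network evaluated as a few nested loops over static weight arrays, with no framework dependency. The sketch below illustrates such an inference loop; the layer structure, ReLU activations, and discrete argmax action head are illustrative assumptions rather than the deployed network.

```python
def mlp_policy(obs, weights, biases):
    """Evaluate a tiny fully connected policy network of the kind that
    fits on an ultra-low-power MCU.

    `weights` is a list of per-layer weight matrices (rows = output units),
    `biases` a matching list of bias vectors. Hidden layers use ReLU; the
    final layer is linear and the action is its argmax. All of these
    choices are illustrative, not the deployed architecture.
    """
    x = list(obs)
    for layer, (W, b) in enumerate(zip(weights, biases)):
        y = []
        for row, bias in zip(W, b):
            s = sum(w * xi for w, xi in zip(row, x)) + bias
            if layer < len(weights) - 1:  # ReLU on hidden layers only
                s = max(0.0, s)
            y.append(s)
        x = y
    return max(range(len(x)), key=lambda i: x[i])  # discrete action index
```

On an actual MCU the same loop would run over fixed-point weight tables in flash, which is what keeps 100 Hz inference within the power budget.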
Intelligent manipulation benefits from the capacity to flexibly control an end-effector with high degrees of freedom (DoF) and to dynamically react to the environment. However, due to the challenges of collecting effective training data and learning efficiently, most grasping algorithms today are limited to top-down movements and open-loop execution. In this work, we propose a new low-cost hardware interface for collecting grasping demonstrations by people in diverse environments. Leveraging this data, we show that it is possible to train a robust end-to-end 6DoF closed-loop grasping model with reinforcement learning that transfers to real robots. A key aspect of our grasping model is that it uses "action-view" based rendering to simulate future states with respect to different possible actions. By evaluating these states using a learned value function (Q-function), our method is able to select the actions that maximize total reward (i.e., grasping success). Our final grasping system achieves reliable 6DoF closed-loop grasping of novel objects across various scene configurations, as well as in dynamic scenes with moving objects.
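The action-view selection step above reduces to a compact loop: render the view that would result from each candidate action and pick the action whose rendered view the learned Q-function scores highest. The function names `render` and `q_value` are illustrative placeholders, not the authors' API.

```python
def select_action(state, candidate_actions, render, q_value):
    """Pick the candidate action whose simulated next view scores highest.

    `render(state, action)` produces the hypothetical "action-view" and
    `q_value(view)` is the learned value function; both are placeholders
    standing in for the paper's rendering pipeline and Q-network.
    """
    return max(candidate_actions, key=lambda a: q_value(render(state, a)))
```

Because the loop re-runs at every control step on freshly rendered views, the resulting policy is closed-loop: a moving object changes the rendered views and therefore the selected action.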
The aim of this paper is to provide a review of the state of the art in Search and Rescue (SAR) robotics. Suitable robotic applications in the SAR domain are described, and SAR-specific demands and requirements on the various components of a robotic system are characterized. Current research and development in SAR robotics is outlined, and an overview of robotic systems and sub-systems currently in use in SAR and disaster response scenarios is given. Finally, we present a number of possible research directions for SAR robots, which might change the overall design and operation of SAR robotics in the longer-term future. All this is meant to support our main idea of taking SAR applications as an applied benchmark for the Field Robotics (FR) domain.