We demonstrate model-based, visual robot manipulation of linear deformable objects. Our approach is based on a state-space representation of the physical system that the robot aims to control. This choice has multiple advantages, including the ease of incorporating physical priors into the dynamics and perception models and the ease of planning manipulation actions. In addition, physical states can naturally represent object instances of different appearances, so dynamics learned in the state space in one setting can be used directly in other, visually different settings. This is in contrast to dynamics learned in pixel space or latent space, where generalization to visual differences is not guaranteed. The challenges of the state-space approach are estimating the high-dimensional state of a deformable object from raw images, for which annotations on real data are very expensive, and finding a dynamics model that is accurate, generalizable, and efficient to compute. We are the first to demonstrate self-supervised training of rope state estimation on real images, without requiring expensive annotations. This is achieved by our novel differentiable renderer and image loss, which generalize across a wide range of visual appearances. With the estimated rope states, we train a fast and differentiable neural network dynamics model that encodes the physics of mass-spring systems. Our method predicts future states more accurately than models that neither estimate an explicit state nor use any physics prior. We also show that our approach achieves more efficient manipulation, both in simulation and on a real robot, when used within a model predictive controller.
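
The mass-spring structure mentioned above can be made concrete with a short sketch. Below is a minimal, differentiable mass-spring step for a chain of rope nodes written in PyTorch; the node count, spring constants, and integration scheme are illustrative assumptions, not the paper's actual learned dynamics model.

import torch

def mass_spring_step(x, v, k=50.0, rest_len=0.05, mass=0.01, dt=0.01, damping=0.98):
    # One semi-implicit Euler step for a chain of rope nodes.
    # x, v: (N, 2) tensors of node positions and velocities.
    seg = x[1:] - x[:-1]                                   # vectors between adjacent nodes
    length = seg.norm(dim=1, keepdim=True).clamp(min=1e-8)
    f_seg = k * (length - rest_len) * seg / length         # Hooke's law along each segment
    zero = torch.zeros(1, x.shape[1])
    f = torch.cat([f_seg, zero]) - torch.cat([zero, f_seg])  # equal and opposite spring forces
    v_next = damping * (v + dt * f / mass)
    return x + dt * v_next, v_next

x = torch.linspace(0.0, 1.0, 20).unsqueeze(1) * torch.tensor([[1.0, 0.0]])
v = torch.zeros_like(x)
x.requires_grad_(True)            # gradients flow through the step, as model predictive control requires
x_next, v_next = mass_spring_step(x, v)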

This paper presents a generalized flexible Hybrid Cable-Driven Robot (HCDR). For the proposed HCDR, the derivation of the equations of motion, together with the accompanying proofs, provides an effective way to obtain the terms needed for generalized system modeling. The proposed dynamic modeling approach avoids the drawbacks of traditional methods and can be easily extended to other types of hybrid robots, such as a robot arm mounted on an aircraft platform. A further goal of this paper is to develop integrated control systems that reduce vibrations and improve the accuracy and performance of the HCDR. To achieve this goal, redundancy resolution, stiffness optimization, and control strategies are studied. The proposed optimization problem and algorithm address the limitations of existing stiffness optimization approaches. Three types of control architecture are proposed, and their performance in reducing undesirable vibrations and trajectory tracking errors, especially at the end-effector, is evaluated in several well-designed case studies. Results show that the fully integrated control strategy significantly improves the tracking performance of the end-effector.

This paper investigates non-myopic path planning of mobile sensors for multi-target tracking. Such problems suffer from high computational complexity and/or require high-level decision making. Existing works tackle these issues by heuristically assigning targets to each sensing agent and solving the resulting subproblem for each agent. However, such heuristics degrade target estimation performance because they do not account for how the target state estimates evolve over time. In this work, we sidestep the task-assignment problem by reformulating the general non-myopic planning problem as a distributed optimization problem with respect to the targets. By combining the alternating direction method of multipliers (ADMM) with a local trajectory optimization method, we solve this problem and automatically induce consensus (i.e., high-level decisions) among the targets. In addition, we propose a modified receding-horizon control (RHC) scheme and an edge-cutting method for efficient real-time operation. The proposed algorithm is validated through simulations in various scenarios.
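
To illustrate the consensus mechanism, the following is a minimal consensus-ADMM sketch in which each target keeps a local copy of the shared decision variable and simple quadratic costs stand in for the local trajectory-optimization subproblems. The problem form, variable names, and parameters are assumptions for illustration, not the paper's formulation.

import numpy as np

def consensus_admm(costs, dim, rho=1.0, iters=50):
    # costs: list of (A_i, b_i) defining f_i(x) = 0.5 * ||A_i x - b_i||^2, one per target
    n = len(costs)
    x = np.zeros((n, dim))        # local copies of the decision variable
    u = np.zeros((n, dim))        # scaled dual variables
    z = np.zeros(dim)             # consensus variable (the shared, high-level decision)
    for _ in range(iters):
        for i, (A, b) in enumerate(costs):
            # x-update: argmin_x f_i(x) + (rho/2)||x - z + u_i||^2
            x[i] = np.linalg.solve(A.T @ A + rho * np.eye(dim),
                                   A.T @ b + rho * (z - u[i]))
        z = (x + u).mean(axis=0)  # z-update: averaging enforces agreement among targets
        u += x - z                # dual update
    return z

rng = np.random.default_rng(0)
costs = [(rng.standard_normal((4, 3)), rng.standard_normal(4)) for _ in range(5)]
z_star = consensus_admm(costs, dim=3)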

The Industrial Assembly Challenge at the World Robot Summit was held in 2018 to showcase the state of the art in autonomous manufacturing systems. The challenge included various tasks, such as bin picking, kitting, and assembly of standard industrial parts into 2D and 3D assemblies. Some of the tasks were only revealed at the competition itself, representing the challenge of "level 5" automation, i.e., programming and setting up an autonomous assembly system in less than one day. We conducted a survey among the teams that participated in the challenge and investigated aspects such as team composition, development costs, and system setups, as well as the teams' strategies and approaches. An analysis of the survey results reveals that the competitors fell into two camps: teams that constructed conventional robotic work cells with off-the-shelf tools, and teams that relied mostly on custom-made end effectors and novel software approaches in combination with collaborative robots. While both camps performed reasonably well, the winning team chose a middle ground, combining the efficiency of established play-back programming with the autonomy gained by CAD-based object detection and force control for assembly operations.

We address goal-based imitation learning, where the aim is to infer the symbolic goal from a third-person video demonstration. This enables the robot to plan for execution and reproduce the same goal in a completely different environment. The key challenge is that the goal of a video demonstration is often ambiguous at the level of semantic actions, since the human demonstrator may unintentionally achieve certain subgoals through their actions. Our main contribution is a motion reasoning framework that combines task and motion planning to disambiguate the true intention of the demonstrator in the video. This allows us to robustly recognize goals that cannot be disambiguated by previous action-based approaches. We evaluate our approach on a dataset of 96 video demonstrations collected in a mockup kitchen environment. We show that our motion reasoning plays an important role in recognizing the actual goal of the demonstrator and improves the success rate by over 20%. We further show that, using the automatically inferred goal from the video demonstration, our robot is able to reproduce the same task in a real kitchen environment.

While visual localization and SLAM have witnessed great progress over the past decades, few works have explicitly considered the kinematic (or dynamic) constraints of the real robotic system when designing state estimators for practical deployment on mobile robots. To promote the practical deployment of current state-of-the-art visual-inertial localization algorithms, in this work we propose a low-cost, kinematics-constrained localization system for a skid-steering mobile robot. In particular, we derive the robot's kinematic constraints in a principled way based on the instantaneous centers of rotation (ICR) model and integrate them in a tightly-coupled manner into a sliding-window bundle adjustment (BA)-based visual-inertial estimator. Because the ICR model parameters are time-varying due to, for example, track-to-terrain interaction and terrain roughness, we estimate these kinematic parameters online along with the navigation state. To this end, we perform an in-depth observability analysis and identify the motion conditions under which the state/parameter estimation is viable. The proposed kinematics-constrained visual-inertial localization system has been validated extensively in different terrain scenarios.
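
For reference, the standard ICR kinematic model for a skid-steering platform (body frame: x forward, y left) is sketched below; the parameter values are placeholders and the paper's exact parameterization and conventions may differ.

import numpy as np

def icr_body_velocity(v_left, v_right, y_l, y_r, x_icr):
    # Map left/right track speeds to body-frame velocity via ICR parameters.
    # y_l, y_r: lateral ICR coordinates of the left/right tracks;
    # x_icr:    common longitudinal ICR offset, which captures lateral slip.
    # For an ideal differential drive: y_l = +b, y_r = -b, x_icr = 0.
    d = y_l - y_r
    v_x = (y_l * v_right - y_r * v_left) / d
    omega = (v_right - v_left) / d
    v_y = x_icr * (v_left - v_right) / d   # lateral slip induced by rotation
    return np.array([v_x, v_y, omega])

# Placeholder parameters; in the described system these are estimated online
# with the navigation state because they vary with the terrain.
print(icr_body_velocity(0.8, 1.0, y_l=0.35, y_r=-0.35, x_icr=0.05))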

Safe and efficient path planning is crucial for autonomous mobile robots, and it requires a comprehensive understanding of the 3D structure of the robot's environment. On MAVs this is commonly achieved using low-cost sensors, such as stereo or RGB-D cameras, which may fail to provide depth measurements in textureless or IR-absorbing areas and have limited effective range. In path planning, this results in inefficient trajectories or failure to recognize a feasible path to the goal, significantly impairing the robot's mobility. Recent advances in deep learning enable us to exploit prior experience about the shape of the world and hence infer complete depth maps from color images and additional sparse depth measurements. In this work, we present an augmented planning system and investigate the effects of employing state-of-the-art depth completion techniques specifically trained to augment sparse depth maps originating from RGB-D sensors, semi-dense methods, and stereo matchers. We extensively evaluate our approach in online path planning experiments based on simulated data, as well as global path planning experiments based on real-world MAV data. We show that our augmented system, provided with only sparse depth perception, can reach performance on par with ground-truth depth input in simulated online planning experiments. On real-world MAV data, the augmented system demonstrates superior performance compared to a planner based on very dense RGB-D depth maps.
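
The glue between perception and planning can be sketched as follows; the interface, the trusted-measurement overwrite, and the toy stand-in network are assumptions for illustration, not the paper's implementation.

import numpy as np

def complete_depth(rgb, sparse_depth, model):
    # model: any callable implementing depth completion, e.g. a trained CNN.
    valid = sparse_depth > 0                  # mask of actual measurements
    dense = model(rgb, sparse_depth)          # network prediction, shape (H, W)
    dense[valid] = sparse_depth[valid]        # keep the trusted measurements
    return dense                              # fed to the planner's mapping module

def toy_model(rgb, sparse):
    # Toy stand-in for a trained network, only to make the sketch runnable.
    out = sparse.copy()
    fill = sparse[sparse > 0].mean() if (sparse > 0).any() else 1.0
    out[out == 0] = fill
    return out

rgb = np.zeros((48, 64, 3))
sparse = np.zeros((48, 64)); sparse[::8, ::8] = 2.5   # sparse hits at 2.5 m
dense = complete_depth(rgb, sparse, toy_model)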

We present Kaolin, a PyTorch library aiming to accelerate 3D deep learning research. Kaolin provides efficient implementations of differentiable 3D modules for use in deep learning systems. With functionality to load and preprocess several popular 3D datasets, and native functions to manipulate meshes, pointclouds, signed distance functions, and voxel grids, Kaolin mitigates the need to write wasteful boilerplate code. Kaolin packages together several differentiable graphics modules including rendering, lighting, shading, and view warping. Kaolin also supports an array of loss functions and evaluation metrics for seamless evaluation and provides visualization functionality to render the 3D results. Importantly, we curate a comprehensive model zoo comprising many state-of-the-art 3D deep learning architectures, to serve as a starting point for future research endeavours. Kaolin is available as open-source software at https://github.com/NVIDIAGameWorks/kaolin/.
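
A hypothetical usage sketch follows. The module paths (kaolin.ops.mesh, kaolin.metrics.pointcloud) follow recent Kaolin releases and are assumptions; the exact API of a given version should be checked against the library documentation.

import torch
import kaolin

# A single triangle as a toy "mesh": batched vertices (B, V, 3), faces (F, 3).
vertices = torch.tensor([[[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]])
faces = torch.tensor([[0, 1, 2]])

# Sample a point cloud on the mesh surface.
points, _ = kaolin.ops.mesh.sample_points(vertices, faces, 500)

# Chamfer distance to a target cloud, e.g. as a reconstruction loss.
target = torch.rand(1, 500, 3)
loss = kaolin.metrics.pointcloud.chamfer_distance(points, target)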

Service robots should be able to operate autonomously in dynamic, daily changing environments over extended periods of time. While Simultaneous Localization and Mapping (SLAM) is one of the most fundamental problems for robotic autonomy, most existing SLAM works are evaluated on data sequences recorded over a short period of time. In real-world deployments, there can be out-of-sight scene changes caused by both natural factors and human activities. For example, in home scenarios, most objects may be movable, replaceable, or deformable, and the visual features of the same place may differ significantly between successive days. Such out-of-sight dynamics pose great challenges to the robustness of pose estimation, and hence to a robot's long-term deployment and operation. To differentiate this problem from conventional works, which are usually evaluated in a static setting in a single run, we use the term lifelong SLAM for SLAM in an ever-changing environment over a long period of time. To accelerate lifelong SLAM research, we release the OpenLORIS-Scene datasets. The data are collected in real-world indoor scenes, multiple times in each place, to capture scene changes that occur in real life. We also design benchmarking metrics for lifelong SLAM with which the robustness and accuracy of pose estimation are evaluated separately. The datasets and benchmark are available online at https://lifelong-robotic-vision.github.io/dataset/scene.
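
One plausible way to decouple robustness from accuracy is sketched below, purely as an illustration and not the benchmark's actual metric definitions: robustness as the fraction of frames with a valid pose within an error threshold, accuracy as the RMSE over those frames only.

import numpy as np

def robustness_and_accuracy(est_xyz, gt_xyz, thresh=0.3):
    # est_xyz may contain NaNs where the SLAM system lost tracking.
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)
    ok = np.isfinite(err) & (err < thresh)
    robustness = ok.mean()
    accuracy = np.sqrt(np.mean(err[ok] ** 2)) if ok.any() else np.inf
    return robustness, accuracy

gt = np.cumsum(np.random.default_rng(1).normal(0, 0.05, (200, 3)), axis=0)
est = gt + np.random.default_rng(2).normal(0, 0.02, gt.shape)
est[50:60] = np.nan                        # simulated tracking failure
print(robustness_and_accuracy(est, gt))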

Contemporary autopilot systems for unmanned aerial vehicles (UAVs) are far more limited in their flight envelope than experienced human pilots, thereby restricting the conditions UAVs can operate in and the types of missions they can accomplish autonomously. This paper proposes a deep reinforcement learning (DRL) controller for the nonlinear attitude control problem, enabling extended flight envelopes for fixed-wing UAVs. A proof-of-concept controller using the proximal policy optimization (PPO) algorithm is developed and shown to be capable of stabilizing a fixed-wing UAV from a large set of initial conditions to reference roll, pitch, and airspeed values. The training process is outlined and the key factors governing its progression rate are examined; the most important is limiting the number of variables in the observation vector while including values from several previous time steps for these variables. The trained reinforcement learning (RL) controller is compared to a proportional-integral-derivative (PID) controller and is found to converge in more cases than the PID controller, with comparable performance. Furthermore, the RL controller is shown to generalize well to unseen disturbances in the form of wind and turbulence, even in severe disturbance conditions.
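
The observation design highlighted above, a compact set of variables stacked over several previous time steps, can be sketched as follows; the variable names, history length, and padding rule are illustrative assumptions rather than the paper's exact design.

import numpy as np
from collections import deque

class AttitudeObservation:
    def __init__(self, history=5):
        self.history = history
        self.buffer = deque(maxlen=history)

    def __call__(self, roll_err, pitch_err, airspeed_err, p, q, r):
        # A compact per-step feature vector rather than the full UAV state.
        step = np.array([roll_err, pitch_err, airspeed_err, p, q, r])
        self.buffer.append(step)
        while len(self.buffer) < self.history:     # pad at episode start
            self.buffer.appendleft(step)
        return np.concatenate(list(self.buffer))   # shape: (6 * history,)

obs_fn = AttitudeObservation(history=5)
obs = obs_fn(0.1, -0.05, 2.0, 0.01, 0.0, 0.02)     # fed to the PPO policy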

Learning from offline task demonstrations is a problem of great interest in robotics. For simple, short-horizon manipulation tasks with modest variation across task instances, offline learning from a small set of demonstrations can produce controllers that successfully solve the task. However, leveraging a fixed batch of data can be problematic for larger datasets and longer-horizon tasks with greater variation, since the data can exhibit substantial diversity and contain suboptimal solution approaches. In this paper, we propose Implicit Reinforcement without Interaction at Scale (IRIS), a novel framework for learning from large-scale demonstration datasets. IRIS factorizes the control problem into a goal-conditioned low-level controller that imitates short demonstration sequences and a high-level goal selection mechanism that sets goals for the low-level controller, selectively combining parts of suboptimal solutions to achieve more successful task completions. We evaluate IRIS on three datasets, including the RoboTurk Cans dataset collected from humans via crowdsourcing, and show that performant policies can be learned purely offline. Additional results and videos are available at https://stanfordvl.github.io/iris/ .
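
A schematic control loop in the spirit of this factorization is sketched below; the names, the gym-style environment interface, and the fixed goal-reselection interval are illustrative assumptions, not the IRIS implementation.

def run_episode(env, select_goal, low_level_policy, horizon=200, goal_every=10):
    # env: gym-style environment; select_goal: high-level goal selection mechanism;
    # low_level_policy: goal-conditioned controller trained by imitation.
    obs = env.reset()
    goal = select_goal(obs)
    for t in range(horizon):
        if t % goal_every == 0:
            goal = select_goal(obs)            # re-select goals at a coarse timescale
        action = low_level_policy(obs, goal)   # short-horizon, goal-conditioned imitation
        obs, _, done, _ = env.step(action)
        if done:
            break
    return obs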

In this paper, we show that for a minimal pose-graph problem, even in the ideal case of perfect measurements and spherical covariance, using the so-called "wrap function" when comparing angles results in multiple suboptimal local minima. We numerically estimate the regions of attraction of these local minima for several examples and give evidence that they have nonzero measure. In contrast, under the same assumptions, we show that the \textit{chordal distance} representation of angle error has a unique minimum up to periodicity. For the chordal cost, we also search for initial conditions that fail to converge to the global minimum, and find that this occurs with far fewer points than with the geodesic cost.
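
For concreteness, the two angle-error representations being compared can be written as follows (standard definitions, in our own notation; the costs compared are typically sums of squares of these residuals):
\[
d_{\mathrm{wrap}}(\theta_i, \theta_j) = \mathrm{wrap}(\theta_i - \theta_j), \qquad
\mathrm{wrap}(\phi) = \big((\phi + \pi) \bmod 2\pi\big) - \pi \in [-\pi, \pi),
\]
\[
d_{\mathrm{chord}}(\theta_i, \theta_j) =
\left\| \begin{bmatrix}\cos\theta_i\\ \sin\theta_i\end{bmatrix} -
\begin{bmatrix}\cos\theta_j\\ \sin\theta_j\end{bmatrix} \right\|_2
= 2\left|\sin\tfrac{\theta_i - \theta_j}{2}\right|.
\]
The wrapped residual has a derivative jump at the wrap boundary, whereas the chordal residual is smooth and periodic, which is the structural difference underlying the contrasting local-minima behavior.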

In this paper, we propose a novel penetration metric, called deformable penetration depth PD$^d$, to define a measure of inter-penetration between two linearly deforming tetrahedra using the object norm. First, we show that a distance metric for a tetrahedron deforming between two configurations can be found in closed form based on the object norm. Then, we show that the PD$^d$ between an intersecting pair of static and deforming tetrahedra can be found by solving a quadratic programming (QP) problem in terms of this distance metric with non-penetration constraints. We also show that the PD$^d$ between two intersecting, deforming tetrahedra can be found by solving a similar QP problem under some assumptions on the penetration directions, and that it can be further accelerated by an order of magnitude using pre-calculated penetration directions. We have implemented our algorithm on a standard PC platform using an off-the-shelf QP optimizer, and experimentally show that both the static/deformable and deformable/deformable cases can be solved in a few to tens of milliseconds. Finally, we demonstrate that our penetration metric is three times smaller (i.e., tighter) than the classical rigid penetration depth metric in our experiments.
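
In generic form (our notation; the paper's exact constraint construction may differ), the optimization has the structure of a small quadratic program:
\[
\mathrm{PD}^d \;=\; \min_{\mathbf{q}} \ \sqrt{\sigma(\mathbf{q}_0, \mathbf{q})}
\quad \text{subject to} \quad \mathbf{A}\,\mathbf{q} \ge \mathbf{b},
\]
where $\mathbf{q}_0$ stacks the current (penetrating) vertex positions of the deforming tetrahedron, $\sigma$ is the object-norm-based distance metric (a quadratic function of $\mathbf{q}$), and the linear constraints $\mathbf{A}\mathbf{q} \ge \mathbf{b}$ encode non-penetration with respect to the other tetrahedron.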

We consider a distributed system of $n$ identical mobile robots operating in the two-dimensional Euclidean plane. As in previous studies, we consider the robots to be anonymous, oblivious, disoriented, and without any communication capabilities, operating under the Look-Compute-Move model, where the next location of a robot depends only on its view of the current configuration. Even in this seemingly weak model, most formation problems, which require constructing specific configurations, can be solved quite easily when the robots are fully synchronized with each other. In this paper we introduce and study a new class of problems which, unlike the formation problems studied so far, cannot always be solved even in the fully synchronous model with atomic and rigid moves. This class of problems requires the robots to permute their locations in the plane. In particular, we are interested in implementing two special types of permutations: permutations without any fixed points and permutations of order $n$. The former (called MOVE-ALL) requires each robot to visit at least two of the initial locations, while the latter (called VISIT-ALL) requires every robot to visit each of the initial locations in a periodic manner. We provide a characterization of the solvability of these problems, highlighting the main challenges in solving this class of problems for mobile robots. We also provide algorithms for the feasible cases, distinguishing in particular between one-step algorithms (where each configuration must be a permutation of the original configuration) and multi-step algorithms (which allow intermediate configurations). These results open a new research direction in mobile distributed robotics that has not been investigated before.

Robotic systems in manufacturing applications commonly assume known object geometry and appearance. This simplifies the task for the 3D perception algorithms and makes the manipulation more deterministic. However, such approaches do not transfer easily to the agricultural and food domains due to the variability and deformability of natural food products. We demonstrate an approach, applied to poultry products, that picks up a whole chicken from an unordered bin using a suction cup gripper, estimates its pose with a deep learning approach, and places it in a canonical orientation where it can be further processed. Our robotic system was experimentally evaluated in a real-world environment, where it generalizes to object variations and achieves high accuracy on the bin picking and pose estimation tasks.
