Motion capture (mocap) and time-of-flight based sensing of human actions are becoming increasingly popular modalities to perform robust activity analysis. Applications range from action recognition to quantifying movement quality for health applications. While marker-less motion capture has made great progress, in critical applications such as healthcare, marker-based systems, especially active markers, are still considered gold-standard. However, there are several practical challenges in both modalities such as visibility, tracking errors, and simply the need to keep marker setup convenient wherein movements are recorded with a reduced marker-set. This implies that certain joint locations will not even be marked-up, making downstream analysis of full body movement challenging. To address this gap, we first pose the problem of reconstructing the unmarked joint data as an ill-posed linear inverse problem. We recover missing joints for a given action by projecting it onto the manifold of human actions, this is achieved by optimizing the latent space representation of a deep autoencoder. Experiments on both mocap and Kinect datasets clearly demonstrate that the proposed method performs very well in recovering semantics of the actions and dynamics of missing joints. We will release all the code and models publicly.

### 相关内容

Human activity recognition in videos has been widely studied and has recently gained significant advances with deep learning approaches; however, it remains a challenging task. In this paper, we propose a novel framework that simultaneously considers both implicit and explicit representations of human interactions by fusing information of local image where the interaction actively occurred, primitive motion with the posture of individual subject's body parts, and the co-occurrence of overall appearance change. Human interactions change, depending on how the body parts of each human interact with the other. The proposed method captures the subtle difference between different interactions using interacting body part attention. Semantically important body parts that interact with other objects are given more weight during feature representation. The combined feature of interacting body part attention-based individual representation and the co-occurrence descriptor of the full-body appearance change is fed into long short-term memory to model the temporal dynamics over time in a single framework. We validate the effectiveness of the proposed method using four widely used public datasets by outperforming the competing state-of-the-art method.

Hundreds of millions of surgical procedures take place annually across the world, which generate a prevalent type of electronic health record (EHR) data comprising time series physiological signals. Here, we present a transferable embedding method (i.e., a method to transform time series signals into input features for predictive machine learning models) named PHASE (PHysiologicAl Signal Embeddings) that enables us to more accurately forecast adverse surgical outcomes based on physiological signals. We evaluate PHASE on minute-by-minute EHR data of more than 50,000 surgeries from two operating room (OR) datasets and patient stays in an intensive care unit (ICU) dataset. PHASE outperforms other state-of-the-art approaches, such as long-short term memory networks trained on raw data and gradient boosted trees trained on handcrafted features, in predicting five distinct outcomes: hypoxemia, hypocapnia, hypotension, hypertension, and phenylephrine administration. In a transfer learning setting where we train embedding models in one dataset then embed signals and predict adverse events in unseen data, PHASE achieves significantly higher prediction accuracy at lower computational cost compared to conventional approaches. Finally, given the importance of understanding models in clinical applications we demonstrate that PHASE is explainable and validate our predictive models using local feature attribution methods.

As the number of installed meters in buildings increases, there is a growing number of data time-series that could be used to develop data-driven models to support and optimize building operation. However, building data sets are often characterized by errors and missing values, which are considered, by the recent research, among the main limiting factors on the performance of the proposed models. Motivated by the need to address the problem of missing data in building operation, this work presents a data-driven approach to fill these gaps. In this study, three different autoencoder neural networks are trained to reconstruct missing short-term indoor environment data time-series in a data set collected in an office building in Aachen, Germany. This consisted of a four year-long monitoring campaign in and between the years 2014 and 2017, of 84 different rooms. The models are applicable for different time-series obtained from room automation, such as indoor air temperature, relative humidity and $CO_{2}$ data streams. The results prove that the proposed methods outperform classic numerical approaches and they result in reconstructing the corresponding variables with average RMSEs of 0.42 {\deg}C, 1.30 % and 78.41 ppm, respectively.

This paper presents a novel approach using sensitivity analysis for generalizing Differential Dynamic Programming (DDP) to systems characterized by implicit dynamics, such as those modelled via inverse dynamics and variational or implicit integrators. It leads to a more general formulation of DDP, enabling for example the use of the faster recursive Newton-Euler inverse dynamics. We leverage the implicit formulation for precise and exact contact modelling in DDP, where we focus on two contributions: (1) Contact dynamics in acceleration level that enables high-order integration schemes; (2) Formulation using an invertible contact model in the forward pass and a closed form solution in the backward pass to improve the numerical resolution of contacts. The performance of the proposed framework is validated (1) by comparing implicit versus explicit DDP for the swing-up of a double pendulum, and (2) by planning motions for two tasks using a single leg model making multi-body contacts with the environment: standing up from ground, where a priori contact enumeration is challenging, and maintaining balance under an external perturbation.

Rapid developments in streaming data technologies are continuing to generate increased interest in monitoring human activity. Wearable devices, such as wrist-worn sensors that monitor gross motor activity (actigraphy), have become prevalent. An actigraph unit continually records the activity level of an individual, producing a very large amount of data at a high-resolution that can be immediately downloaded and analyzed. While this kind of \textit{big data} includes both spatial and temporal information, the variation in such data seems to be more appropriately modeled by considering stochastic evolution through time while accounting for spatial information separately. We propose a comprehensive Bayesian hierarchical modeling and inferential framework for actigraphy data reckoning with the massive sizes of such databases while attempting to offer full inference. Building upon recent developments in this field, we construct Nearest Neighbour Gaussian Processes (NNGPs) for actigraphy data to compute at large temporal scales. More specifically, we construct a temporal NNGP and we focus on the optimized implementation of the collapsed algorithm in this specific context. This approach permits improved model scaling while also offering full inference. We test and validate our methods on simulated data and subsequently apply and verify their predictive ability on an original dataset concerning a health study conducted by the Fielding School of Public Health of the University of California, Los Angeles.

We develop a novel human trajectory prediction system that incorporates the scene information (Scene-LSTM) as well as individual pedestrian movement (Pedestrian-LSTM) trained simultaneously within static crowded scenes. We superimpose a two-level grid structure (grid cells and subgrids) on the scene to encode spatial granularity plus common human movements. The Scene-LSTM captures the commonly traveled paths that can be used to significantly influence the accuracy of human trajectory prediction in local areas (i.e. grid cells). We further design scene data filters, consisting of a hard filter and a soft filter, to select the relevant scene information in a local region when necessary and combine it with Pedestrian-LSTM for forecasting a pedestrian's future locations. The experimental results on several publicly available datasets demonstrate that our method outperforms related works and can produce more accurate predicted trajectories in different scene contexts.

Man-made scenes can be densely packed, containing numerous objects, often identical, positioned in close proximity. We show that precise object detection in such scenes remains a challenging frontier even for state-of-the-art object detectors. We propose a novel, deep-learning based method for precise object detection, designed for such challenging settings. Our contributions include: (1) A layer for estimating the Jaccard index as a detection quality score; (2) a novel EM merging unit, which uses our quality scores to resolve detection overlap ambiguities; finally, (3) an extensive, annotated data set, SKU-110K, representing packed retail environments, released for training and testing under such extreme settings. Detection tests on SKU-110K and counting tests on the CARPK and PUCPR+ show our method to outperform existing state-of-the-art with substantial margins. The code and data will be made available on \url{www.github.com/eg4000/SKU110K_CVPR19}.

In this work, we take a representation learning perspective on hierarchical reinforcement learning, where the problem of learning lower layers in a hierarchy is transformed into the problem of learning trajectory-level generative models. We show that we can learn continuous latent representations of trajectories, which are effective in solving temporally extended and multi-stage problems. Our proposed model, SeCTAR, draws inspiration from variational autoencoders, and learns latent representations of trajectories. A key component of this method is to learn both a latent-conditioned policy and a latent-conditioned model which are consistent with each other. Given the same latent, the policy generates a trajectory which should match the trajectory predicted by the model. This model provides a built-in prediction mechanism, by predicting the outcome of closed loop policy behavior. We propose a novel algorithm for performing hierarchical RL with this model, combining model-based planning in the learned latent space with an unsupervised exploration objective. We show that our model is effective at reasoning over long horizons with sparse rewards for several simulated tasks, outperforming standard reinforcement learning methods and prior methods for hierarchical reasoning, model-based planning, and exploration.

Coherence plays a critical role in producing a high-quality summary from a document. In recent years, neural extractive summarization is becoming increasingly attractive. However, most of them ignore the coherence of summaries when extracting sentences. As an effort towards extracting coherent summaries, we propose a neural coherence model to capture the cross-sentence semantic and syntactic coherence patterns. The proposed neural coherence model obviates the need for feature engineering and can be trained in an end-to-end fashion using unlabeled data. Empirical results show that the proposed neural coherence model can efficiently capture the cross-sentence coherence patterns. Using the combined output of the neural coherence model and ROUGE package as the reward, we design a reinforcement learning method to train a proposed neural extractive summarizer which is named Reinforced Neural Extractive Summarization (RNES) model. The RNES model learns to optimize coherence and informative importance of the summary simultaneously. Experimental results show that the proposed RNES outperforms existing baselines and achieves state-of-the-art performance in term of ROUGE on CNN/Daily Mail dataset. The qualitative evaluation indicates that summaries produced by RNES are more coherent and readable.

In many real-world settings, a team of agents must coordinate their behaviour while acting in a decentralised way. At the same time, it is often possible to train the agents in a centralised fashion in a simulated or laboratory setting, where global state information is available and communication constraints are lifted. Learning joint action-values conditioned on extra state information is an attractive way to exploit centralised learning, but the best strategy for then extracting decentralised policies is unclear. Our solution is QMIX, a novel value-based method that can train decentralised policies in a centralised end-to-end fashion. QMIX employs a network that estimates joint action-values as a complex non-linear combination of per-agent values that condition only on local observations. We structurally enforce that the joint-action value is monotonic in the per-agent values, which allows tractable maximisation of the joint action-value in off-policy learning, and guarantees consistency between the centralised and decentralised policies. We evaluate QMIX on a challenging set of StarCraft II micromanagement tasks, and show that QMIX significantly outperforms existing value-based multi-agent reinforcement learning methods.

Hugh Chen,Scott Lundberg,Gabe Erion,Jerry H. Kim,Su-In Lee
0+阅读 · 1月21日
Antonio Liguori,Romana Markovic,Thi Thu Ha Dam,Jérôme Frisch,Christoph van Treeck,Francesco Causone
0+阅读 · 1月21日
Pierfrancesco Alaimo Di Loro,Marco Mingione,Jonah Lipsitt,Christina M. Batteate,Michael Jerrett,Sudipto Banerjee
0+阅读 · 1月20日
Manh Huynh,Gita Alaghband
3+阅读 · 2019年8月23日
Eran Goldman,Roei Herzig,Aviv Eisenschtat,Jacob Goldberger,Tal Hassner
3+阅读 · 2019年4月8日
John D. Co-Reyes,YuXuan Liu,Abhishek Gupta,Benjamin Eysenbach,Pieter Abbeel,Sergey Levine
6+阅读 · 2018年6月7日
Yuxiang Wu,Baotian Hu
6+阅读 · 2018年4月19日
Tabish Rashid,Mikayel Samvelyan,Christian Schroeder de Witt,Gregory Farquhar,Jakob Foerster,Shimon Whiteson
5+阅读 · 2018年3月30日

3+阅读 · 2020年3月15日

11+阅读 · 2019年8月13日

10+阅读 · 2019年4月13日

3+阅读 · 2019年3月12日
CreateAMind
4+阅读 · 2019年1月18日
CreateAMind
26+阅读 · 2019年1月3日

17+阅读 · 2018年10月28日

11+阅读 · 2018年7月9日
CreateAMind
3+阅读 · 2018年4月15日
Top