Recently, there has been increasing interest in the Reinforcement Learning community in obtaining the relational structures of the environment. However, the resulting "relations" are not discrete, logical predicates compatible with symbolic reasoning such as classical planning or goal recognition. Meanwhile, Latplan (Asai and Fukunaga 2018) bridged the gap between deep-learning perceptual systems and symbolic classical planners. One key component of the system is a neural network called the State AutoEncoder (SAE), which encodes an image-based input into a propositional representation compatible with classical planning. To get the best of both worlds, we propose the First-Order State AutoEncoder, an unsupervised architecture for grounding first-order logic predicates and facts. Each predicate models a relationship between objects by taking interpretable arguments and returning a propositional value. In experiments using 8-Puzzle and a photo-realistic Blocksworld environment, we show that (1) the resulting predicates capture interpretable relations (e.g. spatial), (2) they help to obtain a compact, abstract model of the environment, and finally, (3) the resulting model is compatible with symbolic classical planning.
Learning the true density in high-dimensional feature spaces is a well-known problem in machine learning. In this work, we improve the recent Wasserstein autoencoders (WAEs) by proposing Coulomb autoencoders. We demonstrate that a source of sub-optimality in WAEs is the choice of kernel function, because of the additional local minima in the objective. To mitigate this problem, we propose to use Coulomb kernels. We show that, under some conditions on the capacity of the encoder and the decoder, global convergence in the function space can be achieved. Finally, we provide an upper bound on the generalization performance, which can be improved by increasing the capacity of the encoder and the decoder networks. The theory is corroborated by experimental comparisons on synthetic and real-world datasets against several approaches from the families of generative adversarial networks and autoencoder-based models.
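As a rough illustration of where the kernel enters, the WAE penalty is commonly implemented as a maximum mean discrepancy (MMD) between encoded codes and samples from the prior. The sketch below is not the authors' implementation. It assumes the textbook Coulomb form, proportional to 1/||a - b||^(d-2) for dimension d > 2, omits the normalisation constant, and adds a small smoothing term `eps` to avoid division by zero. The function names and the `eps` parameter are illustrative assumptions.

```python
def coulomb_kernel(a, b, eps=1e-3):
    """Smoothed Coulomb-style kernel 1 / (||a - b||^2 + eps^2)^((d - 2) / 2).

    Assumption: the normalisation constant is dropped and eps is added
    for numerical stability; only the case d > 2 is covered.
    """
    d = len(a)
    if d <= 2:
        raise ValueError("this sketch only covers dimension d > 2")
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return (sq_dist + eps ** 2) ** (-(d - 2) / 2.0)


def mean_kernel(xs, ys, kernel, skip_diagonal):
    """Average kernel value over pairs; optionally skip the i == j pairs."""
    total, count = 0.0, 0
    for i, x in enumerate(xs):
        for j, y in enumerate(ys):
            if skip_diagonal and i == j:
                continue
            total += kernel(x, y)
            count += 1
    return total / count


def mmd_squared(codes, prior_samples, kernel=coulomb_kernel):
    """Unbiased estimate of the squared MMD between two sample sets."""
    return (mean_kernel(codes, codes, kernel, True)
            + mean_kernel(prior_samples, prior_samples, kernel, True)
            - 2.0 * mean_kernel(codes, prior_samples, kernel, False))


codes = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
near_prior = [(0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
far_prior = [(10.0, 0.0, 0.0), (11.0, 0.0, 0.0)]
# Codes that sit far from the prior samples give a larger penalty.
print(mmd_squared(codes, near_prior), mmd_squared(codes, far_prior))
```

Swapping `coulomb_kernel` for another kernel function changes only the `kernel` argument, which is the design choice the abstract identifies as a source of sub-optimality.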
In this paper, we demonstrate a new, computationally efficient approach based on deep learning (DL) techniques for the analysis, design, and optimization of electromagnetic (EM) nanostructures. We use the strong correlation among features of a generic EM problem to considerably reduce the dimensionality of the problem, and thus the computational complexity, without introducing considerable errors. By applying dimensionality reduction with the recently demonstrated autoencoder technique, we recast the conventional many-to-one design problem in EM nanostructures as a one-to-one problem plus a much simpler many-to-one problem, which can be solved using an analytic formulation. This approach reduces the computational complexity of solving both the forward problem (i.e., analysis) and the inverse problem (i.e., design) by orders of magnitude compared to conventional approaches. In addition, it provides analytic formulations that, despite their complexity, can be used to obtain an intuitive understanding of the physics and dynamics of EM wave interaction with nanostructures with minimal computation requirements. As a proof of concept, we apply this method to design a new class of on-demand reconfigurable optical metasurfaces based on phase-change materials (PCMs). We envision that the integration of such a DL-based technique with full-wave commercial software packages offers a powerful toolkit to facilitate the analysis, design, and optimization of EM nanostructures, as well as to explain, understand, and predict the observed responses in such structures.
Human annotation for syntactic parsing is expensive, and large resources are available only for a fraction of languages. We ask whether one can leverage abundant unlabeled texts to improve syntactic parsers, beyond just using the texts to obtain more generalisable lexical features (i.e. beyond word embeddings). To this end, we propose a novel latent-variable generative model for semi-supervised syntactic dependency parsing. As exact inference is intractable, we introduce a differentiable relaxation to obtain approximate samples and compute gradients with respect to the parser parameters. Our method (Differentiable Perturb-and-Parse) relies on differentiable dynamic programming over stochastically perturbed edge scores. We demonstrate the effectiveness of our approach with experiments on English, French and Swedish.
Scientific documents rely on both mathematics and text to communicate ideas. Inspired by the topical correspondence between mathematical equations and word contexts observed in scientific texts, we propose a novel topic model that jointly generates mathematical equations and their surrounding text (TopicEq). Using an extension of the correlated topic model, the context is generated from a mixture of latent topics, and the equation is generated by an RNN that depends on the latent topic activations. To experiment with this model, we create a corpus of 400K equation-context pairs extracted from a range of scientific articles from arXiv, and fit the model using a variational autoencoder approach. Experimental results show that this joint model significantly outperforms existing topic models and equation models for scientific texts. Moreover, we qualitatively show that the model effectively captures the relationship between topics and mathematics, enabling novel applications such as topic-aware equation generation, equation topic inference, and topic-aware alignment of mathematical symbols and words.
Sepsis is a life-threatening disease and one of the major causes of death in hospitals. Imaging of microcirculatory dysfunction is a promising approach for automated diagnosis of sepsis. We report a machine learning classifier capable of distinguishing non-septic and septic images from dark field microcirculation videos of patients. The classifier achieves an accuracy of 89.45%. The area under the receiver operating characteristic curve of the classifier was 0.92, the precision was 0.92 and the recall was 0.84. Codes representing the learned feature space of the trained classifier were visualized using t-SNE embedding; they were separable and distinguished between images from critically ill and non-septic patients. Using an unsupervised convolutional autoencoder, independent of the clinical diagnosis, we also report clustering of learned features from a compressed representation associated with healthy images and those with microcirculatory dysfunction. The feature space used by our trained classifier to distinguish between images from septic and non-septic patients has potential diagnostic application.
Latent variable models can be used to probabilistically "fill-in" missing data entries. The variational autoencoder architecture (Kingma and Welling, 2014; Rezende et al., 2014) includes a "recognition" or "encoder" network that infers the latent variables given the data variables. However, it is not clear how to handle missing data variables in this network. The factor analysis (FA) model is a basic autoencoder, using linear encoder and decoder networks. We show how to calculate exactly the latent posterior distribution for the factor analysis (FA) model in the presence of missing data, and note that this solution implies that a different encoder network is required for each pattern of missingness. We also discuss various approximations to the exact solution. Experiments compare the effectiveness of various approaches to filling in the missing data.
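For the linear-Gaussian FA model, the exact posterior under missing data has a closed form, which the following minimal sketch illustrates. It assumes the standard parameterization x = W z + mu + eps with z ~ N(0, I) and eps ~ N(0, diag(psi)); the function names and the use of `None` to mark missing entries are illustrative choices, not taken from the paper. Only the observed rows of W enter the posterior, so each missingness pattern yields a different linear map from data to latent mean.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def fa_posterior(x, W, mu, psi):
    """Gaussian posterior p(z | observed part of x) for a factor analysis model.

    x   : list of floats, with None marking a missing entry
    W   : factor loading matrix, one row per data dimension
    mu  : data mean, psi : diagonal noise variances
    Returns (mean, covariance) over the latent vector z.
    """
    k = len(W[0])
    observed = [i for i, xi in enumerate(x) if xi is not None]
    # Posterior precision: I + W_o^T Psi_o^{-1} W_o (observed rows only).
    precision = [[(1.0 if a == b else 0.0)
                  + sum(W[i][a] * W[i][b] / psi[i] for i in observed)
                  for b in range(k)] for a in range(k)]
    cov = invert(precision)
    # Right-hand side: W_o^T Psi_o^{-1} (x_o - mu_o).
    rhs = [sum(W[i][a] * (x[i] - mu[i]) / psi[i] for i in observed)
           for a in range(k)]
    mean = [sum(cov[a][b] * rhs[b] for b in range(k)) for a in range(k)]
    return mean, cov


# One latent factor, two data dimensions, second dimension missing.
mean, cov = fa_posterior([1.0, None], [[1.0], [2.0]], [0.0, 0.0], [1.0, 1.0])
print(mean, cov)  # posterior mean 0.5, posterior variance 0.5
```

When every entry is missing, the sums are empty and the function returns the prior, with mean zero and identity covariance.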
Deep generative models like variational autoencoders approximate the intrinsic geometry of high-dimensional data manifolds by learning low-dimensional latent-space variables and an embedding function. The geometric properties of these latent spaces have been studied through the lens of Riemannian geometry, via analysis of the non-linearity of the generator function. More recently, deep generative models have been used for learning semantically meaningful "disentangled" representations that capture task-relevant attributes while being invariant to other attributes. In this work, we explore the geometry of popular generative models for disentangled representation learning. We use several metrics to compare the properties of the latent spaces of disentangled representation models in terms of class separability and curvature of the latent space. The results we obtain establish that the class-distinguishable features in the disentangled latent space exhibit higher curvature than in a variational autoencoder. We evaluate and compare the geometry of three such models with a variational autoencoder on two different datasets. Further, our results show that distances and interpolation in the latent space are significantly improved with Riemannian metrics derived from the curvature of the space. We expect these results to have implications for understanding how deep networks can be made more robust, generalizable, and interpretable.
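The Riemannian view of a latent space can be made concrete with the pullback metric G(z) = J(z)^T J(z), where J is the Jacobian of the generator, and with the length of a latent curve measured after decoding. The sketch below is an illustration under stated assumptions, not the procedure used in the paper: `toy_decoder` is a hypothetical stand-in for a trained generator, the Jacobian is estimated by central finite differences, and the curve is a straight latent line.

```python
import math


def toy_decoder(z):
    """Hypothetical generator: lifts a 2-D latent point onto a paraboloid in 3-D."""
    return [z[0], z[1], z[0] ** 2 + z[1] ** 2]


def pullback_metric(decoder, z, h=1e-5):
    """Finite-difference estimate of G(z) = J(z)^T J(z)."""
    dim = len(z)
    columns = []  # columns[a] holds the partial derivative of decoder w.r.t. z[a]
    for a in range(dim):
        plus, minus = list(z), list(z)
        plus[a] += h
        minus[a] -= h
        f_plus, f_minus = decoder(plus), decoder(minus)
        columns.append([(p - m) / (2 * h) for p, m in zip(f_plus, f_minus)])
    return [[sum(u * v for u, v in zip(columns[a], columns[b]))
             for b in range(dim)] for a in range(dim)]


def curve_length(decoder, start, end, steps=1000):
    """Length in data space of the decoded straight latent line from start to end."""
    length = 0.0
    previous = decoder(start)
    for t in range(1, steps + 1):
        s = t / steps
        z = [a + s * (b - a) for a, b in zip(start, end)]
        current = decoder(z)
        length += math.dist(previous, current)
        previous = current
    return length


print(pullback_metric(toy_decoder, [1.0, 0.0]))
print(curve_length(toy_decoder, [0.0, 0.0], [1.0, 0.0]))
```

For this toy decoder the latent segment has Euclidean length 1, while its decoded length is longer because the surface curves. That gap is what a Riemannian distance accounts for and a plain latent distance ignores.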
Motion sensors such as accelerometers and gyroscopes measure the instantaneous acceleration and rotation of a device in three dimensions. Raw data streams from motion sensors embedded in portable and wearable devices may reveal private information about users without their awareness. For example, motion data might disclose the weight or gender of a user, or enable their re-identification. To address this problem, we propose an on-device transformation of sensor data to be shared for specific applications, such as monitoring selected daily activities, without revealing information that enables user identification. We formulate the anonymization problem using an information-theoretic approach and propose a new multi-objective loss function for training deep autoencoders. This loss function helps minimize user-identity information as well as data distortion, in order to preserve the application-specific utility. The training process regulates the encoder to disregard user-identifiable patterns and tunes the decoder to shape the output independently of the users in the training set. The trained autoencoder can be deployed on a mobile or wearable device to anonymize sensor data even for users who are not included in the training dataset. Data from 24 users transformed by the proposed anonymizing autoencoder lead to a promising trade-off between utility and privacy, with an accuracy for activity recognition above 92% and an accuracy for user identification below 7%.
Aspect-based Opinion Summary (AOS), consisting of aspect discovery and sentiment classification steps, has recently emerged as one of the most crucial data mining tasks in e-commerce systems. In this direction, the LDA-based model is considered a notably suitable approach, since it offers both topic modeling and sentiment classification. However, unlike traditional topic modeling, aspect discovery often requires some initial seed words, and this prior knowledge is not easy to incorporate into LDA models. Moreover, LDA approaches rely on sampling methods, which need to load the whole corpus into memory and therefore scale poorly. In this research, we study an alternative approach to the AOS problem, based on Autoencoding Variational Inference (AVI). First, we introduce the Autoencoding Variational Inference for Aspect Discovery (AVIAD) model, which extends the previous work on Autoencoding Variational Inference for Topic Models (AVITM) to embed prior knowledge of seed words. This work includes an enhancement of the previous AVI architecture and a modification of the loss function. Finally, we present the Autoencoding Variational Inference for Joint Sentiment/Topic (AVIJST) model. In this model, we substantially extend the AVI model to support the JST model, which performs topic modeling for the corresponding sentiment. The experimental results show that our proposed models achieve higher topic coherence, faster convergence and better accuracy on sentiment classification, compared to their LDA-based counterparts.
We provide a series of results for unsupervised learning with autoencoders. Specifically, we study shallow two-layer autoencoder architectures with shared weights. We focus on three generative models for data that are common in statistical machine learning: (i) the mixture-of-Gaussians model, (ii) the sparse coding model, and (iii) the sparsity model with non-negative coefficients. For each of these models, we prove that under suitable choices of hyperparameters, architectures, and initialization, autoencoders learned by gradient descent can successfully recover the parameters of the corresponding model. To our knowledge, this is the first result that rigorously studies the dynamics of gradient descent for weight-sharing autoencoders. Our analysis can be viewed as theoretical evidence that shallow autoencoder modules can indeed be used as feature learning mechanisms for a variety of data models, and may offer insight into how to train larger stacked architectures with autoencoders as basic building blocks.
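To make the weight-sharing setting concrete, the following minimal sketch writes out one gradient descent step for a two-layer autoencoder whose decoder reuses the transposed encoder weights. The ReLU activation, the bias term, the squared-error loss and all function names are assumptions made for illustration; the abstract does not specify them, and this is not the analysis carried out in the paper.

```python
def forward(W, b, x):
    """Return (hidden, reconstruction) for x_hat = W^T relu(W x + b)."""
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x)) + bj)
              for row, bj in zip(W, b)]
    recon = [sum(W[j][i] * hidden[j] for j in range(len(W)))
             for i in range(len(x))]
    return hidden, recon


def loss(W, b, x):
    """Squared reconstruction error 0.5 * ||x_hat - x||^2."""
    _, recon = forward(W, b, x)
    return 0.5 * sum((r - xi) ** 2 for r, xi in zip(recon, x))


def gradients(W, b, x):
    """Analytic gradients of the loss with respect to the shared W and the bias b."""
    hidden, recon = forward(W, b, x)
    residual = [r - xi for r, xi in zip(recon, x)]
    # Back-propagate through the decoder W^T h, then through the ReLU.
    d_hidden = [sum(w * r for w, r in zip(row, residual)) for row in W]
    d_pre = [dh if h > 0.0 else 0.0 for dh, h in zip(d_hidden, hidden)]
    # Shared weights: decoder term h_j * residual_i plus encoder term d_pre_j * x_i.
    grad_W = [[hidden[j] * residual[i] + d_pre[j] * x[i]
               for i in range(len(x))] for j in range(len(W))]
    return grad_W, d_pre


def gradient_step(W, b, x, lr=0.01):
    """One plain gradient descent update on a single example."""
    grad_W, grad_b = gradients(W, b, x)
    new_W = [[w - lr * g for w, g in zip(row, g_row)]
             for row, g_row in zip(W, grad_W)]
    new_b = [bj - lr * g for bj, g in zip(b, grad_b)]
    return new_W, new_b


W = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]
b = [-0.3, -0.1]
x = [1.0, 2.0, -1.0]
W_next, b_next = gradient_step(W, b, x)
print(loss(W, b, x), loss(W_next, b_next, x))
```

Because the same matrix appears in the encoder and the decoder, each weight receives two gradient contributions, which is the feature that distinguishes the weight-sharing dynamics from those of an untied autoencoder.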
Regularized autoencoders learn latent codes whose structure is regularized toward a chosen distribution, which gives them the ability to infer latent codes from observations and to generate new samples from codes. However, they are sometimes ambiguous, as they tend to produce reconstructions that are not necessarily faithful reproductions of the inputs. The main reason is that the learned latent code distribution is forced to match a prior distribution while the true distribution remains unknown. To improve the reconstruction quality and to learn a manifold structure for the latent space, this work presents a novel approach, the adversarially approximated autoencoder (AAAE), which investigates the latent codes with adversarial approximation. Instead of regularizing the latent codes by penalizing the distance between the distributions of the model and the target, AAAE learns the autoencoder flexibly and approximates the latent space with a simpler generator. The ratio is estimated using a generative adversarial network (GAN) to enforce the similarity of the distributions. In addition, the image space is regularized with a further adversarial regularizer. The proposed approach unifies two deep generative models for both latent space inference and diverse generation. The learning scheme is realized without regularization on the latent codes, which also encourages faithful reconstruction. Extensive validation experiments on four real-world datasets demonstrate the superior performance of AAAE. In comparison to state-of-the-art approaches, AAAE generates samples of better quality and shares the properties of a regularized autoencoder, with a nice latent manifold structure.