Unsteady fluid systems are nonlinear high-dimensional dynamical systems that may exhibit multiple complex phenomena both in time and space. Reduced Order Modeling (ROM) of fluid flows has been an active research topic in the recent decade with the primary goal to decompose complex flows to a set of features most important for future state prediction and control, typically using a dimensionality reduction technique. In this work, a novel data-driven technique based on the power of deep neural networks for reduced order modeling of the unsteady fluid flows is introduced. An autoencoder network is used for nonlinear dimension reduction and feature extraction as an alternative for singular value decomposition (SVD). Then, the extracted features are used as an input for long short-term memory network (LSTM) to predict the velocity field at future time instances. The proposed autoencoder-LSTM method is compared with dynamic mode decomposition (DMD) as the data-driven base method. Moreover, an autoencoder-DMD algorithm is introduced for reduced order modeling, which uses the autoencoder network for dimensionality reduction rather than SVD rank truncation. Results show that the autoencoder-LSTM method is considerably capable of predicting the fluid flow evolution, where higher values for coefficient of determination $R^{2}$ are obtained using autoencoder-LSTM comparing to DMD.

0
0
下载
预览

The statistical and computational performance of sparse principal component analysis (PCA) can be dramatically improved when the principal components are allowed to be sparse in a rotated eigenbasis. For this, we propose a new method for sparse PCA. In the simplest version of the algorithm, the component scores and loadings are initialized with a low-rank singular value decomposition. Then, the singular vectors are rotated with orthogonal rotations to make them approximately sparse. Finally, soft-thresholding is applied to the rotated singular vectors. This approach differs from prior approaches because it uses an orthogonal rotation to approximate a sparse basis. Our sparse PCA framework is versatile; for example, it extends naturally to the two-way analysis of a data matrix for simultaneous dimensionality reduction of rows and columns. We identify the close relationship between sparse PCA and independent component analysis for separating sparse signals. We provide empirical evidence showing that for the same level of sparsity, the proposed sparse PCA method is more stable and can explain more variance compared to alternative methods. Through three applications---sparse coding of images, analysis of transcriptome sequencing data, and large-scale clustering of Twitter accounts, we demonstrate the usefulness of sparse PCA in exploring modern multivariate data.

0
0
下载
预览

We design a physics-aware auto-encoder to specifically reduce the dimensionality of solutions arising from convection-dominated nonlinear physical systems. Although existing nonlinear manifold learning methods seem to be compelling tools to reduce the dimensionality of data characterized by a large Kolmogorov n-width, they typically lack a straightforward mapping from the latent space to the high-dimensional physical space. Moreover, the realized latent variables are often hard to interpret. Therefore, many of these methods are often dismissed in the reduced order modeling of dynamical systems governed by the partial differential equations (PDEs). Accordingly, we propose an auto-encoder type nonlinear dimensionality reduction algorithm. The unsupervised learning problem trains a diffeomorphic spatio-temporal grid, that registers the output sequence of the PDEs on a non-uniform parameter/time-varying grid, such that the Kolmogorov n-width of the mapped data on the learned grid is minimized. We demonstrate the efficacy and interpretability of our approach to separate convection/advection from diffusion/scaling on various manufactured and physical systems.

0
0
下载
预览

Dimensionality reduction is a classical technique widely used for data analysis. One foundational instantiation is Principal Component Analysis (PCA), which minimizes the average reconstruction error. In this paper, we introduce the "multi-criteria dimensionality reduction" problem where we are given multiple objectives that need to be optimized simultaneously. As an application, our model captures several fairness criteria for dimensionality reduction such as our novel Fair-PCA problem and the Nash Social Welfare (NSW) problem. In Fair-PCA, the input data is divided into $k$ groups, and the goal is to find a single $d$-dimensional representation for all groups for which the minimum variance of any one group is maximized. In NSW, the goal is to maximize the product of the individual variances of the groups achieved by the common low-dimensional space. Our main result is an exact polynomial-time algorithm for the two-criterion dimensionality reduction problem when the two criteria are increasing concave functions. As an application of this result, we obtain a polynomial time algorithm for Fair-PCA for $k=2$ groups and a polynomial time algorithm for NSW objective for $k=2$ groups. We also give approximation algorithms for $k>2$. Our technical contribution in the above results is to prove new low-rank properties of extreme point solutions to semi-definite programs. We conclude with experiments indicating the effectiveness of algorithms based on extreme point solutions of semi-definite programs on several real-world data sets.

0
0
下载
预览

A common approach to analyse high-dimensional data is to learn a low-dimensional representation using generative modelling techniques. Clinical patient records are an example of high-dimensional data that is typically collected from disparate sources and comprises of multiple likelihoods with noisy as well as missing values. In this work, we propose an unsupervised generative model that can identify clustering among the observations in a latent space while making use of all available data in a heterogeneous data setting with noisy and missing values. We improve upon the existing Gaussian process latent variable model (GPLVM) by incorporating multiple likelihoods and deep neural network parameterised back-constraints to create a non-linear dimensionality reduction technique for heterogeneous data. In addition, we develop a variational inference method for our model that uses numerical quadrature. We establish the effectiveness of our model and compare against existing GPLVM methods on clinical data of Parkinson's disease patients treated at the HUS Helsinki University Hospital. Furthermore, we demonstrate sub-groups from the heterogeneous patient data, evaluate the robustness of the findings, and interpret cluster characteristics.

0
0
下载
预览

Visual place recognition is challenging because there are so many factors that can cause the appearance of a place to change, from day-night cycles to seasonal change to atmospheric conditions. In recent years a large range of approaches have been developed to address this challenge including deep-learnt image descriptors, domain translation, and sequential filtering, all with shortcomings including generality and velocity-sensitivity. In this paper we propose a novel descriptor derived from tracking changes in any learned global descriptor over time, dubbed Delta Descriptors. Delta Descriptors mitigate the offsets induced in the original descriptor matching space in an unsupervised manner by considering temporal differences across places observed along a route. Like all other approaches, Delta Descriptors have a shortcoming - volatility on a frame to frame basis - which can be overcome by combining them with sequential filtering methods. Using two benchmark datasets, we first demonstrate the high performance of Delta Descriptors in isolation, before showing new state-of-the-art performance when combined with sequence-based matching. We also present results demonstrating the approach working with a second different underlying descriptor type, and two other beneficial properties of Delta Descriptors in comparison to existing techniques: their increased inherent robustness to variations in camera motion and a reduced rate of performance degradation as dimensional reduction is applied. Source code will be released upon publication.

0
0
下载
预览

Classification of multivariate time series (MTS) has been tackled with a large variety of methodologies and applied to a wide range of scenarios. Reservoir Computing (RC) provides efficient tools to generate a vectorial, fixed-size representation of the MTS that can be further processed by standard classifiers. Despite their unrivaled training speed, MTS classifiers based on a standard RC architecture fail to achieve the same accuracy of fully trainable neural networks. In this paper we introduce the reservoir model space, an unsupervised approach based on RC to learn vectorial representations of MTS. Each MTS is encoded within the parameters of a linear model trained to predict a low-dimensional embedding of the reservoir dynamics. Compared to other RC methods, our model space yields better representations and attains comparable computational performance, thanks to an intermediate dimensionality reduction procedure. As a second contribution we propose a modular RC framework for MTS classification, with an associated open-source Python library. The framework provides different modules to seamlessly implement advanced RC architectures. The architectures are compared to other MTS classifiers, including deep learning models and time series kernels. Results obtained on benchmark and real-world MTS datasets show that RC classifiers are dramatically faster and, when implemented using our proposed representation, also achieve superior classification accuracy.

0
0
下载
预览

In sentiment classification, the enormous amount of textual data, its immense dimensionality, and inherent noise make it extremely difficult for machine learning classifiers to extract high-level and complex abstractions. In order to make the data less sparse and more statistically significant, the dimensionality reduction techniques are needed. But in the existing dimensionality reduction techniques, the number of components needs to be set manually which results in loss of the most prominent features, thus reducing the performance of the classifiers. Our prior work, i.e., Term Presence Count (TPC) and Term Presence Ratio (TPR) have proven to be effective techniques as they reject the less separable features. However, the most prominent and separable features might still get removed from the initial feature set despite having higher distributions among positive and negative tagged documents. To overcome this problem, we have proposed a new framework that consists of two-dimensionality reduction techniques i.e., Sentiment Term Presence Count (SentiTPC) and Sentiment Term Presence Ratio (SentiTPR). These techniques reject the features by considering term presence difference for SentiTPC and ratio of the distribution distinction for SentiTPR. Additionally, these methods also analyze the total distribution information. Extensive experimental results exhibit that the proposed framework reduces the feature dimension by a large scale, and thus significantly improve the classification performance.

0
0
下载
预览

Experimental measurements of physical systems often have a limited number of independent channels, causing essential dynamical variables to remain unobserved. However, many popular methods for unsupervised inference of latent dynamics from experimental data implicitly assume that the measurements have higher intrinsic dimensionality than the underlying system---making coordinate identification a dimensionality reduction problem. Here, we study the opposite limit, in which hidden governing coordinates must be inferred from only a low-dimensional time series of measurements. Inspired by classical techniques for studying the strange attractors of chaotic systems, we introduce a general embedding technique for time series, consisting of an autoencoder trained with a novel latent-space loss function. We show that our technique reconstructs the strange attractors of synthetic and real-world systems better than existing techniques, and that it creates consistent, predictive representations of even stochastic systems. We conclude by using our technique to discover dynamical attractors in diverse systems such as patient electrocardiograms, household electricity usage, and eruptions of the Old Faithful geyser---demonstrating diverse applications of our technique for exploratory data analysis.

0
0
下载
预览

The Semantic Web is an interoperable ecosystem where data producers, such as libraries, public institutions, communities, and companies, publish and link heterogeneous resources. To support this heterogeneity, its format, RDF, allows to describe collections of items sharing some attributes but not necessarily all of them. This flexible framework leads to incompleteness and inconsistencies in information representation, which in turn leads to unreliable query results. In order to make their data reliable and usable, Linked Data producers need to provide the best level of completeness. We propose a novel visualization tool "The Missing Path" to support data producers in diagnosing incompleteness in their data. It relies on dimensional reduction techniques to create a map of RDF entities based on missing paths, revealing clusters of entities missing the same paths. The novelty of our work consists in describing the entities of interest as vectors of aggregated RDF paths of a fixed length. We show that identifying groups of items sharing a similar structure helps users find the cause of incompleteness for entire groups and allows them to decide if and how it has to be resolved. We describe our iterative design process and evaluation with Wikidata contributors.

0
0
下载
预览

A novel unsupervised outlier score, which can be embedded into graph based dimensionality reduction techniques, is presented in this work. The score uses the directed nearest neighbor graphs of those techniques. Hence, the same measure of similarity that is used to project the data into lower dimensions, is also utilized to determine the outlier score. The outlier score is realized through a weighted normalized entropy of the similarities. This score is applied to road infrastructure images. The aim is to identify newly observed infrastructures given a pre-collected base dataset. Detecting unknown scenarios is a key for accelerated validation of autonomous vehicles. The results show the high potential of the proposed technique. To validate the generalization capabilities of the outlier score, it is additionally applied to various real world datasets. The overall average performance in identifying outliers using the proposed methods is higher compared to state-of-the-art methods. In order to generate the infrastructure images, an openDRIVE parsing and plotting tool for Matlab is developed as part of this work. This tool and the implementation of the entropy based outlier score in combination with Uniform Manifold Approximation and Projection are made publicly available.

0
0
下载
预览
Top