Dimensionality reduction methods, also known as projections, are frequently used for exploring multidimensional data in machine learning, data science, and information visualization. Among these, t-SNE and its variants have become very popular for their ability to visually separate distinct data clusters. However, such methods are computationally expensive for large datasets, suffer from stability problems, and cannot directly handle out-of-sample data. We propose a learning approach to construct such projections. We train a deep neural network on a collection of samples from a given data universe and their corresponding projections, and then use the network to infer projections of data from the same, or similar, universes. Our approach generates projections with similar characteristics to the learned ones, is computationally two to three orders of magnitude faster than SNE-class methods, has no complex-to-set user parameters, handles out-of-sample data in a stable manner, and can be used to learn any projection technique. We demonstrate our proposal on several real-world high-dimensional datasets from machine learning.
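
    A minimal sketch of the core idea, under assumed choices (the scikit-learn digits data, a t-SNE teacher, and an MLPRegressor standing in for the authors' deep network): an expensive projection is computed once on a training sample, a regressor learns to imitate it, and out-of-sample points are then projected by a single forward pass.

        # Hedged sketch: dataset, scaler, and network size are illustrative
        # assumptions, not the authors' exact architecture.
        from sklearn.datasets import load_digits
        from sklearn.manifold import TSNE
        from sklearn.neural_network import MLPRegressor
        from sklearn.preprocessing import MinMaxScaler

        X, _ = load_digits(return_X_y=True)
        train, test = X[:1000], X[1000:]

        # Expensive step, done once: the "teacher" projection of the training data.
        Y_train = MinMaxScaler().fit_transform(TSNE(n_components=2).fit_transform(train))

        # Cheap, reusable step: a network that imitates the projection.
        net = MLPRegressor(hidden_layer_sizes=(256, 512, 256), max_iter=500)
        net.fit(train, Y_train)

        # Out-of-sample data are projected with a single forward pass.
        Y_test = net.predict(test)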


    In this paper, we demonstrate a computationally efficient new approach based on deep learning (DL) techniques for the analysis, design, and optimization of electromagnetic (EM) nanostructures. We use the strong correlation among features of a generic EM problem to considerably reduce the dimensionality of the problem, and thus the computational complexity, without introducing considerable errors. By employing dimensionality reduction through the recently demonstrated autoencoder technique, we recast the conventional many-to-one design problem for EM nanostructures into a one-to-one problem plus a much simpler many-to-one problem, which can be solved using an analytic formulation. This approach reduces the computational complexity of solving both the forward problem (i.e., analysis) and the inverse problem (i.e., design) by orders of magnitude compared to conventional approaches. In addition, it provides analytic formulations that, despite their complexity, can be used to obtain an intuitive understanding of the physics and dynamics of EM wave interaction with nanostructures with minimal computation. As a proof of concept, we apply the method to design a new class of on-demand reconfigurable optical metasurfaces based on phase-change materials (PCM). We envision that integrating such a DL-based technique with full-wave commercial software packages offers a powerful toolkit for the analysis, design, and optimization of EM nanostructures, as well as for explaining, understanding, and predicting the observed responses in such structures.
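
    The dimensionality-reduction building block can be illustrated with a small autoencoder; the synthetic response data, layer sizes, and training schedule below are assumptions, and the analytic design step of the paper is not reproduced.

        # Illustrative autoencoder: compress a high-dimensional response vector
        # (random stand-in for simulated EM spectra) into a few latent variables.
        import torch
        import torch.nn as nn

        responses = torch.rand(2048, 200)   # stand-in for simulated EM responses

        class AE(nn.Module):
            def __init__(self, dim=200, latent=8):
                super().__init__()
                self.enc = nn.Sequential(nn.Linear(dim, 64), nn.ReLU(), nn.Linear(64, latent))
                self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(), nn.Linear(64, dim))

            def forward(self, x):
                return self.dec(self.enc(x))

        model = AE()
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)

        for _ in range(50):                 # reconstruction training loop
            opt.zero_grad()
            loss = nn.functional.mse_loss(model(responses), responses)
            loss.backward()
            opt.step()

        with torch.no_grad():
            latent = model.enc(responses)   # reduced space in which design is posed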


    Noisy labeled data are a rich source of information that is often easily accessible and cheap to obtain, but label noise can also have many negative consequences if not accounted for. How to fully utilize noisy labels has been studied extensively within the framework of standard supervised machine learning over a period of several decades. However, very little research has been conducted on the challenge posed by noisy labels in non-standard settings, including situations where only a fraction of the samples are labeled (semi-supervised) and each high-dimensional sample is associated with multiple labels. In this work, we present a novel semi-supervised, multi-label dimensionality reduction method that effectively utilizes information from both noisy multi-labels and unlabeled data. With the proposed noisy multi-label semi-supervised dimensionality reduction (NMLSDR) method, the noisy multi-labels are denoised and the unlabeled data are labeled simultaneously via a specially designed label propagation algorithm. NMLSDR then learns a projection matrix for reducing the dimensionality by maximizing the dependence between the enlarged and denoised multi-label space and the features in the projected space. Extensive experiments on synthetic data, benchmark datasets, and a real-world case study demonstrate the effectiveness of the proposed algorithm and show that it outperforms state-of-the-art multi-label feature extraction algorithms.
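
    For illustration only, a generic graph-based propagation step and a dependence-maximizing projection are sketched below on random data; neither the propagation rule nor the projection criterion is the specially designed NMLSDR algorithm itself.

        # Textbook-style sketch: propagate noisy/partial multi-labels over a kNN
        # graph, then project features by maximizing an HSIC-like dependence
        # between features and the propagated label matrix. All data are random.
        import numpy as np
        from sklearn.neighbors import kneighbors_graph

        rng = np.random.default_rng(0)
        X = rng.normal(size=(300, 50))                 # labeled + unlabeled samples
        Y = rng.integers(0, 2, size=(300, 5)).astype(float)
        Y[100:] = 0.0                                  # samples 100.. treated as unlabeled

        # Row-normalized symmetric kNN affinity matrix.
        W = kneighbors_graph(X, n_neighbors=10, mode="connectivity").toarray()
        W = np.maximum(W, W.T)
        S = W / W.sum(axis=1, keepdims=True)

        # Closed-form propagation F = (I - a*S)^-1 (1 - a) Y, one column per label.
        a = 0.9
        F = np.linalg.solve(np.eye(len(X)) - a * S, (1 - a) * Y)

        # Projection directions: top eigenvectors of X^T H (F F^T) H X,
        # with H the centering matrix (an HSIC-style dependence criterion).
        H = np.eye(len(X)) - np.ones((len(X), len(X))) / len(X)
        M = X.T @ H @ (F @ F.T) @ H @ X
        P = np.linalg.eigh(M)[1][:, -5:]               # projection matrix to 5 dimensions
        Z = X @ P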


    The kernel matrix used in kernel methods encodes all the information required for solving complex nonlinear problems defined on data representations in the input space using simple, but implicitly defined, solutions. Spectral analysis of the kernel matrix defines an explicit nonlinear mapping of the input data representations to a subspace of the kernel space, to which linear methods can be applied directly. However, the selection of the kernel subspace is crucial for the performance of the subsequent processing steps. In this paper, we propose a component analysis method for kernel-based dimensionality reduction that optimally preserves the pairwise distances of the class means in the feature space. We provide an extensive analysis of the connection between the proposed criterion and those used in kernel principal component analysis and kernel discriminant analysis, leading to a discriminant analysis version of the proposed method. Our analysis also provides further insight into the properties of the feature spaces obtained by applying these methods.
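
    The quantity the criterion preserves can be computed from the kernel matrix alone, as the short sketch below shows on toy data; the RBF kernel and the iris dataset are assumptions, and the paper's optimization itself is not reproduced.

        # Squared distances between class means in the kernel feature space,
        # computed purely from kernel evaluations.
        import numpy as np
        from sklearn.datasets import load_iris
        from sklearn.metrics.pairwise import rbf_kernel

        X, y = load_iris(return_X_y=True)
        K = rbf_kernel(X, gamma=0.5)

        def class_mean_distance(K, y, a, b):
            """||mu_a - mu_b||^2 in feature space via block means of K."""
            ia, ib = np.where(y == a)[0], np.where(y == b)[0]
            return (K[np.ix_(ia, ia)].mean()
                    + K[np.ix_(ib, ib)].mean()
                    - 2.0 * K[np.ix_(ia, ib)].mean())

        for a in range(3):
            for b in range(a + 1, 3):
                print(a, b, class_mean_distance(K, y, a, b))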


    In recent years, several novel models have been developed to process natural language, and accurate language translation systems have helped us overcome geographical barriers and communicate ideas effectively. These models are developed mostly for a few widely used languages, while other languages are ignored. Most spoken languages share lexical, syntactic, and semantic similarities with several other languages, and knowing this can help us leverage existing models to build more specific and accurate models for other languages. Here I explore the idea of representing several well-known, popular languages in a lower dimension such that their similarities can be visualized using simple 2-dimensional plots. This can even help us understand newly discovered languages that may not share their vocabulary with any existing language.
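
    A toy version of this idea is sketched below: each language is represented by a character n-gram profile of one short sample sentence and plotted in two dimensions. The sample sentences, TF-IDF features, and the use of PCA are illustrative assumptions.

        # Toy sketch: character n-gram profiles of short samples, projected to 2-D.
        import matplotlib.pyplot as plt
        from sklearn.decomposition import PCA
        from sklearn.feature_extraction.text import TfidfVectorizer

        samples = {
            "english": "the quick brown fox jumps over the lazy dog",
            "german":  "der schnelle braune fuchs springt ueber den faulen hund",
            "dutch":   "de snelle bruine vos springt over de luie hond",
            "spanish": "el rapido zorro marron salta sobre el perro perezoso",
            "italian": "la rapida volpe marrone salta sopra il cane pigro",
        }

        vec = TfidfVectorizer(analyzer="char", ngram_range=(2, 3))
        X = vec.fit_transform(list(samples.values())).toarray()
        coords = PCA(n_components=2).fit_transform(X)

        for name, (x, y) in zip(samples, coords):
            plt.scatter(x, y)
            plt.annotate(name, (x, y))
        plt.show()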


    We perform an unsupervised analysis of image-derived shape and motion features extracted from 3822 cardiac 4D MRIs of the UK Biobank. First, using a previously published feature extraction method based on deep learning models, we extract from each case 9 feature values characterizing both cardiac shape and motion. Second, feature selection is performed to remove highly correlated feature pairs. Third, clustering is carried out using a Gaussian mixture model on the selected features. After this analysis, we identify two small clusters which probably correspond to two pathological categories. Further confirmation using a trained classification model and dimensionality reduction tools is carried out to support this discovery. Moreover, we examine the differences between the other, larger clusters and compare our measures with the ground truth.
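
    The clustering pipeline can be sketched as follows; the random array stands in for the 9 extracted shape and motion features, and the correlation threshold and number of mixture components are assumptions.

        # Sketch: correlation-based feature selection, then Gaussian mixture clustering.
        import numpy as np
        from sklearn.mixture import GaussianMixture

        rng = np.random.default_rng(0)
        feats = rng.normal(size=(3822, 9))    # stand-in for the 9 extracted features

        # Drop one member of each feature pair whose |correlation| exceeds a threshold.
        corr = np.corrcoef(feats, rowvar=False)
        drop = {j for i in range(9) for j in range(i + 1, 9) if abs(corr[i, j]) > 0.9}
        selected = feats[:, [i for i in range(9) if i not in drop]]

        gmm = GaussianMixture(n_components=4, random_state=0).fit(selected)
        labels = gmm.predict(selected)
        print(np.bincount(labels))            # small clusters are candidate pathological groups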



    Cyber security threats have been growing significantly in both volume and sophistication over the past decade. This poses great challenges to malware detection without considerable automation. In this paper, we propose a novel approach that extends our recently suggested artificial neural network (ANN) based model with feature selection using the principal component analysis (PCA) technique for malware detection. The effectiveness of the approach is demonstrated through its application to PDF malware detection. A varying number of principal components is examined in the comparative study. Our evaluation shows that the model with PCA can significantly reduce feature redundancy and learning time with minimal loss of information, as confirmed by both training and testing results based on around 105,000 real-world PDF documents. Of the evaluated models using PCA, the model with 32 principal feature components exhibits very similar training accuracy to the model using the 48 original features, while achieving around 33% dimensionality reduction and 22% less learning time. The testing results further confirm the effectiveness of the approach and show that the model achieves a 93.17% true positive rate (TPR) while maintaining the same low false positive rate (FPR) of 0.08% as the case when no feature selection is applied, significantly outperforming all seven evaluated well-known commercial antivirus (AV) scanners, of which the best has a TPR of only 84.53%.
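
    A minimal sketch of this pipeline, with synthetic features standing in for the 48 static PDF features and a small scikit-learn network instead of the paper's ANN:

        # Sketch: 48 features -> 32 principal components -> small neural network.
        import numpy as np
        from sklearn.decomposition import PCA
        from sklearn.model_selection import train_test_split
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(0)
        X = rng.normal(size=(5000, 48))       # stand-in for 48 extracted PDF features
        y = rng.integers(0, 2, size=5000)     # 1 = malicious, 0 = benign (random labels)

        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
        model = make_pipeline(PCA(n_components=32),
                              MLPClassifier(hidden_layer_sizes=(64,), max_iter=300))
        model.fit(X_tr, y_tr)
        print(model.score(X_te, y_te))        # chance-level on random data; real features needed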


    License plate recognition is a key component of many automatic traffic control systems. It enables the automatic identification of vehicles in many applications. Such systems must be able to identify vehicles from images taken under various conditions, including low light, rain, and snow. To reduce the complexity and cost of the hardware required for such devices, the algorithm should be as efficient as possible. This paper proposes a license plate recognition system that uses a new approach based on compressive sensing techniques for dimensionality reduction and feature extraction. Dimensionality reduction enables precise classification with less training data while demanding less computational power. Based on the extracted features, character recognition and classification are performed by a Support Vector Machine classifier.
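
    The feature extraction step can be illustrated with a random measurement matrix, as commonly used in compressive sensing; the scikit-learn digits dataset stands in for segmented plate characters, and the measurement size is an assumption.

        # Sketch: random projection of character images, then SVM classification.
        import numpy as np
        from sklearn.datasets import load_digits
        from sklearn.model_selection import train_test_split
        from sklearn.svm import SVC

        X, y = load_digits(return_X_y=True)            # 64-pixel character images
        rng = np.random.default_rng(0)
        Phi = rng.normal(size=(16, 64)) / np.sqrt(16)  # random measurement matrix, 64 -> 16

        Z = X @ Phi.T                                  # compressed measurements as features
        X_tr, X_te, y_tr, y_te = train_test_split(Z, y, test_size=0.25, random_state=0)
        clf = SVC(kernel="rbf").fit(X_tr, y_tr)
        print(clf.score(X_te, y_te))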


    Dimensionality reduction is a main step in the learning process and plays an essential role in many applications. The most popular methods in this field, such as SVD, PCA, and LDA, can only be applied to data in vector format. This means that higher-order data such as matrices, or more generally tensors, must be folded into vector format. In this approach the spatial relations among features are not considered, and the probability of over-fitting is increased. Due to these issues, in recent years methods such as the generalized low-rank approximation of matrices (GLRAM) and multilinear PCA (MPCA) have been proposed, which handle the data in their native format. These methods preserve the spatial relationships among features, reduce the probability of overfitting, and have lower time and space complexities than vector-based ones. However, because they have fewer parameters, the search space of the multilinear approach is much smaller than that of the vector-based approach. To overcome this drawback of multilinear methods such as GLRAM, we propose a new method that is a general form of GLRAM and, while preserving its merits, has a larger search space. Experimental results confirm the quality of the proposed method. Applying this approach to other multilinear dimensionality reduction methods such as MPCA and MLDA is straightforward.
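
    For reference, the baseline GLRAM iteration that the proposal generalizes can be sketched as below; the random matrices and target ranks are illustrative, and the generalized method itself is not reproduced.

        # Baseline GLRAM: alternately update left/right projections L and R that
        # maximize sum_i ||L^T A_i R||_F^2 over a set of matrix-valued samples.
        import numpy as np

        rng = np.random.default_rng(0)
        A = rng.normal(size=(100, 32, 32))    # 100 matrix-valued samples
        r1, r2 = 8, 8                         # target row / column ranks

        L = np.linalg.qr(rng.normal(size=(32, r1)))[0]
        R = np.linalg.qr(rng.normal(size=(32, r2)))[0]

        for _ in range(20):                   # alternating eigen-updates
            ML = sum(Ai @ R @ R.T @ Ai.T for Ai in A)
            L = np.linalg.eigh(ML)[1][:, -r1:]
            MR = sum(Ai.T @ L @ L.T @ Ai for Ai in A)
            R = np.linalg.eigh(MR)[1][:, -r2:]

        cores = np.stack([L.T @ Ai @ R for Ai in A])   # reduced r1 x r2 representations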


    Sketching refers to a class of randomized dimensionality reduction methods that aim to preserve relevant information in large-scale datasets. They have efficient memory requirements and typically require just a single pass over the dataset. Efficient sketching methods have been derived for vector- and matrix-valued datasets. When the datasets are higher-order tensors, a naive approach is to flatten the tensors into vectors or matrices and then sketch them. However, this is inefficient since it ignores the multi-dimensional nature of tensors. In this paper, we propose a novel multi-dimensional tensor sketch (MTS) that preserves higher-order data structure while reducing dimensionality. We build it as an extension of the popular count sketch (CS) and show that it yields an unbiased estimator of the original tensor. We demonstrate significant advantages in compression ratios when the original data have decomposable tensor representations such as the Tucker, CP, tensor train, or Kronecker product forms. We apply MTS to tensorized neural networks, where we replace fully connected layers with tensor operations. We achieve nearly state-of-the-art accuracy with significant compression on image classification benchmarks.
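
    The count sketch primitive that MTS extends can be sketched in a few lines; the dimensions below are illustrative, and MTS itself (the tensor extension) is not reproduced.

        # Count sketch: random hashes and signs compress a vector so that
        # inner products are preserved in expectation (unbiased estimates).
        import numpy as np

        rng = np.random.default_rng(0)
        d, m = 10_000, 256                    # original and sketched dimensions
        h = rng.integers(0, m, size=d)        # hash bucket for each coordinate
        s = rng.choice([-1.0, 1.0], size=d)   # random sign for each coordinate

        def count_sketch(x):
            out = np.zeros(m)
            np.add.at(out, h, s * x)          # scatter-add signed coordinates
            return out

        x, y = rng.normal(size=d), rng.normal(size=d)
        print(x @ y, count_sketch(x) @ count_sketch(y))   # exact vs. sketched estimate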


    Non-negative matrix factorization (NMF) is a dimensionality reduction technique that tends to produce a sparse representation of data. Commonly, the error between the actual and reconstructed matrices is used as the objective function, but this may not produce the type of representation we desire, as it allows the complexity of the model to grow, constrained only by the size of the subspace and the non-negativity requirement. If additional constraints, such as sparsity, are imposed, the question of parameter selection becomes critical. Instead of adding sparsity constraints in an ad hoc manner, we propose a novel objective function derived from the principle of minimum description length (MDL). Our formulation, MDL-NMF, automatically trades off the complexity and accuracy of the model using a principled approach, with little parameter selection or need for domain expertise. We demonstrate that our model works effectively on three heterogeneous datasets and on a range of semi-synthetic data, showing the broad applicability of our method.
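
    As a point of reference, the plain factorization that the MDL criterion regularizes can be sketched with scikit-learn; the random matrix and rank are assumptions, and the MDL objective itself is not reproduced.

        # Standard NMF baseline: V ~ W H with non-negative factors; the residual
        # below is the accuracy term an MDL-style score would trade off against
        # model complexity.
        import numpy as np
        from sklearn.decomposition import NMF

        rng = np.random.default_rng(0)
        V = rng.random((200, 100))            # non-negative data matrix

        model = NMF(n_components=10, init="nndsvda", max_iter=500, random_state=0)
        W = model.fit_transform(V)            # activations
        H = model.components_                 # parts-based basis
        print(np.linalg.norm(V - W @ H))      # reconstruction error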


    Visual localization has become a key enabling component of many place recognition and SLAM systems. Contemporary research has primarily focused on improving accuracy and precision-recall type metrics, with relatively little attention paid to a system's absolute storage scaling characteristics, its flexibility to adapt to available computational resources, and its longevity with respect to easily incorporating newly learned or hand-crafted image descriptors. Most significantly, improvement in one of these aspects typically comes at the cost of the others: for example, a snapshot-based system that achieves sub-linear storage cost typically provides no metric pose estimation, while a highly accurate pose estimation technique is often ossified, unable to adapt to recent advances in appearance-invariant features. In this paper, we present a novel 6-DOF localization system that, for the first time, simultaneously achieves all three characteristics: significantly sub-linear storage growth, agnosticism to image descriptors, and customizability to available storage and computational resources. The key features of our method are developed based on a novel adaptation of multiple-label learning, together with effective dimensionality reduction and learning techniques that enable simple and efficient optimization. We evaluate our system on several large benchmark datasets and provide detailed comparisons to state-of-the-art systems. The proposed method demonstrates competitive accuracy with existing pose estimation methods while achieving better sub-linear storage scaling, significantly reduced absolute storage requirements, and faster training and deployment speeds.

