Finding a person across a camera network plays an important role in video surveillance. For a real-world person re-identification application, in order to guarantee an optimal time response, it is crucial to find the balance between accuracy and speed. We analyse this trade-off, comparing a classical method, that comprises hand-crafted feature description and metric learning, in particular, LOMO and XQDA, to deep learning based techniques, using image classification networks, ResNet and MobileNets. Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time. We evaluate both methods on the Market-1501 and DukeMTMC-reID large-scale datasets, showing that distillation helps reducing the computational cost at inference time while even increasing the accuracy performance.
Deep metric learning employs deep neural networks to embed instances into a metric space such that distances between instances of the same class are small and distances between instances from different classes are large. In most existing deep metric learning techniques, the embedding of an instance is given by a feature vector produced by a deep neural network and Euclidean distance or cosine similarity defines distances between these vectors. In this paper, we study deep distributional embeddings of sequences, where the embedding of a sequence is given by the distribution of learned deep features across the sequence. This has the advantage of capturing statistical information about the distribution of patterns within the sequence in the embedding. When embeddings are distributions rather than vectors, measuring distances between embeddings involves comparing their respective distributions. We propose a distance metric based on Wasserstein distances between the distributions and a corresponding loss function for metric learning, which leads to a novel end-to-end trainable embedding model. We empirically observe that distributional embeddings outperform standard vector embeddings and that training with the proposed Wasserstein metric outperforms training with other distance functions.
This paper presents a hardness-aware deep metric learning (HDML) framework. Most previous deep metric learning methods employ the hard negative mining strategy to alleviate the lack of informative samples for training. However, this mining strategy only utilizes a subset of training data, which may not be enough to characterize the global geometry of the embedding space comprehensively. To address this problem, we perform linear interpolation on embeddings to adaptively manipulate their hard levels and generate corresponding label-preserving synthetics for recycled training, so that information buried in all samples can be fully exploited and the metric is always challenged with proper difficulty. Our method achieves very competitive performance on the widely used CUB-200-2011, Cars196, and Stanford Online Products datasets.
Increasing numbers of MRI brain scans, improvements in image resolution, and advancements in MRI acquisition technology are causing significant increases in the demand for and burden on radiologists' efforts in terms of reading and interpreting brain MRIs. Content-based image retrieval (CBIR) is an emerging technology for reducing this burden by supporting the reading of medical images. High dimensionality is a major challenge in developing a CBIR system that is applicable for 3D brain MRIs. In this study, we propose a system called disease-oriented data concentration with metric learning (DDCML). In DDCML, we introduce deep metric learning to a 3D convolutional autoencoder (CAE). Our proposed DDCML scheme achieves a high dimensional compression rate (4096:1) while preserving the disease-related anatomical features that are important for medical image classification. The low-dimensional representation obtained by DDCML improved the clustering performance by 29.1\% compared to plain 3D-CAE in terms of discriminating Alzheimer's disease patients from healthy subjects, and successfully reproduced the relationships of the severity of disease categories that were not included in the training.
Deep metric learning aims to learn a deep embedding that can capture the semantic similarity of data points. Given the availability of massive training samples, deep metric learning is known to suffer from slow convergence due to a large fraction of trivial samples. Therefore, most existing methods generally resort to sample mining strategies for selecting nontrivial samples to accelerate convergence and improve performance. In this work, we identify two critical limitations of the sample mining methods, and provide solutions for both of them. First, previous mining methods assign one binary score to each sample, i.e., dropping or keeping it, so they only selects a subset of relevant samples in a mini-batch. Therefore, we propose a novel sample mining method, called Online Soft Mining (OSM), which assigns one continuous score to each sample to make use of all samples in the mini-batch. OSM learns extended manifolds that preserve useful intraclass variances by focusing on more similar positives. Second, the existing methods are easily influenced by outliers as they are generally included in the mined subset. To address this, we introduce Class-Aware Attention (CAA) that assigns little attention to abnormal data samples. Furthermore, by combining OSM and CAA, we propose a novel weighted contrastive loss to learn discriminative embeddings. Extensive experiments on two fine-grained visual categorisation datasets and two video-based person re-identification benchmarks show that our method significantly outperforms the state-of-the-art.
In this work, a discriminatively learned CNN embedding is proposed for remote sensing image scene classification. Our proposed siamese network simultaneously computes the classification loss function and the metric learning loss function of the two input images. Specifically, for the classification loss, we use the standard cross-entropy loss function to predict the classes of the images. For the metric learning loss, our siamese network learns to map the intra-class and inter-class input pairs to a feature space where intra-class inputs are close and inter-class inputs are separated by a margin. Concretely, for remote sensing image scene classification, we would like to map images from the same scene to feature vectors that are close, and map images from different scenes to feature vectors that are widely separated. Experiments are conducted on three different remote sensing image datasets to evaluate the effectiveness of our proposed approach. The results demonstrate that the proposed method achieves an excellent classification performance.