人类眼中的世界是时刻变化的，特别是不同“域”（domain）之间的状态、事物、人员等。这里的“域”指的是某一刻的世界状态。由于对两个时刻不同知识的协作处理需求，因此产生了域迁移自适应（domain transfer adaptation）这一领域。传统的机器学习方法，主要目标是通过对训练数据进行经验风险最小化学习，找出令测试数据期望风险最小化的模型，并在这一过程中假设，训练数据与测试数据共享了相似的联合概率分布。迁移自适应学习目标是构建一个可以处理来自不同概率空间的“源域”与“目标域”的模型，这是一个非常有前景的领域，并逐步吸引了更多研究者。
Instance Re-weighting Adaptation。由于跨域数据间的概率分布差异，通过基于源和目标数据之间的特征分布匹配，以非参方式直接推断实例的重采样权重来自然考虑差异，参数分布假设下的权重参数估计仍然是一大挑战。
本文分别针对以上5个挑战，从多个维度进行了梳理，包括：弱监督学习观点、Instance Re-weighting Adaptation、特征适应、分类器适应、深度网络适应、对抗适应、基准数据集等内容。
We develop a system for modeling hand-object interactions in 3D from RGB images that show a hand which is holding a novel object from a known category. We design a Convolutional Neural Network (CNN) for Hand-held Object Pose and Shape estimation called HOPS-Net and utilize prior work to estimate the hand pose and configuration. We leverage the insight that information about the hand facilitates object pose and shape estimation by incorporating the hand into both training and inference of the object pose and shape as well as the refinement of the estimated pose. The network is trained on a large synthetic dataset of objects in interaction with a human hand. To bridge the gap between real and synthetic images, we employ an image-to-image translation model (Augmented CycleGAN) that generates realistically textured objects given a synthetic rendering. This provides a scalable way of generating annotated data for training HOPS-Net. Our quantitative experiments show that even noisy hand parameters significantly help object pose and shape estimation. The qualitative experiments show results of pose and shape estimation of objects held by a hand "in the wild".
This paper studies the problems of vehicle make & model classification. Some of the main challenges are reaching high classification accuracy and reducing the annotation time of the images. To address these problems, we have created a fine-grained database using online vehicle marketplaces of Turkey. A pipeline is proposed to combine an SSD (Single Shot Multibox Detector) model with a CNN (Convolutional Neural Network) model to train on the database. In the pipeline, we first detect the vehicles by following an algorithm which reduces the time for annotation. Then, we feed them into the CNN model. It is reached approximately 4% better classification accuracy result than using a conventional CNN model. Next, we propose to use the detected vehicles as ground truth bounding box (GTBB) of the images and feed them into an SSD model in another pipeline. At this stage, it is reached reasonable classification accuracy result without using perfectly shaped GTBB. Lastly, an application is implemented in a use case by using our proposed pipelines. It detects the unauthorized vehicles by comparing their license plate numbers and make & models. It is assumed that license plates are readable.
We propose a person detector on omnidirectional images, an accurate method to generate minimal enclosing rectangles of persons. The basic idea is to adapt the qualitative detection performance of a convolutional neural network based method, namely YOLOv2 to fish-eye images. The design of our approach picks up the idea of a state-of-the-art object detector and highly overlapping areas of images with their regions of interests. This overlap reduces the number of false negatives. Based on the raw bounding boxes of the detector we fine-tuned overlapping bounding boxes by three approaches: the non-maximum suppression, the soft non-maximum suppression and the soft non-maximum suppression with Gaussian smoothing. The evaluation was done on the PIROPO database and an own annotated Flat dataset, supplemented with bounding boxes on omnidirectional images. We achieve an average precision of 64.4 % with YOLOv2 for the class person on PIROPO and 77.6 % on Flat. For this purpose we fine-tuned the soft non-maximum suppression with Gaussian smoothing.