As facial interaction systems are prevalently deployed, security and reliability of these systems become a critical issue, with substantial research efforts devoted. Among them, face anti-spoofing emerges as an important area, whose objective is to identify whether a presented face is live or spoof. Recently, a large-scale face anti-spoofing dataset, CelebA-Spoof which comprised of 625,537 pictures of 10,177 subjects has been released. It is the largest face anti-spoofing dataset in terms of the numbers of the data and the subjects. This paper reports methods and results in the CelebA-Spoof Challenge 2020 on Face AntiSpoofing which employs the CelebA-Spoof dataset. The model evaluation is conducted online on the hidden test set. A total of 134 participants registered for the competition, and 19 teams made valid submissions. We will analyze the top ranked solutions and present some discussion on future work directions.
Recent advancements in Artificial intelligence, especially deep learning, has changed many fields irreversibly by introducing state of the art methods for automation. Construction monitoring has not been an exception; as a part of construction monitoring systems, material classification and recognition have drawn the attention of deep learning and machine vision researchers. However, to create production-ready systems, there is still a long path to cover. Real-world problems such as varying illuminations and reaching acceptable accuracies need to be addressed in order to create robust systems. In this paper, we have addressed these issues and reached a state of the art performance, i.e., 97.35% accuracy rate for this task. Also, a new dataset containing 1231 images of 11 classes taken from several construction sites is gathered and publicly published to help other researchers in this field.
Explicating implicit reasoning (i.e. warrants) in arguments is a long-standing challenge for natural language understanding systems. While recent approaches have focused on explicating warrants via crowdsourcing or expert annotations, the quality of warrants has been questionable due to the extreme complexity and subjectivity of the task. In this paper, we tackle the complex task of warrant explication and devise various methodologies for collecting warrants. We conduct an extensive study with trained experts to evaluate the resulting warrants of each methodology and find that our methodologies allow for high-quality warrants to be collected. We construct a preliminary dataset of 6,000 warrants annotated over 600 arguments for 3 debatable topics. To facilitate research in related downstream tasks, we release our guidelines and preliminary dataset.
Multimodal Sentiment Analysis (MuSe) 2021 is a challenge focusing on the tasks of sentiment and emotion, as well as physiological-emotion and emotion-based stress recognition through more comprehensively integrating the audio-visual, language, and biological signal modalities. The purpose of MuSe 2021 is to bring together communities from different disciplines; mainly, the audio-visual emotion recognition community (signal-based), the sentiment analysis community (symbol-based), and the health informatics community. We present four distinct sub-challenges: MuSe-Wilder and MuSe-Stress which focus on continuous emotion (valence and arousal) prediction; MuSe-Sent, in which participants recognise five classes each for valence and arousal; and MuSe-Physio, in which the novel aspect of `physiological-emotion' is to be predicted. For this years' challenge, we utilise the MuSe-CaR dataset focusing on user-generated reviews and introduce the Ulm-TSST dataset, which displays people in stressful depositions. This paper also provides detail on the state-of-the-art feature sets extracted from these datasets for utilisation by our baseline model, a Long Short-Term Memory-Recurrent Neural Network. For each sub-challenge, a competitive baseline for participants is set; namely, on test, we report a Concordance Correlation Coefficient (CCC) of .4616 CCC for MuSe-Wilder; .4717 CCC for MuSe-Stress, and .4606 CCC for MuSe-Physio. For MuSe-Sent an F1 score of 32.82 % is obtained.
Due to the significance and value in human-computer interaction and natural language processing, task-oriented dialog systems are attracting more and more attention in both academic and industrial communities. In this paper, we survey recent advances and challenges in an issue-specific manner. We discuss three critical topics for task-oriented dialog systems: (1) improving data efficiency to facilitate dialog system modeling in low-resource settings, (2) modeling multi-turn dynamics for dialog policy learning to achieve better task-completion performance, and (3) integrating domain ontology knowledge into the dialog model in both pipeline and end-to-end models. We also review the recent progresses in dialog evaluation and some widely-used corpora. We believe that this survey can shed a light on future research in task-oriented dialog systems.
There is a resurgent interest in developing intelligent open-domain dialog systems due to the availability of large amounts of conversational data and the recent progress on neural approaches to conversational AI. Unlike traditional task-oriented bots, an open-domain dialog system aims to establish long-term connections with users by satisfying the human need for communication, affection, and social belonging. This paper reviews the recent works on neural approaches that are devoted to addressing three challenges in developing such systems: semantics, consistency, and interactiveness. Semantics requires a dialog system to not only understand the content of the dialog but also identify user's social needs during the conversation. Consistency requires the system to demonstrate a consistent personality to win users trust and gain their long-term confidence. Interactiveness refers to the system's ability to generate interpersonal responses to achieve particular social goals such as entertainment, conforming, and task completion. The works we select to present here is based on our unique views and are by no means complete. Nevertheless, we hope that the discussion will inspire new research in developing more intelligent dialog systems.
Small data challenges have emerged in many learning problems, since the success of deep neural networks often relies on the availability of a huge amount of labeled data that is expensive to collect. To address it, many efforts have been made on training complex models with small data in an unsupervised and semi-supervised fashion. In this paper, we will review the recent progresses on these two major categories of methods. A wide spectrum of small data models will be categorized in a big picture, where we will show how they interplay with each other to motivate explorations of new ideas. We will review the criteria of learning the transformation equivariant, disentangled, self-supervised and semi-supervised representations, which underpin the foundations of recent developments. Many instantiations of unsupervised and semi-supervised generative models have been developed on the basis of these criteria, greatly expanding the territory of existing autoencoders, generative adversarial nets (GANs) and other deep networks by exploring the distribution of unlabeled data for more powerful representations. While we focus on the unsupervised and semi-supervised methods, we will also provide a broader review of other emerging topics, from unsupervised and semi-supervised domain adaptation to the fundamental roles of transformation equivariance and invariance in training a wide spectrum of deep networks. It is impossible for us to write an exclusive encyclopedia to include all related works. Instead, we aim at exploring the main ideas, principles and methods in this area to reveal where we are heading on the journey towards addressing the small data challenges in this big data era.
Deep learning applies multiple processing layers to learn representations of data with multiple levels of feature extraction. This emerging technique has reshaped the research landscape of face recognition since 2014, launched by the breakthroughs of Deepface and DeepID methods. Since then, deep face recognition (FR) technique, which leverages the hierarchical architecture to learn discriminative face representation, has dramatically improved the state-of-the-art performance and fostered numerous successful real-world applications. In this paper, we provide a comprehensive survey of the recent developments on deep FR, covering the broad topics on algorithms, data, and scenes. First, we summarize different network architectures and loss functions proposed in the rapid evolution of the deep FR methods. Second, the related face processing methods are categorized into two classes: `one-to-many augmentation' and `many-to-one normalization'. Then, we summarize and compare the commonly used databases for both model training and evaluation. Third, we review miscellaneous scenes in deep FR, such as cross-factor, heterogenous, multiple-media and industry scenes. Finally, potential deficiencies of the current methods and several future directions are highlighted.
The era of big data provides researchers with convenient access to copious data. However, people often have little knowledge about it. The increasing prevalence of big data is challenging the traditional methods of learning causality because they are developed for the cases with limited amount of data and solid prior causal knowledge. This survey aims to close the gap between big data and learning causality with a comprehensive and structured review of traditional and frontier methods and a discussion about some open problems of learning causality. We begin with preliminaries of learning causality. Then we categorize and revisit methods of learning causality for the typical problems and data types. After that, we discuss the connections between learning causality and machine learning. At the end, some open problems are presented to show the great potential of learning causality with data.
Music recommender systems (MRS) have experienced a boom in recent years, thanks to the emergence and success of online streaming services, which nowadays make available almost all music in the world at the user's fingertip. While today's MRS considerably help users to find interesting music in these huge catalogs, MRS research is still facing substantial challenges. In particular when it comes to build, incorporate, and evaluate recommendation strategies that integrate information beyond simple user--item interactions or content-based descriptors, but dig deep into the very essence of listener needs, preferences, and intentions, MRS research becomes a big endeavor and related publications quite sparse. The purpose of this trends and survey article is twofold. We first identify and shed light on what we believe are the most pressing challenges MRS research is facing, from both academic and industry perspectives. We review the state of the art towards solving these challenges and discuss its limitations. Second, we detail possible future directions and visions we contemplate for the further evolution of the field. The article should therefore serve two purposes: giving the interested reader an overview of current challenges in MRS research and providing guidance for young researchers by identifying interesting, yet under-researched, directions in the field.
Sentiment analysis is proven to be very useful tool in many applications regarding social media. This has led to a great surge of research in this field. Hence, in this paper, we compile the baselines for such research. In this paper, we explore three different deep-learning based architectures for multimodal sentiment classification, each improving upon the previous. Further, we evaluate these architectures with multiple datasets with fixed train/test partition. We also discuss some major issues, frequently ignored in multimodal sentiment analysis research, e.g., role of speaker-exclusive models, importance of different modalities, and generalizability. This framework illustrates the different facets of analysis to be considered while performing multimodal sentiment analysis and, hence, serves as a new benchmark for future research in this emerging field. We draw a comparison among the methods using empirical data, obtained from the experiments. In the future, we plan to focus on extracting semantics from visual features, cross-modal features and fusion.