In the face of a new era of generative models, the detection of artificially generated content has become a matter of utmost importance. The ability to create credible minute-long music deepfakes in a few seconds on user-friendly platforms poses a real threat of fraud on streaming services and unfair competition to human artists. This paper demonstrates the possibility (and surprising ease) of training classifiers on datasets comprising real audio and fake reconstructions, achieving a convincing accuracy of 99.8%. To our knowledge, this marks the first publication of a music deepfake detector, a tool that will help in the regulation of music forgery. Nevertheless, informed by decades of literature on forgery detection in other fields, we stress that a good test score is not the end of the story. We step back from the straightforward ML framework and expose many facets that could be problematic with such a deployed detector: calibration, robustness to audio manipulation, generalisation to unseen models, interpretability and possibility for recourse. This second part acts as a position for future research steps in the field and a caveat to a flourishing market of fake content checkers.
暂无翻译
In this technical report, we introduce SEED-Data-Edit: a unique hybrid dataset for instruction-guided image editing, which aims to facilitate image manipulation using open-form language. SEED-Data-Edit is composed of three distinct types of data: (1) High-quality editing data produced by an automated pipeline, ensuring a substantial volume of diverse image editing pairs. (2) Real-world scenario data collected from the internet, which captures the intricacies of user intentions for promoting the practical application of image editing in the real world. (3) High-precision multi-turn editing data annotated by humans, which involves multiple rounds of edits for simulating iterative editing processes. The combination of these diverse data sources makes SEED-Data-Edit a comprehensive and versatile dataset for training language-guided image editing model. We fine-tune a pretrained Multimodal Large Language Model (MLLM) that unifies comprehension and generation with SEED-Data-Edit. The instruction tuned model demonstrates promising results, indicating the potential and effectiveness of SEED-Data-Edit in advancing the field of instructional image editing. The datasets are released in https://huggingface.co/datasets/AILab-CVC/SEED-Data-Edit.
暂无翻译
Preserving and encouraging mobility in the elderly and adults with chronic conditions is of paramount importance. However, existing walking aids are either inadequate to provide sufficient support to users' stability or too bulky and poorly maneuverable to be used outside hospital environments. In addition, they all lack adaptability to individual requirements. To address these challenges, this paper introduces WANDER, a novel Walking Assistive omNi-Directional Exo-Robot. It consists of an omnidirectional platform and a robust aluminum structure mounted on top of it, which provides partial body weight support. A comfortable and minimally restrictive coupling interface embedded with a force/torque sensor allows to detect users' intentions, which are translated into command velocities by means of a variable admittance controller. An optimization technique based on users' preferences, i.e., Preference-Based Optimization (PBO) guides the choice of the admittance parameters (i.e., virtual mass and damping) to better fit subject-specific needs and characteristics. Experiments with twelve healthy subjects exhibited a significant decrease in energy consumption and jerk when using WANDER with PBO parameters as well as improved user performance and comfort. The great interpersonal variability in the optimized parameters highlights the importance of personalized control settings when walking with an assistive device, aiming to enhance users' comfort and mobility while ensuring reliable physical support.
暂无翻译
Predicting Remaining Useful Life (RUL) plays a crucial role in the prognostics and health management of industrial systems that involve a variety of interrelated sensors. Given a constant stream of time series sensory data from such systems, deep learning models have risen to prominence at identifying complex, nonlinear temporal dependencies in these data. In addition to the temporal dependencies of individual sensors, spatial dependencies emerge as important correlations among these sensors, which can be naturally modelled by a temporal graph that describes time-varying spatial relationships. However, the majority of existing studies have relied on capturing discrete snapshots of this temporal graph, a coarse-grained approach that leads to loss of temporal information. Moreover, given the variety of heterogeneous sensors, it becomes vital that such inherent heterogeneity is leveraged for RUL prediction in temporal sensor graphs. To capture the nuances of the temporal and spatial relationships and heterogeneous characteristics in an interconnected graph of sensors, we introduce a novel model named Temporal and Heterogeneous Graph Neural Networks (THGNN). Specifically, THGNN aggregates historical data from neighboring nodes to accurately capture the temporal dynamics and spatial correlations within the stream of sensor data in a fine-grained manner. Moreover, the model leverages Feature-wise Linear Modulation (FiLM) to address the diversity of sensor types, significantly improving the model's capacity to learn the heterogeneity in the data sources. Finally, we have validated the effectiveness of our approach through comprehensive experiments. Our empirical findings demonstrate significant advancements on the N-CMAPSS dataset, achieving improvements of up to 19.2% and 31.6% in terms of two different evaluation metrics over state-of-the-art methods.
暂无翻译
Suicide and suicidal behaviors remain significant challenges for public policy and healthcare. In response, psychological support hotlines have been established worldwide to provide immediate help to individuals in mental crises. The effectiveness of these hotlines largely depends on accurately identifying callers' emotional states, particularly underlying negative emotions indicative of increased suicide risk. However, the high demand for psychological interventions often results in a shortage of professional operators, highlighting the need for an effective speech emotion recognition model. This model would automatically detect and analyze callers' emotions, facilitating integration into hotline services. Additionally, it would enable large-scale data analysis of psychological support hotline interactions to explore psychological phenomena and behaviors across populations. Our study utilizes data from the Beijing psychological support hotline, the largest suicide hotline in China. We analyzed speech data from 105 callers containing 20,630 segments and categorized them into 11 types of negative emotions. We developed a negative emotion recognition model and a fine-grained multi-label classification model using a large-scale pre-trained model. Our experiments indicate that the negative emotion recognition model achieves a maximum F1-score of 76.96%. However, it shows limited efficacy in the fine-grained multi-label classification task, with the best model achieving only a 41.74% weighted F1-score. We conducted an error analysis for this task, discussed potential future improvements, and considered the clinical application possibilities of our study. All the codes are public available.
暂无翻译
The multigrid V-cycle method is a popular method for solving systems of linear equations. It computes an approximate solution by using smoothing on fine levels and solving a system of linear equations on the coarsest level. Solving on the coarsest level depends on the size and difficulty of the problem. If the size permits, it is typical to use a direct method based on LU or Cholesky decomposition. In settings with large coarsest-level problems, approximate solvers such as iterative Krylov subspace methods, or direct methods based on low-rank approximation, are often used. The accuracy of the coarsest-level solver is typically determined based on the experience of the users with the concrete problems and methods. In this paper we present an approach to analyzing the effects of approximate coarsest-level solves on the convergence of the V-cycle method for symmetric positive definite problems. Using these results, we derive coarsest-level stopping criterion through which we may control the difference between the approximation computed by a V-cycle method with approximate coarsest-level solver and the approximation which would be computed if the coarsest-level problems were solved exactly. The coarsest-level stopping criterion may thus be set up such that the V-cycle method converges to a chosen finest-level accuracy in (nearly) the same number of V-cycle iterations as the V-cycle method with exact coarsest-level solver. We also utilize the theoretical results to discuss how the convergence of the V-cycle method may be affected by the choice of a tolerance in a coarsest-level stopping criterion based on the relative residual norm.
暂无翻译
Forecasting, to estimate future events, is crucial for business and decision-making. This paper proposes QxEAI, a methodology that produces a probabilistic forecast that utilizes a quantum-like evolutionary algorithm based on training a quantum-like logic decision tree and a classical value tree on a small number of related time series. By using different cycles of the Dow Jones Index (yearly, monthly, weekly, daily), we demonstrate how our methodology produces accurate forecasts while requiring little to none manual work.
暂无翻译
With the development of laws and regulations related to privacy preservation, it has become difficult to collect personal data to perform machine learning. In this context, federated learning, which is distributed learning without sharing personal data, has been proposed. In this paper, we focus on federated learning for user authentication. We show that it is difficult to achieve both privacy preservation and high accuracy with existing methods. To address these challenges, we propose IPFed which is privacy-preserving federated learning using random projection for class embedding. Furthermore, we prove that IPFed is capable of learning equivalent to the state-of-the-art method. Experiments on face image datasets show that IPFed can protect the privacy of personal data while maintaining the accuracy of the state-of-the-art method.
暂无翻译
As a deep learning model, Visual Mamba (VMamba) has a low computational complexity and a global receptive field, which has been successful applied to image classification and detection. To extend its applications, we apply VMamba to crowd counting and propose a novel VMambaCC (VMamba Crowd Counting) model. Naturally, VMambaCC inherits the merits of VMamba, or global modeling for images and low computational cost. Additionally, we design a Multi-head High-level Feature (MHF) attention mechanism for VMambaCC. MHF is a new attention mechanism that leverages high-level semantic features to augment low-level semantic features, thereby enhancing spatial feature representation with greater precision. Building upon MHF, we further present a High-level Semantic Supervised Feature Pyramid Network (HS2PFN) that progressively integrates and enhances high-level semantic information with low-level semantic information. Extensive experimental results on five public datasets validate the efficacy of our approach. For example, our method achieves a mean absolute error of 51.87 and a mean squared error of 81.3 on the ShangHaiTech\_PartA dataset. Our code is coming soon.
暂无翻译
This paper presents CleanGraph, an interactive web-based tool designed to facilitate the refinement and completion of knowledge graphs. Maintaining the reliability of knowledge graphs, which are grounded in high-quality and error-free facts, is crucial for real-world applications such as question-answering and information retrieval systems. These graphs are often automatically assembled from textual sources by extracting semantic triples via information extraction. However, assuring the quality of these extracted triples, especially when dealing with large or low-quality datasets, can pose a significant challenge and adversely affect the performance of downstream applications. CleanGraph allows users to perform Create, Read, Update, and Delete (CRUD) operations on their graphs, as well as apply models in the form of plugins for graph refinement and completion tasks. These functionalities enable users to enhance the integrity and reliability of their graph data. A demonstration of CleanGraph and its source code can be accessed at https://github.com/nlp-tlp/CleanGraph under the MIT License.
暂无翻译