专知, 为人工智能从业者服务!

会员服务 ·

专业可信的知识分发

高级搜索

知识 (knowledge) · Tensor · 图 · MoDELS · 分解的 ·

4 月 3 日

Knowledge Graph Completion with Mixed Geometry Tensor Factorization

暂无翻译

Viacheslav Yusupov,Maxim Rakhuba,Evgeny Frolov

from arxiv, Accepted to AISTATS 2025

In this paper, we propose a new geometric approach for knowledge graph completion via low rank tensor approximation. We augment a pretrained and well-established Euclidean model based on a Tucker tensor decomposition with a novel hyperbolic interaction term. This correction enables more nuanced capturing of distributional properties in data better aligned with real-world knowledge graphs. By combining two geometries together, our approach improves expressivity of the resulting model achieving new state-of-the-art link prediction accuracy with a significantly lower number of parameters compared to the previous Euclidean and hyperbolic models.

暂无翻译

AI · MoDELS · 论文 · 稳健性 · 汇聚 ·

4 月 2 日

Reinsuring AI: Energy, Agriculture, Finance & Medicine as Precedents for Scalable Governance of Frontier Artificial Intelligence

暂无翻译

Nicholas Stetler

from arxiv, Working paper version (35 pages). Submitted to So. Ill. Law Journal; full-form citations retained for editorial review. Not peer-reviewed. Subject to revision

The governance of frontier artificial intelligence (AI) systems--particularly those capable of catastrophic misuse or systemic failure--requires institutional structures that are robust, adaptive, and innovation-preserving. This paper proposes a novel framework for governing such high-stakes models through a three-tiered insurance architecture: (1) mandatory private liability insurance for frontier model developers; (2) an industry-administered risk pool to absorb recurring, non-catastrophic losses; and (3) federally backed reinsurance for tail-risk events. Drawing from historical precedents in nuclear energy (Price-Anderson), terrorism risk (TRIA), agricultural crop insurance, flood reinsurance, and medical malpractice, the proposal shows how the federal government can stabilize private AI insurance markets without resorting to brittle regulation or predictive licensing regimes. The structure aligns incentives between AI developers and downstream stakeholders, transforms safety practices into insurable standards, and enables modular oversight through adaptive eligibility criteria. By focusing on risk-transfer mechanisms rather than prescriptive rules, this framework seeks to render AI safety a structural feature of the innovation ecosystem itself--integrated into capital markets, not external to them. The paper concludes with a legal and administrative feasibility analysis, proposing avenues for statutory authorization and agency placement within existing federal structures.

暂无翻译

MoDELS · 稳健性 · 数据集 · 评论员 · 优化器 ·

4 月 3 日

Evaluating and Enhancing Segmentation Model Robustness with Metamorphic Testing

暂无翻译

Seif Mzoughi,Mohamed Elshafeia,Foutse Khomh

Image segmentation is critical for applications such as medical imaging, augmented reality, and video surveillance. However, segmentation models often lack robustness, making them vulnerable to adversarial perturbations from subtle image distortions. In this work, we propose SegRMT, a metamorphic testing approach that leverages genetic algorithms (GA) to optimize sequences of spatial and spectral transformations while preserving image fidelity via a predefined PSNR threshold. Using the Cityscapes dataset, our method generates adversarial examples that effectively challenge the DeepLabV3 segmentation model. Our experiments show that SegRMT reduces DeepLabV3's mean Intersection over Union (mIoU) to 6.4%, outperforming other adversarial baselines that decrease mIoU to between 8.5% and 21.7%. Furthermore, when used for adversarial training, SegRMT boosts model performance, achieving mIoU improvements up to 73% on dedicated adversarial datasets and increasing cross-adversarial mIoU to 53.8%, compared to only 2%-10% for other methods. These findings demonstrate that SegRMT not only simulates realistic image distortions but also enhances the robustness of segmentation models, making it a valuable tool for ensuring reliable performance in safety-critical applications.

暂无翻译

MoDELS · 泛化理论 · Learning · 相似度 · 可理解性 ·

4 月 3 日

Utility Theory of Synthetic Data Generation

暂无翻译

Shirong Xu,Will Wei Sun,Guang Cheng

Synthetic data algorithms are widely employed in industries to generate artificial data for downstream learning tasks. While existing research primarily focuses on empirically evaluating utility of synthetic data, its theoretical understanding is largely lacking. This paper bridges the practice-theory gap by establishing relevant utility theory in a statistical learning framework. It considers two utility metrics: generalization and ranking of models trained on synthetic data. The former is defined as the generalization difference between models trained on synthetic and on real data. By deriving analytical bounds for this utility metric, we demonstrate that the synthetic feature distribution does not need to be similar as that of real data for ensuring comparable generalization of synthetic models, provided proper model specifications in downstream learning tasks. The latter utility metric studies the relative performance of models trained on synthetic data. In particular, we discover that the distribution of synthetic data is not necessarily similar as the real one to ensure consistent model comparison. Interestingly, consistent model comparison is still achievable even when synthetic responses are not well generated, as long as downstream models are separable by a generalization gap. Finally, extensive experiments on non-parametric models and deep neural networks have been conducted to validate these theoretical findings.

暂无翻译

AI · 操作 · Performer · 分解的 · 讲稿 ·

4 月 3 日

How humans evaluate AI systems for person detection in automatic train operation: Not all misses are alike

暂无翻译

Romy Müller

If artificial intelligence (AI) is to be applied in safety-critical domains, its performance needs to be evaluated reliably. The present study aimed to understand how humans evaluate AI systems for person detection in automatic train operation. In three experiments, participants saw image sequences of people moving in the vicinity of railway tracks. A simulated AI had highlighted all detected people, sometimes correctly and sometimes not. Participants had to provide a numerical rating of the AI's performance and then verbally explain their rating. The experiments varied several factors that might influence human ratings: the types and plausibility of AI mistakes, the number of affected images, the number of people present in an image, the position of people relevant to the tracks, and the methods used to elicit human evaluations. While all these factors influenced human ratings, some effects were unexpected or deviated from normative standards. For instance, the factor with the strongest impact was people's position relative to the tracks, although participants had explicitly been instructed that the AI could not process such information. Taken together, the results suggest that humans may sometimes evaluate more than the AI's performance on the assigned task. Such mismatches between AI capabilities and human expectations should be taken into consideration when conducting safety audits of AI systems.

暂无翻译

Learning · 控制器 · 强化学习 · Integration · 最优化 ·

4 月 3 日

Hierarchical Policy-Gradient Reinforcement Learning for Multi-Agent Shepherding Control of Non-Cohesive Targets

暂无翻译

Stefano Covone,Italo Napolitano,Francesco De Lellis,Mario di Bernardo

We propose a decentralized reinforcement learning solution for multi-agent shepherding of non-cohesive targets using policy-gradient methods. Our architecture integrates target-selection with target-driving through Proximal Policy Optimization, overcoming discrete-action constraints of previous Deep Q-Network approaches and enabling smoother agent trajectories. This model-free framework effectively solves the shepherding problem without prior dynamics knowledge. Experiments demonstrate our method's effectiveness and scalability with increased target numbers and limited sensing capabilities.

暂无翻译

MoDELS · 回合 · 估计/估计量 · 样本 · 泛函 ·

4 月 3 日

Online Hybrid-Belief POMDP with Coupled Semantic-Geometric Models and Semantic Safety Awareness

暂无翻译

Tuvy Lemberg,Vadim Indelman

from arxiv, 18 pages, 11 figures

Robots operating in complex and unknown environments frequently require geometric-semantic representations of the environment to safely perform their tasks. While inferring the environment, they must account for many possible scenarios when planning future actions. Since objects' class types are discrete and the robot's self-pose and the objects' poses are continuous, the environment can be represented by a hybrid discrete-continuous belief which is updated according to models and incoming data. Prior probabilities and observation models representing the environment can be learned from data using deep learning algorithms. Such models often couple environmental semantic and geometric properties. As a result, semantic variables are interconnected, causing semantic state space dimensionality to increase exponentially. In this paper, we consider planning under uncertainty using partially observable Markov decision processes (POMDPs) with hybrid semantic-geometric beliefs. The models and priors consider the coupling between semantic and geometric variables. Within POMDP, we introduce the concept of semantically aware safety. Obtaining representative samples of the theoretical hybrid belief, required for estimating the value function, is very challenging. As a key contribution, we develop a novel form of the hybrid belief and leverage it to sample representative samples. We show that under certain conditions, the value function and probability of safety can be calculated efficiently with an explicit expectation over all possible semantic mappings. Our simulations show that our estimates of the objective function and probability of safety achieve similar levels of accuracy compared to estimators that run exhaustively on the entire semantic state-space using samples from the theoretical hybrid belief. Nevertheless, the complexity of our estimators is polynomial rather than exponential.

暂无翻译

代码 · MoDELS · 模型评估 · 变换 · GPT-4o ·

4 月 2 日

Enhancing LLMs in Long Code Translation through Instrumentation and Program State Alignment

暂无翻译

Li Xin-Ye,Du Ya-Li,Li Ming

from arxiv, 20 pages

Code translation aims to transform code between programming languages while preserving functionality, with applications in cross-platform development and software migration. Recent advances in Large Language Models (LLMs) have improved code translation, but challenges remain, particularly in inferring program functionality. These issues worsen with longer and more complex code, where current LLMs struggle to handle length and intricate semantics. To evaluate LLMs on long code translation, we introduce LongTrans, a large-scale execution-based benchmark with C++, Java, and Python programs, ranging from hundreds to thousands of tokens. Our empirical study of 12 LLMs reveals a sharp performance decline as code length increases, with even the best-performing model, GPT-4o, achieving only 57.51% computational accuracy. This highlights the need for further research in long code translation. We argue that code translation should maintain invariant functionality while transforming syntax and keywords across languages. Despite differences in appearance, program states should remain consistent throughout execution. To address this, we propose PAST (Program State Alignment augmented Translation), which integrates instrumentation to capture and align program states during translation. This approach is the first to leverage LLMs to insert instrumentation in both original and translated code, tracing program states at runtime. By prompting the LLM to correct errors based on output traces, we mitigate inconsistencies and enhance translation accuracy. Experimental results show significant improvements, with computational accuracy rising from 57.51% to 84.70% for GPT-4o, 50.68% to 69.97% for Mistral-Large-2, and 52.45% to 76.43% for DeepSeek-Coder-V2. These improvements are consistent across models and datasets, with ablation studies confirming the benefits of instrumentation and state alignment.

暂无翻译

MoDELS · Branch · INTERACT · 级联 · 输出 ·

4 月 3 日

OmniTalker: Real-Time Text-Driven Talking Head Generation with In-Context Audio-Visual Style Replication

暂无翻译

Zhongjian Wang,Peng Zhang,Jinwei Qi,Guangyuan Wang Sheng Xu,Bang Zhang,Liefeng Bo

from arxiv, Project Page https://humanaigc.github.io/omnitalker

Recent years have witnessed remarkable advances in talking head generation, owing to its potential to revolutionize the human-AI interaction from text interfaces into realistic video chats. However, research on text-driven talking heads remains underexplored, with existing methods predominantly adopting a cascaded pipeline that combines TTS systems with audio-driven talking head models. This conventional pipeline not only introduces system complexity and latency overhead but also fundamentally suffers from asynchronous audiovisual output and stylistic discrepancies between generated speech and visual expressions. To address these limitations, we introduce OmniTalker, an end-to-end unified framework that simultaneously generates synchronized speech and talking head videos from text and reference video in real-time zero-shot scenarios, while preserving both speech style and facial styles. The framework employs a dual-branch diffusion transformer architecture: the audio branch synthesizes mel-spectrograms from text, while the visual branch predicts fine-grained head poses and facial dynamics. To bridge modalities, we introduce a novel audio-visual fusion module that integrates cross-modal information to ensure temporal synchronization and stylistic coherence between audio and visual outputs. Furthermore, our in-context reference learning module effectively captures both speech and facial style characteristics from a single reference video without introducing an extra style extracting module. To the best of our knowledge, OmniTalker presents the first unified framework that jointly models speech style and facial style in a zero-shot setting, achieving real-time inference speed of 25 FPS. Extensive experiments demonstrate that our method surpasses existing approaches in generation quality, particularly excelling in style preservation and audio-video synchronization.

暂无翻译

Analysis · 话题 · Automator · 情景 · INFORMS ·

4 月 3 日

Multi-Modal Framing Analysis of News

暂无翻译

Arnav Arora,Srishti Yadav,Maria Antoniak,Serge Belongie,Isabelle Augenstein

Automated frame analysis of political communication is a popular task in computational social science that is used to study how authors select aspects of a topic to frame its reception. So far, such studies have been narrow, in that they use a fixed set of pre-defined frames and focus only on the text, ignoring the visual contexts in which those texts appear. Especially for framing in the news, this leaves out valuable information about editorial choices, which include not just the written article but also accompanying photographs. To overcome such limitations, we present a method for conducting multi-modal, multi-label framing analysis at scale using large (vision-)language models. Grounding our work in framing theory, we extract latent meaning embedded in images used to convey a certain point and contrast that to the text by comparing the respective frames used. We also identify highly partisan framing of topics with issue-specific frame analysis found in prior qualitative work. We demonstrate a method for doing scalable integrative framing analysis of both text and image in news, providing a more complete picture for understanding media bias.

暂无翻译

登陆后查看更多精品内容

热门VIP内容

开通专知VIP会员享更多权益服务

最新，DeepSeek-R1论文登上Nature封面，附83页补充材料

人工智能与未来战争

自动驾驶中的轨迹预测大型基础模型：全面综述

万字长文《对抗雷达系统的电子战综述》

VIP会员

本周荟萃主题

区块链

区块链（Blockchain）是由节点参与的分布式数据库系统，它的特点是不可更改，不可伪造，也可以将其理解为账簿系统(ledger)。它是比特币的一个重要概念，完整比特币区块链的副本，记录了其代币（token）的每一笔交易。通过这些信息，我们可以找到每一个地址，在历史上任何一点所拥有的价值。

深度学习

机器学习的一个分支，它基于试图使用包含复杂结构或由多重非线性变换构成的多个处理层对数据进行高层抽象的一系列算法。

机器学习

“机器学习是近20多年兴起的一门多领域交叉学科，涉及概率论、统计学、逼近论、凸分析、算法复杂度理论等多门学科。机器学习理论主要是设计和分析一些让可以自动“ 学习”的算法。机器学习算法是一类从数据中自动分析获得规律，并利用规律对未知数据进行预测的算法。因为学习算法中涉及了大量的统计学理论，机器学习与统计推断学联系尤为密切，也被称为统计学习理论。算法设计方面，机器学习理论关注可以实现的，行之有效的学习算法。很多推论问题属于无程序可循难度，所以部分的机器学习研究是开发容易处理的近似算法。”

——中文维基百科

强化学习

强化学习（RL）是机器学习的一个领域，与软件代理应如何在环境中采取行动以最大化累积奖励的概念有关。除了监督学习和非监督学习外，强化学习是三种基本的机器学习范式之一。强化学习与监督学习的不同之处在于，不需要呈现带标签的输入/输出对，也不需要显式纠正次优动作。相反，重点是在探索（未知领域）和利用（当前知识）之间找到平衡。该环境通常以马尔可夫决策过程（MDP）的形式陈述，因为针对这种情况的许多强化学习算法都使用动态编程技术。经典动态规划方法和强化学习算法之间的主要区别在于，后者不假设MDP的确切数学模型，并且针对无法采用精确方法的大型MDP。

推荐系统

推荐系统，是指根据用户的习惯、偏好或兴趣，从不断到来的大规模信息中识别满足用户兴趣的信息的过程。推荐推荐任务中的信息往往称为物品(Item)。根据具体应用背景的不同，这些物品可以是新闻、电影、音乐、广告、商品等各种对象。推荐系统利用电子商务网站向客户提供商品信息和建议，帮助用户决定应该购买什么产品，模拟销售人员帮助客户完成购买过程。个性化推荐是根据用户的兴趣特点和购买行为，向用户推荐用户感兴趣的信息和商品。随着电子商务规模的不断扩大，商品个数和种类快速增长，顾客需要花费大量的时间才能找到自己想买的商品。这种浏览大量无关的信息和产品过程无疑会使淹没在信息过载问题中的消费者不断流失。为了解决这些问题，个性化推荐系统应运而生。个性化推荐系统是建立在海量数据挖掘基础上的一种高级商务智能平台，以帮助电子商务网站为其顾客购物提供完全个性化的决策支持和信息服务。

卷积神经网络

在深度学习中，卷积神经网络（CNN或ConvNet）是一类深度神经网络，最常用于分析视觉图像。基于它们的共享权重架构和平移不变性特征，它们也被称为位移不变或空间不变的人工神经网络（SIANN）。它们在图像和视频识别，推荐系统，图像分类，医学图像分析，自然语言处理，和财务时间序列中都有应用。

计算机网络

计算机网络( Computer Networks )指将地理位置不同的多台计算机及其外部设备，通过通信线路连接起来，在网络操作系统及网络通信协议的管理和协调下，实现资源共享和信息传递的计算机系统。

命名实体识别

命名实体识别（NER）（也称为实体标识，实体组块和实体提取）是信息抽取的子任务，旨在将非结构化文本中提到的命名实体定位和分类为预定义类别，例如人员姓名、地名、机构名、专有名词等。

机器翻译

机器翻译，又称为自动翻译，是利用计算机将一种自然语言(源语言)转换为另一种自然语言(目标语言)的过程。它是计算语言学的一个分支，是人工智能的终极目标之一，具有重要的科学研究价值。

计算机视觉

计算机视觉是一门研究如何使机器“看”的科学，更进一步的说，就是是指用摄影机和电脑代替人眼对目标进行识别、跟踪和测量等机器视觉，并进一步做图形处理，使电脑处理成为更适合人眼观察或传送给仪器检测的图像。作为一个科学学科，计算机视觉研究相关的理论和技术，试图建立能够从图像或者多维数据中获取‘信息’的人工智能系统。

微信扫码咨询专知VIP会员

Top