SparseWorld: A Flexible, Adaptive, and Efficient 4D Occupancy World Model Powered by Sparse and Dynamic Queries

Semantic occupancy has emerged as a powerful representation in world models for its ability to capture rich spatial semantics. However, most existing occupancy world models rely on static and fixed embeddings or grids, which inherently limit the flexibility of perception. Moreover, their "in-place classification" over grids is potentially misaligned with the dynamic and continuous nature of real scenarios. In this paper, we propose SparseWorld, a novel 4D occupancy world model that is flexible, adaptive, and efficient, powered by sparse and dynamic queries. We introduce a Range-Adaptive Perception module, in which learnable queries are modulated by the ego vehicle states and enriched with temporal-spatial associations to enable extended-range perception. To effectively capture the dynamics of the scene, we design a State-Conditioned Forecasting module, which replaces classification-based forecasting with a regression-guided formulation, precisely aligning the dynamic queries with the continuity of the 4D environment. In addition, we devise a Temporal-Aware Self-Scheduling training strategy to enable smooth and efficient training. Extensive experiments demonstrate that SparseWorld achieves state-of-the-art performance across perception, forecasting, and planning tasks. Comprehensive visualizations and ablation studies further validate the advantages of SparseWorld in terms of flexibility, adaptability, and efficiency. The code is available at https://github.com/MSunDYY/SparseWorld.
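The contrast between "in-place classification" on a fixed grid and a regression-style update of sparse queries can be illustrated with a toy sketch. This is not the SparseWorld implementation: the function names, grid size, labels, and offsets below are all hypothetical, and the "predicted" offsets are hard-coded rather than produced by a network. The sketch only shows that moving labelled points by continuous offsets and then voxelizing them yields a future occupancy grid without re-classifying every cell.

```python
import numpy as np

def voxelize(points, labels, grid_shape, voxel_size, origin):
    """Scatter labelled 3D points into a dense semantic grid (0 = free)."""
    grid = np.zeros(grid_shape, dtype=np.int64)
    idx = np.floor((points - origin) / voxel_size).astype(np.int64)
    # Keep only points that fall inside the grid bounds.
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=1)
    idx, kept = idx[inside], labels[inside]
    grid[idx[:, 0], idx[:, 1], idx[:, 2]] = kept
    return grid

def forecast_by_regression(points, offsets):
    """Regression-style step: move each sparse point by a continuous offset."""
    return points + offsets

# Toy scene (all values are made up for illustration).
origin = np.array([0.0, 0.0, 0.0])
voxel_size = 0.5
grid_shape = (8, 8, 2)
points = np.array([[1.1, 1.1, 0.2],    # a moving object, label 1
                   [3.3, 2.2, 0.2]])   # a static object, label 2
labels = np.array([1, 2])
# Hypothetical per-point motion; a real model would predict these.
offsets = np.array([[0.7, 0.0, 0.0],
                    [0.0, 0.0, 0.0]])

now = voxelize(points, labels, grid_shape, voxel_size, origin)
future_points = forecast_by_regression(points, offsets)
future = voxelize(future_points, labels, grid_shape, voxel_size, origin)
```

In this toy setup the moving point shifts by a sub-voxel-accurate amount in continuous space, and the grid is only a final rendering of the sparse set. A grid-based classifier would instead have to predict a class for every cell at every future step.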

