Vision-Language-Action models (VLAs) are becoming increasingly capable across diverse robotic tasks. However, their real-world deployment remains slow and inefficient: demonstration videos are often sped up 5-10x to appear smooth, and the robots exhibit noticeable action stalls and delayed reactions to environmental changes. Asynchronous inference is a promising route to continuous, low-latency control because it lets robots execute actions and run inference simultaneously. However, because the robot and environment continue to evolve during inference, a temporal misalignment arises between the prediction and execution intervals. This misalignment causes significant action instability, and existing mitigations either degrade accuracy or add runtime overhead. We propose VLASH, a general asynchronous inference framework for VLAs that delivers smooth, accurate, and fast-reaction control without additional overhead or architectural changes. VLASH estimates the future execution-time state by rolling the robot state forward with the previously generated action chunk, thereby bridging the gap between prediction and execution. Experiments show that VLASH achieves up to 2.03x speedup and reduces reaction latency by up to 17.4x compared to synchronous inference while fully preserving the original accuracy. Moreover, it enables VLAs to handle fast-reaction, high-precision tasks such as ping-pong and whack-a-mole, where traditional synchronous inference fails. Code is available at https://github.com/mit-han-lab/vlash
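The core idea above, rolling the robot state forward with the previously generated action chunk, can be illustrated with a small helper. This is a minimal sketch under stated assumptions, not the VLASH implementation. The function name `rollforward_state`, the parameter `latency_steps` (the number of control steps that elapse while inference runs), and the two action conventions (absolute joint-position targets that the low-level controller tracks, or per-step deltas) are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def rollforward_state(
    current_state: Sequence[float],
    action_chunk: Sequence[Sequence[float]],
    next_index: int,
    latency_steps: int,
    delta_actions: bool = False,
) -> Vector:
    """Estimate the robot state at the moment a newly predicted chunk starts executing.

    Hypothetical helper (an assumption, not the VLASH API): while inference runs,
    the robot keeps executing `action_chunk` starting at `next_index` for
    `latency_steps` control steps. The estimated future state, rather than the
    current one, would then be given to the policy as its state input.
    """
    if latency_steps < 0:
        raise ValueError("latency_steps must be non-negative")
    # Actions that will be executed during inference, clipped at the chunk end.
    end = min(next_index + latency_steps, len(action_chunk))
    pending = action_chunk[next_index:end]
    if not pending:
        # Nothing left to execute: the state is assumed to stay where it is.
        return list(current_state)
    if delta_actions:
        # Assumed delta convention: each action is an offset added to the state.
        state = list(current_state)
        for action in pending:
            state = [s + a for s, a in zip(state, action)]
        return state
    # Assumed absolute convention: the controller tracks each target, so the
    # state after the pending actions is approximately the last target.
    return list(pending[-1])


if __name__ == "__main__":
    chunk = [[0.5, 0.25], [1.0, 0.5], [1.5, 0.75]]
    # Two control steps of latency, next action to execute is index 1.
    print(rollforward_state([0.0, 0.0], chunk, next_index=1, latency_steps=2))
```

With absolute targets the estimate reduces to the last pending action, which avoids any dynamics model; with deltas it is a running sum. A real system would also need to handle gripper dimensions, controller tracking error, and variable inference latency, which this sketch ignores.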