We propose KnotGym, an interactive environment for complex spatial reasoning and manipulation. KnotGym includes goal-oriented rope manipulation tasks of varying complexity, all requiring acting from pure image observations. Tasks are defined along a clear and quantifiable axis of complexity based on the number of knot crossings, creating a natural generalization test. KnotGym has a simple observation space, allowing for scalable development, yet it highlights core challenges in integrating acute perception, spatial reasoning, and grounded manipulation. We evaluate methods of different classes, including model-based RL, model-predictive control, and chain-of-thought reasoning, and illustrate the challenges KnotGym presents. KnotGym is available at https://github.com/lil-lab/knotgym.