TAP-Net: Transport-and-Pack using Reinforcement Learning

We introduce the transport-and-pack (TAP) problem, a frequently encountered instance of real-world packing, and develop a neural optimization solution based on reinforcement learning. Given an initial spatial configuration of boxes, we seek an efficient method to iteratively transport and pack the boxes compactly into a target container. Because of obstruction and accessibility constraints, the problem adds a new search dimension, namely finding an optimal transport sequence, to the already immense search space of packing alone. With a learning-based approach, a trained network can learn and encode solution patterns that guide the solution of new problem instances, instead of executing an expensive online search. In our work, we represent the transport constraints using a precedence graph and train a neural network, coined TAP-Net, using reinforcement learning to reward efficient and stable packing. The network is built on an encoder-decoder architecture: the encoder employs convolution layers to encode the box geometry and the precedence graph, and the decoder is a recurrent neural network (RNN) that takes as input the current encoder output together with the current packing state of the target container, and outputs the next box to pack along with its orientation. We train our network on randomly generated initial box configurations, without supervision, via policy gradients to learn TAP policies that maximize packing efficiency and stability. We demonstrate the performance of TAP-Net on a variety of examples, evaluating the network through ablation studies and comparisons to baselines and alternative network designs. We also show that our network, when trained on small-sized inputs, generalizes well to larger problem instances.
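
To make the idea of a precedence graph concrete, the following is a minimal sketch rather than the authors' construction. It assumes a simplified 2D side view in which a box can only be lifted out from the top, so every box above it in the same column must be transported first. The `Box` fields, the function names, and the alphabetical tie-break are all hypothetical; in TAP-Net the choice among accessible boxes is made by the learned policy, not by a fixed rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    name: str
    x: int  # left edge
    z: int  # bottom edge (height above the floor)
    w: int  # width
    h: int  # height


def overlaps_horizontally(a: Box, b: Box) -> bool:
    """True if the horizontal extents of the two boxes intersect."""
    return a.x < b.x + b.w and b.x < a.x + a.w


def build_precedence(boxes):
    """Map each box name to the set of boxes that must be removed before it.

    Assumption: top-only access, so a box is blocked by every box that
    lies above it and overlaps it horizontally.
    """
    blockers = {b.name: set() for b in boxes}
    for below in boxes:
        for above in boxes:
            if above is below:
                continue
            if above.z >= below.z + below.h and overlaps_horizontally(above, below):
                blockers[below.name].add(above.name)
    return blockers


def accessible(blockers, removed):
    """Names of boxes not yet removed whose blockers are all removed."""
    return sorted(n for n, bs in blockers.items()
                  if n not in removed and bs <= removed)


def transport_order(blockers):
    """One valid transport sequence; the alphabetical pick is a placeholder
    for the decision a learned policy would make."""
    removed, order = set(), []
    while len(order) < len(blockers):
        ready = accessible(blockers, removed)
        if not ready:
            raise ValueError("precedence constraints cannot be satisfied")
        order.append(ready[0])
        removed.add(ready[0])
    return order


# Hypothetical initial configuration: B and D rest on A, C stands alone.
boxes = [
    Box("A", x=0, z=0, w=2, h=1),
    Box("B", x=0, z=1, w=1, h=1),
    Box("C", x=2, z=0, w=1, h=1),
    Box("D", x=1, z=1, w=1, h=2),
]
blockers = build_precedence(boxes)
print(accessible(blockers, set()))  # boxes that can be transported first
print(transport_order(blockers))    # one sequence that respects the graph
```

In this configuration A cannot be transported until both B and D are gone, which is exactly the kind of constraint that couples the transport sequence to the packing decisions.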

