Value Function is All You Need: A Unified Learning Framework for Ride Hailing Platforms

Large ride-hailing platforms such as DiDi, Uber and Lyft connect tens of thousands of vehicles in a city to millions of ride requests throughout the day, and hold great promise for improving transportation efficiency through order dispatching and vehicle repositioning. Existing studies, however, usually consider the two tasks in simplified settings that hardly address the complex interactions between them, the real-time fluctuations of supply and demand, and the coordination required by the large scale of the problem. In this paper we propose a unified value-based dynamic learning framework (V1D3) for tackling both tasks. At the center of the framework is a globally shared value function that is updated continuously using online experiences generated from real-time platform transactions. To improve sample efficiency and robustness, we further propose a novel periodic ensemble method that combines fast online learning with a large-scale offline training scheme leveraging the abundant historical driver trajectory data. This allows the proposed framework to adapt quickly to the highly dynamic environment, to generalize robustly to recurrent patterns, and to drive implicit coordination among the population of managed vehicles. Extensive experiments based on real-world datasets show considerable improvements over other recently proposed methods on both tasks. In particular, V1D3 outperforms the first-prize winners of both the dispatching and repositioning tracks in the KDD Cup 2020 RL competition, achieving state-of-the-art results on both total driver income and user-experience-related metrics.
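
To make the idea of a shared value function with a periodic ensemble more concrete, the following is a minimal sketch and not the authors' implementation. It assumes a tabular value function over hypothetical discrete states (for example, a zone and time bucket), a plain TD(0) update for the online part, and a fixed-weight average with a precomputed offline table as the "ensemble" step. The class name, parameters (`alpha`, `gamma`, `ensemble_every`, `offline_weight`) and the `advantage` score are all illustrative assumptions; the abstract does not specify any of them.

```python
from collections import defaultdict


class SharedValueFunction:
    """Illustrative tabular value function shared by all vehicles.

    Assumption: states are hashable labels; the real framework is not
    described at this level of detail in the abstract.
    """

    def __init__(self, offline_values, alpha=0.1, gamma=0.9,
                 ensemble_every=100, offline_weight=0.5):
        # Stand-in for a model trained offline on historical trajectories.
        self.offline = dict(offline_values)
        # The online estimate starts from the offline one.
        self.online = defaultdict(float, self.offline)
        self.alpha = alpha                    # online learning rate
        self.gamma = gamma                    # per-step discount factor
        self.ensemble_every = ensemble_every  # updates between ensemble steps
        self.offline_weight = offline_weight  # weight of the offline estimate
        self.updates = 0

    def value(self, state):
        return self.online[state]

    def td_update(self, state, reward, next_state, steps=1):
        """One TD(0) update from a single observed transition."""
        target = reward + self.gamma ** steps * self.online[next_state]
        self.online[state] += self.alpha * (target - self.online[state])
        self.updates += 1
        if self.updates % self.ensemble_every == 0:
            self._ensemble()

    def _ensemble(self):
        """Periodically pull online values toward the offline estimate."""
        w = self.offline_weight
        for state in list(self.online):
            if state in self.offline:
                self.online[state] = ((1 - w) * self.online[state]
                                      + w * self.offline[state])

    def advantage(self, state, reward, next_state, steps=1):
        """Hypothetical score usable for ranking dispatch or reposition options."""
        return (reward + self.gamma ** steps * self.online[next_state]
                - self.online[state])
```

In this sketch both tasks would read from the same table: a dispatcher could rank driver-order pairs by `advantage`, and a repositioning policy could compare `value` across candidate destinations. That is one simple way to read the "globally shared" wording, under the assumptions stated above.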
