Text-based delivery addresses are the data foundation of logistics systems and contain abundant, crucial location information. Encoding delivery addresses effectively is therefore a core task for improving the performance of downstream tasks in a logistics system. Pre-trained Models (PTMs) designed for Natural Language Processing (NLP) have emerged as the dominant tools for encoding semantic information in text. Though promising, these NLP-based PTMs fall short in encoding the geographic knowledge contained in delivery addresses, which considerably degrades the performance of delivery-related tasks in logistics systems such as Cainiao. To tackle this problem, we propose a domain-specific pre-trained model named G2PTL, a Geography-Graph Pre-trained model for delivery addresses in the Logistics field. G2PTL combines the semantic learning capabilities of text pre-training with the geographical-relationship encoding abilities of graph modeling. Specifically, we first use real-world logistics delivery data to construct a large-scale heterogeneous graph of delivery addresses, which contains abundant geographic knowledge and delivery information. G2PTL is then pre-trained on subgraphs sampled from this heterogeneous graph. Comprehensive experiments on real-world datasets demonstrate the effectiveness of G2PTL on four downstream tasks in logistics systems. G2PTL has been deployed in production in Cainiao's logistics system, where it significantly improves the performance of delivery-related tasks.
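The two-step pipeline described in the abstract (build a heterogeneous address graph, then sample subgraphs for pre-training) can be illustrated with a minimal sketch. This is not the G2PTL implementation: the abstract does not specify the node types, edge types, or sampling strategy, so the edge types `route` and `same_area`, the function names `build_address_graph` and `sample_subgraph`, and the parameters `num_hops` and `max_neighbors` are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict
from itertools import combinations


def build_address_graph(routes, area_of):
    """Build a toy heterogeneous graph over delivery addresses.

    routes:  list of address sequences, each in delivery order (assumed input).
    area_of: dict mapping address -> area identifier (assumed input).
    Returns a dict: address -> set of (edge_type, neighbor) pairs.
    """
    graph = defaultdict(set)

    def connect(a, b, edge_type):
        # Undirected typed edge; self-loops are skipped.
        if a != b:
            graph[a].add((edge_type, b))
            graph[b].add((edge_type, a))

    # Hypothetical edge type 1: consecutive stops on the same delivery route.
    for stops in routes:
        for a, b in zip(stops, stops[1:]):
            connect(a, b, "route")

    # Hypothetical edge type 2: addresses that share an area identifier.
    by_area = defaultdict(list)
    for address, area in area_of.items():
        by_area[area].append(address)
    for members in by_area.values():
        for a, b in combinations(members, 2):
            connect(a, b, "same_area")

    return dict(graph)


def sample_subgraph(graph, start, num_hops=2, max_neighbors=2, seed=0):
    """Sample a subgraph around `start` by bounded multi-hop neighbor sampling."""
    rng = random.Random(seed)
    nodes = {start}
    edges = set()
    frontier = [start]
    for _ in range(num_hops):
        next_frontier = []
        for node in frontier:
            # Sort so that sampling is reproducible for a given seed.
            candidates = sorted(graph.get(node, ()))
            chosen = rng.sample(candidates, min(max_neighbors, len(candidates)))
            for edge_type, neighbor in chosen:
                edges.add((node, edge_type, neighbor))
                if neighbor not in nodes:
                    nodes.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier
    return nodes, edges
```

In a setup like this, each sampled subgraph would supply one training instance: the address texts of its nodes go to the text encoder, while the typed edges carry the geographic relationships a graph module could learn from. How G2PTL actually combines the two is not stated in the abstract.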