Human Mobility Datasets Enriched With Contextual and Social Dimensions

In this resource paper, we present two publicly available datasets of semantically enriched human trajectories, together with the pipeline used to build them. The trajectories are public GPS traces retrieved from OpenStreetMap. Each dataset includes contextual layers such as stops, moves, points of interest (POIs), inferred transportation modes, and weather data. A novel semantic feature is the inclusion of synthetic but realistic social media posts generated by Large Language Models (LLMs), which enables multimodal and semantic mobility analysis. The datasets are available in both tabular and Resource Description Framework (RDF) formats, supporting semantic reasoning and FAIR data practices. They cover two large, structurally distinct cities: Paris and New York. Our open-source, reproducible pipeline allows the datasets to be customized, and the datasets support research tasks such as behavior modeling, mobility prediction, knowledge graph construction, and LLM-based applications. To our knowledge, our resource is the first to combine real-world movement, structured semantic enrichment, LLM-generated text, and semantic web compatibility in a reusable framework.

