Semi-Automated Data Annotation in Multisensor Datasets for Autonomous Vehicle Testing

This report presents the design and implementation of a semi-automated data annotation pipeline developed within the DARTS project, whose goal is to create a large-scale, multimodal dataset of driving scenarios recorded in Polish conditions. Manual annotation of such heterogeneous data is both costly and time-consuming. To address this challenge, the proposed solution adopts a human-in-the-loop approach that combines artificial intelligence with human expertise to reduce annotation cost and duration. The system automatically generates initial annotations, supports iterative model retraining, and incorporates data anonymization and domain adaptation techniques. At its core, the tool relies on 3D object detection algorithms to produce the preliminary annotations, which human annotators then review. Overall, the developed tools and methodology yield substantial time savings while ensuring consistent, high-quality annotations across sensor modalities. The solution directly supports the DARTS project by accelerating the preparation of large annotated datasets in the project's standardized format, strengthening the technological base for autonomous vehicle research in Poland.
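The human-in-the-loop cycle described above (automatic pre-annotation, human review, iterative retraining) can be illustrated with a minimal Python sketch. This is not the DARTS implementation: the `Box3D` fields, the `detector`, `reviewer`, and `retrain` callables, and the batch structure are all hypothetical names chosen for illustration, and a real pipeline would plug in an actual 3D object detector and an annotation interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple


@dataclass
class Box3D:
    """Hypothetical 3D box annotation; field names are assumptions."""
    label: str
    center: Tuple[float, float, float]  # x, y, z in metres
    size: Tuple[float, float, float]    # length, width, height in metres
    yaw: float                          # heading angle in radians
    score: float                        # detector confidence in [0, 1]


# Assumed interfaces: a detector maps a frame id to proposed boxes,
# a reviewer (the human step) returns the corrected boxes for that frame,
# and retrain returns an updated detector given all annotations so far.
Detector = Callable[[str], List[Box3D]]
Reviewer = Callable[[str, List[Box3D]], List[Box3D]]
Retrain = Callable[[Detector, Dict[str, List[Box3D]]], Detector]


def run_annotation_round(
    frame_ids: Iterable[str], detector: Detector, reviewer: Reviewer
) -> Dict[str, List[Box3D]]:
    """Pre-annotate each frame automatically, then pass it to human review."""
    annotated: Dict[str, List[Box3D]] = {}
    for frame_id in frame_ids:
        proposals = detector(frame_id)                    # initial annotations
        annotated[frame_id] = reviewer(frame_id, proposals)  # human correction
    return annotated


def annotate_in_batches(
    batches: Iterable[Iterable[str]],
    detector: Detector,
    reviewer: Reviewer,
    retrain: Retrain,
) -> Tuple[Dict[str, List[Box3D]], Detector]:
    """Annotate batch by batch, retraining the detector after each batch."""
    dataset: Dict[str, List[Box3D]] = {}
    for batch in batches:
        dataset.update(run_annotation_round(batch, detector, reviewer))
        # Reviewed annotations feed the next model iteration.
        detector = retrain(detector, dataset)
    return dataset, detector
```

In this sketch the annotator only corrects proposals instead of drawing every box from scratch, and each retraining step is assumed to improve the proposals for the next batch.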
