A Data Annotation Requirements Representation and Specification (DARS)

With the rise of AI-enabled cyber-physical systems, data annotation has become a critical yet often overlooked process in the development of these intelligent information systems. Existing work in requirements engineering (RE) has explored how requirements for AI systems and their data can be represented. However, interviews with industry professionals show that data annotations and their requirements introduce distinct challenges, indicating a need for annotation-specific requirement representations. We propose the Data Annotation Requirements Representation and Specification (DARS), which comprises an Annotation Negotiation Card to align stakeholders on objectives and constraints, and a Scenario-Based Annotation Specification to express atomic and verifiable data annotation requirements. We evaluate DARS in two ways: on an automotive perception case related to an ongoing project, and by mapping it against 18 real-world data annotation error types. The results suggest that DARS mitigates root causes of annotation errors in completeness, accuracy, and consistency. By integrating DARS into RE, this work improves the reliability of safety-critical systems that use data annotations and demonstrates how engineering frameworks must evolve for the data-dependent components of today's intelligent information systems.
