Automatic License Plate Recognition (ALPR) faces a major challenge when dealing with illegible license plates (LPs). While reconstruction methods such as super-resolution (SR) have emerged, the core issue of recognizing these low-quality LPs remains unresolved. To balance recognition performance and computational cost, image pre-processing should be applied selectively, only to LPs whose legibility actually needs enhancement. To support research in this area, we introduce a novel dataset comprising 10,210 vehicle images with 12,687 LPs annotated for legibility classification (the LPLC dataset). The images span a wide range of vehicle types, lighting conditions, and camera/image quality levels. We adopt a fine-grained annotation strategy that includes vehicle- and LP-level occlusions, four legibility categories (perfect, good, poor, and illegible), and character-level labels for the first three categories (i.e., all LPs except illegible ones). As a benchmark, we propose a three-way classification task, evaluated with three image recognition networks, that determines whether an LP image is legible enough for direct recognition, requires super-resolution, or is unrecoverable. The overall F1 score, which remained below 80% for all three baseline models (ViT, ResNet, and YOLO), together with our analyses of SR and LP recognition methods, highlights the difficulty of the task and reinforces the need for further research. The proposed dataset is publicly available at https://github.com/lmlwojcik/lplc-dataset.
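To illustrate how such a legibility gate could be used in an ALPR pipeline, the following is a minimal sketch, not the authors' released code: a three-class classifier routes each LP crop to direct recognition, super-resolution, or rejection. The ResNet-18 backbone, the class names, and the helper functions `super_resolve` / `recognize_plate` are illustrative assumptions.

```python
# Minimal sketch of a three-way legibility gate (assumed design, not the
# authors' implementation): decide per LP crop whether it is good enough,
# needs super-resolution first, or is unrecoverable and should be skipped.
import torch
import torch.nn as nn
from torchvision import models, transforms
from PIL import Image

CLASSES = ["good_enough", "needs_sr", "unrecoverable"]  # assumed label order


def build_legibility_classifier(num_classes: int = 3) -> nn.Module:
    """ResNet-18 backbone with a 3-way head, as one possible baseline."""
    model = models.resnet18(weights=None)
    model.fc = nn.Linear(model.fc.in_features, num_classes)
    return model


preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # assumed input size
    transforms.ToTensor(),
])


@torch.no_grad()
def route_lp_crop(model: nn.Module, lp_crop: Image.Image) -> str:
    """Return the predicted routing decision for a single LP crop."""
    model.eval()
    logits = model(preprocess(lp_crop).unsqueeze(0))
    return CLASSES[int(logits.argmax(dim=1))]


# Hypothetical downstream usage:
#   decision = route_lp_crop(model, lp_crop)
#   if decision == "needs_sr":
#       lp_crop = super_resolve(lp_crop)   # apply SR only when required
#   if decision != "unrecoverable":
#       text = recognize_plate(lp_crop)    # run LP recognition / OCR
```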