Accurate fisheries data are crucial for effective and sustainable marine resource management. With the recent adoption of Electronic Monitoring (EM) systems, more video data is now being collected than can feasibly be reviewed manually. This paper addresses this challenge by developing an optimized deep learning pipeline for automated fish re-identification (Re-ID) using the novel AutoFish dataset, which simulates conveyor-belt EM systems containing six visually similar fish species. We demonstrate that key Re-ID metrics (Rank-1 accuracy and mAP@k) are substantially improved by using hard triplet mining in conjunction with a custom image transformation pipeline that includes dataset-specific normalization. By employing these strategies, we show that the Vision Transformer-based Swin-T architecture consistently outperforms the Convolutional Neural Network-based ResNet-50, achieving peak performance of 41.65% mAP@k and 90.43% Rank-1 accuracy. An in-depth analysis reveals that the primary challenge is distinguishing visually similar individuals of the same species (intra-species errors), where viewpoint inconsistency proves significantly more detrimental than partial occlusion. The source code and documentation are available at: https://github.com/msamdk/Fish_Re_Identification.git
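To make the training recipe concrete, the sketch below pairs dataset-specific normalization with batch-hard triplet mining in PyTorch. The normalization statistics (AUTOFISH_MEAN, AUTOFISH_STD), input resolution, and margin are illustrative assumptions, not values taken from the paper; the mining step follows the standard batch-hard formulation, in which each anchor is matched with its farthest in-batch positive and closest in-batch negative.

```python
import torch
import torch.nn.functional as F
from torchvision import transforms

# Hypothetical per-channel statistics for the AutoFish training split;
# placeholder values, not the ones used in the paper.
AUTOFISH_MEAN = [0.45, 0.47, 0.44]
AUTOFISH_STD = [0.21, 0.20, 0.22]

# Custom transform pipeline with dataset-specific normalization.
train_transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize(mean=AUTOFISH_MEAN, std=AUTOFISH_STD),
])

def batch_hard_triplet_loss(embeddings, labels, margin=0.3):
    """Batch-hard triplet mining: for each anchor, select the hardest
    positive (farthest same-identity sample) and the hardest negative
    (closest different-identity sample) within the mini-batch."""
    dist = torch.cdist(embeddings, embeddings, p=2)    # pairwise L2 distances
    same = labels.unsqueeze(0) == labels.unsqueeze(1)  # identity-match mask
    eye = torch.eye(len(labels), dtype=torch.bool, device=labels.device)

    pos_mask = same & ~eye  # same identity, excluding the anchor itself
    neg_mask = ~same        # different identity

    hardest_pos = (dist * pos_mask).max(dim=1).values  # farthest positive
    hardest_neg = dist.masked_fill(~neg_mask, float("inf")).min(dim=1).values

    return F.relu(hardest_pos - hardest_neg + margin).mean()
```

Note that this loss presumes a PK batch sampler (P identities with K images each), so that every anchor has at least one in-batch positive and one in-batch negative.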