Finding a person across a camera network plays an important role in video surveillance. In a real-world person re-identification application, guaranteeing an optimal response time requires balancing accuracy against speed. We analyse this trade-off by comparing two approaches. The first is a classical method that combines hand-crafted feature description with metric learning, namely LOMO and XQDA. The second covers deep learning techniques built on the image classification networks ResNet and MobileNets. Additionally, we propose and analyse network distillation as a learning strategy to reduce the computational cost of the deep learning approach at test time. We evaluate both approaches on the large-scale Market-1501 and DukeMTMC-reID datasets and show that distillation reduces the computational cost at inference time while even improving accuracy.
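The abstract does not state which distillation objective is used. The sketch below therefore assumes the standard soft-target formulation of Hinton et al. A small student network is trained to match the temperature-softened class distribution of a larger teacher, and it also fits the ground-truth identity label. The teacher and student would plausibly be a large and a small network such as ResNet and MobileNets, but that pairing is also an assumption. The function names, the temperature `T`, and the weight `alpha` are hypothetical illustration choices, not the authors' settings.

```python
import math
from typing import List


def log_softmax(logits: List[float], temperature: float = 1.0) -> List[float]:
    """Temperature-scaled log-softmax, shifted by the max logit for numerical stability."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_norm = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_norm for z in scaled]


def distillation_loss(
    student_logits: List[float],
    teacher_logits: List[float],
    label: int,
    temperature: float = 4.0,  # hypothetical softening temperature T
    alpha: float = 0.5,        # hypothetical weight of the soft (teacher) term
) -> float:
    """Assumed Hinton-style KD loss for a single sample:
    alpha * T^2 * KL(p_teacher^T || p_student^T) + (1 - alpha) * CE(label, p_student).
    """
    log_p_t = log_softmax(teacher_logits, temperature)
    log_p_s = log_softmax(student_logits, temperature)
    # KL divergence between the softened teacher and student distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_p_t, log_p_s))
    # Ordinary cross-entropy on the hard identity label (temperature 1).
    hard_ce = -log_softmax(student_logits)[label]
    # T^2 keeps the soft-term gradient scale comparable across temperatures.
    return alpha * temperature ** 2 * kl + (1 - alpha) * hard_ce


if __name__ == "__main__":
    # Hypothetical logits over 3 identities for one training image.
    teacher = [4.0, 1.0, 0.5]   # e.g. from a large network
    student = [2.0, 1.5, 0.2]   # e.g. from a lightweight network
    print(distillation_loss(student, teacher, label=0))
```

The `T^2` factor follows the original soft-target formulation. It keeps the teacher term from vanishing as the temperature grows. At test time only the cheap student is run, and that is where the reduction in inference cost comes from.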