Food image recognition is one of the promising applications of visual object recognition in computer vision. In this study, a small-scale dataset of 5822 images in ten categories was collected, and a five-layer CNN was constructed to recognize these images. The bag-of-features (BoF) model coupled with a support vector machine (SVM) was evaluated first for image classification and yielded an overall accuracy of 56%, whereas the CNN model performed considerably better, with an overall accuracy of 74%. Data augmentation techniques based on geometric transformations were then applied to increase the number of training images, which raised the accuracy to more than 90% and prevented the overfitting observed when the CNN was trained on the raw training data alone. Further improvements can be expected from collecting more images and optimizing the network architecture and hyper-parameters.
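The geometric data augmentation mentioned above can be illustrated with a minimal sketch. The abstract does not state which transformations were used, so the flips and 90-degree rotations below are assumptions chosen for illustration, as are the function names and the representation of an image as nested lists of pixel values. Plain Python is used only to keep the example dependency-free; in practice an array or deep learning library would likely be used.

```python
from typing import List

# Assumption: a grayscale image is a list of rows of integer pixel values.
Image = List[List[int]]


def flip_horizontal(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def flip_vertical(img: Image) -> Image:
    """Mirror the image top to bottom (rows are copied, not shared)."""
    return [row[:] for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate the image by 90 degrees clockwise."""
    # Reversing the row order and transposing gives a clockwise rotation.
    return [list(col) for col in zip(*img[::-1])]


def augment(img: Image) -> List[Image]:
    """Return the original image plus five geometric variants.

    Hypothetical choice of variants: two flips and rotations by
    90, 180, and 270 degrees. The label of each variant is the same
    as the label of the original image.
    """
    variants = [img, flip_horizontal(img), flip_vertical(img)]
    rotated = img
    for _ in range(3):
        rotated = rotate_90_clockwise(rotated)
        variants.append(rotated)
    return variants


if __name__ == "__main__":
    sample = [[1, 2], [3, 4]]
    for variant in augment(sample):
        print(variant)
```

Under these assumptions each training image yields six samples, so the training set grows sixfold without collecting new images. Only the training split would be augmented this way; the test images stay untouched so that the reported accuracy remains comparable.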