GasHisSDB: A New Gastric Histopathology Image Dataset for Computer Aided Diagnosis of Gastric Cancer

Background and Objective: Gastric cancer is the fifth most common cancer globally, and early detection is essential to save lives. Histopathological examination is the gold standard for diagnosing gastric cancer. However, computer-aided diagnostic techniques are challenging to evaluate due to the scarcity of publicly available gastric histopathology image datasets. Methods: In this paper, a novel publicly available Gastric Histopathology Sub-size Image Database (GasHisSDB) is published to evaluate classifier performance. Specifically, two types of data are included: normal and abnormal, with a total of 245,196 tissue case images. To show that image classification methods from different periods perform differently on GasHisSDB, we select a variety of classifiers for evaluation. Seven classical machine learning classifiers, three Convolutional Neural Network classifiers, and a novel transformer-based classifier are tested on image classification tasks. Results: Extensive experiments with traditional machine learning and deep learning methods confirm that methods from different periods perform differently on GasHisSDB. Traditional machine learning achieved a best accuracy of 86.08% and a lowest accuracy of 41.12%. Deep learning achieved a best accuracy of 96.47% and a lowest accuracy of 86.21%. Accuracy rates vary significantly across classifiers. Conclusions: To the best of our knowledge, this is the first publicly available gastric cancer histopathology dataset containing a large number of images for weakly supervised learning. We believe that GasHisSDB can attract researchers to explore new algorithms for the automated diagnosis of gastric cancer, which can help physicians and patients in the clinical setting.
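The accuracy figures reported above are, in the usual sense, the fraction of images whose predicted label (normal or abnormal) matches the ground truth. The following is a minimal sketch of that comparison across classifiers. The label values, the classifier names `classifier_a` and `classifier_b`, and the helper `accuracy` are illustrative assumptions, not the authors' evaluation code or data.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("need at least one label")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Toy ground truth: 0 = normal, 1 = abnormal (made-up values, not from GasHisSDB).
y_true = [0, 1, 1, 0, 1, 0, 0, 1]

# Hypothetical predictions from two placeholder classifiers.
predictions = {
    "classifier_a": [0, 1, 0, 0, 1, 1, 0, 1],
    "classifier_b": [0, 1, 1, 0, 1, 0, 0, 0],
}

# Score every classifier against the same ground truth so results are comparable.
scores = {name: accuracy(y_true, y_pred) for name, y_pred in predictions.items()}
for name, score in scores.items():
    print(f"{name}: {score:.2%}")
```

Scoring all classifiers against one shared set of labels is what makes a spread such as 41.12% to 96.47% directly comparable.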
