Background: Artificial intelligence (AI) is improving the efficiency and accuracy of cancer diagnostics. The performance of pathology AI systems has been evaluated almost exclusively on European and US cohorts from large centers. For global AI adoption in pathology, validation studies are needed on currently under-represented populations, where the potential gains from AI support may also be greatest. We present the first study with an external validation cohort from the Middle East, focusing on AI-based diagnosis and Gleason grading of prostate cancer. Methods: We collected and digitised 339 prostate biopsy specimens from the Kurdistan region, Iraq, representing a consecutive series of 185 patients spanning the period 2013-2024. We evaluated a task-specific end-to-end AI model and two foundation models in terms of their concordance with pathologists and their consistency across samples digitised on three scanner models (Hamamatsu, Leica, and Grundium). Findings: Grading concordance between AI and pathologists was similar to pathologist-pathologist concordance (Cohen's quadratically weighted kappa 0.801 vs. 0.799; p=0.9824). Cross-scanner concordance was high (quadratically weighted kappa > 0.90) for all AI models and scanner pairs, including a low-cost compact scanner. Interpretation: AI models demonstrated pathologist-level performance in prostate histopathology assessment. Compact scanners can provide a route for validation studies in non-digitised settings and enable cost-effective adoption of AI in laboratories with limited sample volumes. This first openly available digital pathology dataset from the Middle East supports further research into globally equitable AI pathology. Funding: SciLifeLab and Wallenberg Data Driven Life Science Program, Instrumentarium Science Foundation, Karolinska Institutet Research Foundation.
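The concordance figures in the abstract are reported as Cohen's quadratically weighted kappa. As a minimal sketch of how that statistic is commonly computed, the Python function below takes two lists of ordinal grades and penalises each disagreement by the squared distance between the two grades. The function name, the integer encoding of grades as 0 to `num_categories - 1`, and the toy inputs in the usage note are assumptions made for illustration; they are not taken from the study's code or data.

```python
from collections import Counter


def quadratic_weighted_kappa(rater_a, rater_b, num_categories):
    """Cohen's kappa with quadratic weights for ordinal labels 0..num_categories-1.

    Illustrative helper (hypothetical name); not the study's implementation.
    """
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    if num_categories < 2:
        raise ValueError("at least two categories are required")

    n = len(rater_a)
    observed = Counter(zip(rater_a, rater_b))  # joint counts of (grade_a, grade_b)
    marginal_a = Counter(rater_a)              # grade counts for rater A
    marginal_b = Counter(rater_b)              # grade counts for rater B
    max_distance = (num_categories - 1) ** 2   # normalises weights to [0, 1]

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_categories):
        for j in range(num_categories):
            weight = (i - j) ** 2 / max_distance
            observed_disagreement += weight * observed[(i, j)] / n
            # Expected joint proportion if the two raters were independent.
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0:
        # Both raters used a single identical grade throughout.
        raise ValueError("kappa is undefined when expected disagreement is zero")
    return 1.0 - observed_disagreement / expected_disagreement
```

With toy inputs, `quadratic_weighted_kappa([0, 1, 2, 3], [0, 1, 2, 3], 4)` gives 1.0 for perfect agreement, and `quadratic_weighted_kappa([0, 0, 1, 1], [0, 1, 0, 1], 2)` gives 0.0 for chance-level agreement. The quadratic weighting means that a disagreement of two grades costs four times as much as a disagreement of one grade, which suits ordinal scales such as grading.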