The expanding adoption of digital pathology has enabled the curation of large repositories of histology whole slide images (WSIs), which contain a wealth of information. Similar pathology image search offers the opportunity to comb through large historical repositories of gigapixel WSIs to identify cases with similar morphological features. It can be particularly useful for diagnosing rare diseases and for identifying similar cases to predict prognosis, treatment outcomes, and potential clinical trial success. A critical challenge in developing a WSI search and retrieval system is scalability, which is uniquely difficult given the need to search a growing number of slides, each of which can consist of billions of pixels and be several gigabytes in size. Such systems are typically slow, and their retrieval time often scales with the size of the repository being searched. This makes clinical adoption tedious and renders them infeasible for constantly growing repositories. Here we present Fast Image Search for Histopathology (FISH), a histology image search pipeline that is infinitely scalable and achieves constant search speed independent of the image database size, while being interpretable and not requiring detailed annotations. FISH uses self-supervised deep learning to encode meaningful representations from WSIs and a van Emde Boas tree for fast search, followed by an uncertainty-based ranking algorithm to retrieve similar WSIs. We evaluated FISH on multiple tasks and datasets with over 22,000 patient cases spanning 56 disease subtypes. We additionally demonstrate that FISH can assist with the diagnosis of rare cancer types for which sufficient cases may not be available to train traditional supervised deep models. FISH is available as an easy-to-use, open-source software package (https://github.com/mahmoodlab/FISH).
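The abstract attributes FISH's size-independent search speed to a van Emde Boas (vEB) tree. What follows is a minimal textbook vEB tree in Python, not code from the FISH package. It assumes that each slide has already been reduced to an integer key in a fixed universe [0, u). The key encoding, the class name `VEBTree`, and the 16-bit universe in the usage lines are all illustrative assumptions. Each operation costs O(log log u). That cost depends on the size of the key universe rather than on how many slides have been indexed, and this property is what makes query time independent of repository size.

```python
from typing import Dict, Optional


class VEBTree:
    """Textbook van Emde Boas tree over integer keys in [0, u), u a power of two.

    Illustrative sketch only (not the FISH implementation). insert, member and
    successor run in O(log log u): the cost depends on the key universe u, not
    on how many keys are stored. Deletion is omitted for brevity.
    """

    def __init__(self, u: int) -> None:
        if u < 2 or u & (u - 1):
            raise ValueError("universe size must be a power of two >= 2")
        self.u = u
        self.min: Optional[int] = None  # kept only here, never inside a cluster
        self.max: Optional[int] = None
        if u > 2:
            bits = u.bit_length() - 1
            self.lo_bits = bits // 2                   # low half of the key bits
            self.low_size = 1 << self.lo_bits          # universe of each cluster
            self.high_size = u >> self.lo_bits         # number of possible clusters
            self.summary: Optional["VEBTree"] = None   # which clusters are non-empty
            self.clusters: Dict[int, "VEBTree"] = {}   # created lazily on first insert

    def _high(self, x: int) -> int:
        return x >> self.lo_bits

    def _low(self, x: int) -> int:
        return x & (self.low_size - 1)

    def _index(self, h: int, l: int) -> int:
        return (h << self.lo_bits) | l

    def insert(self, x: int) -> None:
        if self.min is None:                 # empty tree: O(1)
            self.min = self.max = x
            return
        if x == self.min:                    # already stored
            return
        if x < self.min:                     # new minimum; push the old one down
            x, self.min = self.min, x
        if self.u > 2:
            h, l = self._high(x), self._low(x)
            cluster = self.clusters.get(h)
            if cluster is None:              # first key in this cluster
                cluster = self.clusters[h] = VEBTree(self.low_size)
                if self.summary is None:
                    self.summary = VEBTree(self.high_size)
                self.summary.insert(h)       # recurse into the summary...
            cluster.insert(l)                # ...then this call is O(1) on an empty cluster
        if x > self.max:
            self.max = x

    def member(self, x: int) -> bool:
        if self.min is None:
            return False
        if x == self.min or x == self.max:
            return True
        if self.u == 2:
            return False
        cluster = self.clusters.get(self._high(x))
        return cluster is not None and cluster.member(self._low(x))

    def successor(self, x: int) -> Optional[int]:
        """Smallest stored key strictly greater than x, or None."""
        if self.u == 2:
            return 1 if x == 0 and self.max == 1 else None
        if self.min is not None and x < self.min:
            return self.min
        h, l = self._high(x), self._low(x)
        cluster = self.clusters.get(h)
        if cluster is not None and l < cluster.max:   # answer is in x's own cluster
            return self._index(h, cluster.successor(l))
        nxt = self.summary.successor(h) if self.summary is not None else None
        if nxt is None:                                 # no later non-empty cluster
            return None
        return self._index(nxt, self.clusters[nxt].min)


# Usage with hypothetical 16-bit slide keys.
tree = VEBTree(2 ** 16)
for key in [3, 17, 1024, 40000]:
    tree.insert(key)
print(tree.successor(17))  # 1024
print(tree.member(5))      # False
```

Each `successor` call descends into either one cluster or the summary. Both are trees over a universe whose key width is half that of the parent, so a query touches O(log log u) levels no matter how many slides have been inserted. Only widening the key universe would lengthen it.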