Compound Figure Separation of Biomedical Images with Side Loss

Tianyuan Yao, Chang Qu, Quan Liu, Ruining Deng, Yuanhan Tian, Jiachen Xu, Aadarsh Jha, Shunxing Bao, Mengyang Zhao, Agnes B. Fogo, Bennett A. Landman, Catie Chang, Haichun Yang, Yuankai Huo

Unsupervised learning algorithms (e.g., self-supervised learning, auto-encoders, contrastive learning) allow deep learning models to learn effective image representations from large-scale unlabeled data. In medical image analysis, however, even unannotated data can be difficult for individual labs to obtain. Fortunately, national-level efforts have been made to provide efficient access to biomedical image data from previous scientific publications. For instance, NIH has launched the Open-i search engine, which provides free access to a large-scale image database. However, the images in scientific publications include a considerable number of compound figures with subplots. To extract and curate individual subplots, many compound figure separation approaches have been developed, especially with recent advances in deep learning. However, previous approaches typically required resource-intensive bounding box annotations to train detection models. In this paper, we propose a simple compound figure separation (SimCFS) framework that uses weak classification annotations from individual images. Our technical contribution is three-fold: (1) we introduce a new side loss designed for compound figure separation; (2) we introduce an intra-class image augmentation method to simulate hard cases; (3) the proposed framework enables efficient deployment to new classes of images without requiring resource-intensive bounding box annotations. In our experiments, SimCFS achieved new state-of-the-art performance on the ImageCLEF 2016 Compound Figure Separation Database. The source code of SimCFS is publicly available at https://github.com/hrlblab/ImageSeperation.
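Contribution (3) relies on generating detection training data instead of annotating it by hand. If individual single-panel images are stitched into a synthetic compound figure, the position of every panel is known by construction. The sketch below illustrates that idea under our own assumptions and is not the SimCFS code. Images are modelled as grayscale lists of pixel rows. Panels are drawn from a single class to loosely mimic the intra-class augmentation in (2). The functions `compose_compound_figure` and `sample_intra_class` are hypothetical names.

```python
import random
from typing import Dict, List, Tuple

# Assumption: a grayscale image is a list of equal-length pixel rows.
Image = List[List[int]]
# (x_min, y_min, x_max, y_max) with exclusive max, in canvas pixel coordinates.
Box = Tuple[int, int, int, int]


def compose_compound_figure(
    images: List[Image],
    cols: int,
    gap: int = 4,
    background: int = 255,
) -> Tuple[Image, List[Box]]:
    """Hypothetical helper: tile panels into a grid and return canvas + boxes.

    Because each panel is placed programmatically, its bounding box is known
    exactly, so no manual box annotation is required.
    """
    if not images or cols < 1:
        raise ValueError("need at least one image and cols >= 1")
    rows = (len(images) + cols - 1) // cols
    # Every grid cell is as large as the largest panel, so panels never overlap.
    cell_h = max(len(img) for img in images)
    cell_w = max(len(img[0]) for img in images)
    canvas_h = rows * cell_h + (rows + 1) * gap
    canvas_w = cols * cell_w + (cols + 1) * gap
    canvas = [[background] * canvas_w for _ in range(canvas_h)]

    boxes: List[Box] = []
    for idx, img in enumerate(images):
        r, c = divmod(idx, cols)
        # Top-left corner of this panel's cell, offset by the surrounding gaps.
        y0 = gap + r * (cell_h + gap)
        x0 = gap + c * (cell_w + gap)
        h, w = len(img), len(img[0])
        for dy in range(h):
            # Copy one pixel row of the panel into the canvas.
            canvas[y0 + dy][x0:x0 + w] = img[dy]
        boxes.append((x0, y0, x0 + w, y0 + h))
    return canvas, boxes


def sample_intra_class(
    pool: Dict[str, List[Image]], label: str, k: int, rng: random.Random
) -> List[Image]:
    """Hypothetical helper: draw k distinct panels that share one class label."""
    return rng.sample(pool[label], k)
```

As a usage example, four 2×3 panels tiled with `cols=2, gap=1` yield a 7×9 canvas and one exact box per panel. Setting `gap=0` packs panels edge to edge. This is one plausible way to produce the harder, tightly packed layouts the augmentation is meant to simulate, though the paper's actual augmentation policy may differ.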

