Chest X-ray images are commonly used in medical diagnosis, and AI models have been developed to assist with their interpretation. However, many of these models rely on a single view of the X-ray, even when multiple views are available. In this work, we propose a novel approach that combines information from multiple views to improve the performance of X-ray image classification. Our approach uses a convolutional neural network to extract feature maps from each view, followed by an attention mechanism implemented with a Vision Transformer. The resulting model performs multi-label classification over 41 labels and outperforms both single-view models and traditional multi-view classification architectures. We demonstrate the effectiveness of our approach through experiments on a dataset of 363,000 X-ray images.
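The pipeline described above can be sketched end to end. This is a minimal NumPy toy, not the paper's implementation: the linear-plus-ReLU "backbone" stands in for the CNN, a single softmax self-attention step over the view axis stands in for the Vision Transformer, and all dimensions except the 41 labels (e.g. two views, 64-dim features) are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (only NUM_LABELS comes from the abstract).
NUM_VIEWS, IN_DIM, FEAT_DIM, NUM_LABELS = 2, 128, 64, 41

def extract_features(view_image, W):
    """Stand-in for the per-view CNN backbone: one linear map + ReLU."""
    return np.maximum(view_image @ W, 0.0)

def attention_fuse(view_feats):
    """Toy single-head self-attention across views (the paper uses a ViT)."""
    q = view_feats                                  # queries = keys = values here
    scores = q @ q.T / np.sqrt(q.shape[1])          # (views, views)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)   # softmax over views
    fused = weights @ view_feats                    # attend across views
    return fused.mean(axis=0)                       # pool to one vector

def classify(fused, W_out, b_out):
    """Multi-label head: independent sigmoid probability per label."""
    logits = fused @ W_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))

# Random stand-ins for learned weights and the input views
# (e.g. frontal and lateral X-rays, flattened to vectors).
views = rng.normal(size=(NUM_VIEWS, IN_DIM))
W_cnn = rng.normal(size=(IN_DIM, FEAT_DIM)) * 0.1
W_out = rng.normal(size=(FEAT_DIM, NUM_LABELS)) * 0.1
b_out = np.zeros(NUM_LABELS)

feats = np.stack([extract_features(v, W_cnn) for v in views])
probs = classify(attention_fuse(feats), W_out, b_out)
print(probs.shape)  # → (41,): one probability per label
```

Because each label gets its own sigmoid rather than a shared softmax, the head naturally supports multi-label output, where several findings can co-occur in one study.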