We present ViDRiP-LLaVA, the first large multimodal model (LMM) in computational pathology that integrates three distinct visual scenarios: single patch images, automatically segmented pathology video clips, and manually segmented pathology videos. This integration closely mirrors the natural diagnostic process of pathologists. By generating detailed histological descriptions that culminate in a definitive sign-out diagnosis, ViDRiP-LLaVA bridges visual narratives with diagnostic reasoning. Central to our approach is the ViDRiP-Instruct dataset, comprising 4,278 video and diagnosis-specific chain-of-thought instruction pairs sourced from educational histopathology videos on YouTube. Although high-quality data is critical for enhancing diagnostic reasoning, its creation is time-intensive and yields only limited volume. To overcome this challenge, we transfer knowledge from existing single-image instruction datasets to train on weakly annotated, keyframe-extracted clips, and then fine-tune on manually segmented videos. ViDRiP-LLaVA establishes a new benchmark in pathology video analysis and offers a promising foundation for future AI systems that support clinical decision-making through integrated visual and diagnostic reasoning. Our code, data, and model are publicly available at: https://github.com/QuIIL/ViDRiP-LLaVA.
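To make the staged training described above concrete, the following is a minimal, self-contained PyTorch sketch of such a two-stage curriculum: a first stage over a large pool of weakly annotated, keyframe-extracted clips, followed by a second stage over a smaller set of manually segmented videos. It is illustrative only; every name, data shape, and hyperparameter here is an assumption for exposition, not the authors' implementation (the actual training code is in the linked repository).

```python
# Hedged sketch of a two-stage curriculum: stage 1 trains on weakly
# annotated, keyframe-extracted clips; stage 2 fine-tunes on manually
# segmented videos. All modules, shapes, and hyperparameters below are
# hypothetical stand-ins, not the ViDRiP-LLaVA code
# (see https://github.com/QuIIL/ViDRiP-LLaVA for that).

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_dummy_stage_data(n: int, feat_dim: int = 512) -> TensorDataset:
    """Stand-in for (clip features, next-token targets) pairs."""
    x = torch.randn(n, feat_dim)
    y = torch.randint(0, 100, (n,))  # hypothetical 100-token vocabulary
    return TensorDataset(x, y)


def train_stage(model: nn.Module, data: TensorDataset,
                lr: float, epochs: int) -> None:
    """One curriculum stage: standard supervised next-token objective."""
    opt = torch.optim.AdamW(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    loader = DataLoader(data, batch_size=8, shuffle=True)
    for _ in range(epochs):
        for x, y in loader:
            opt.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            opt.step()


# Trivial stand-in for the LMM's trainable projection/head.
model = nn.Linear(512, 100)

# Stage 1: many weakly annotated clips, higher learning rate.
train_stage(model, make_dummy_stage_data(1000), lr=2e-5, epochs=1)
# Stage 2: fewer manually segmented videos, lower learning rate.
train_stage(model, make_dummy_stage_data(200), lr=1e-5, epochs=2)
```

The design intuition is that the cheap, weakly annotated clips provide broad coverage for transferring single-image instruction knowledge to video, while the scarce, manually segmented videos are reserved for a gentler final fine-tuning pass.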