The advancement of artificial intelligence in oral healthcare relies on the availability of large-scale multimodal datasets that capture the complexity of clinical practice. In this paper, we present a comprehensive multimodal dataset comprising 8,775 dental checkups from 4,800 patients aged 10 to 90 years, collected over eight years (2018–2025). The dataset includes 50,000 intraoral images, 8,056 radiographs, and detailed textual records covering diagnoses, treatment plans, and follow-up notes. The data were collected in accordance with standard ethical guidelines and annotated for benchmarking. To demonstrate the dataset's utility, we fine-tuned the state-of-the-art large vision-language models Qwen-VL 3B and 7B and evaluated them on two tasks: classification of six oro-dental anomalies, and generation of complete diagnostic reports from multimodal inputs. We compared the fine-tuned models with their base counterparts and with GPT-4o. The fine-tuned models achieved substantial gains over these baselines, validating the dataset and underscoring its value for advancing AI-driven oro-dental healthcare. The dataset is publicly available and provides an essential resource for future research on AI in dentistry.