Ultrasound echocardiography is essential for the non-invasive, real-time assessment of cardiac function, but the scarcity of labelled data, driven by privacy restrictions and the complexity of expert annotation, remains a major obstacle for deep learning methods. We propose the Motion Conditioned Diffusion Model (MCDM), a label-free latent diffusion framework that synthesises realistic echocardiography videos conditioned on self-supervised motion features. To extract these features, we design the Motion and Appearance Feature Extractor (MAFE), which disentangles motion and appearance representations from videos. Feature learning is further enhanced by two auxiliary objectives: a re-identification loss guided by pseudo appearance features and an optical flow loss guided by pseudo flow fields. Evaluated on the EchoNet-Dynamic dataset, MCDM achieves competitive video generation performance, producing temporally coherent and clinically realistic sequences without reliance on manual labels. These results demonstrate the potential of self-supervised conditioning for scalable echocardiography synthesis. Our code is available at https://github.com/ZheLi2020/LabelfreeMCDM.
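The abstract names two self-supervised auxiliary objectives for MAFE: a re-identification loss against pseudo appearance features and an optical flow loss against pseudo flow fields. The sketch below is a minimal, hypothetical illustration of how such a two-stream extractor and its combined loss could look in PyTorch; the module names, the cosine-similarity form of the re-identification term, and the L1 form of the flow term are all assumptions for exposition, not the paper's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MAFESketch(nn.Module):
    """Hypothetical motion/appearance feature extractor (names illustrative).

    Two lightweight heads split a shared video embedding into a motion
    stream (per-frame dynamics) and an appearance stream (clip-level
    identity), mirroring the disentanglement the abstract describes.
    """

    def __init__(self, in_dim: int = 512, feat_dim: int = 256):
        super().__init__()
        self.motion_head = nn.Linear(in_dim, feat_dim)      # temporal features
        self.appearance_head = nn.Linear(in_dim, feat_dim)  # static features

    def forward(self, video_emb: torch.Tensor):
        # video_emb: (B, T, in_dim) tokens from some frozen video encoder
        motion = self.motion_head(video_emb)                   # (B, T, feat_dim)
        appearance = self.appearance_head(video_emb.mean(1))   # (B, feat_dim)
        return motion, appearance


def auxiliary_losses(appearance, pseudo_appearance, flow, pseudo_flow):
    """Combine the two auxiliary objectives from the abstract.

    Both loss forms are assumptions, not the paper's formulation:
    - re-identification: cosine distance to a pseudo appearance target;
    - optical flow: L1 regression toward a pseudo flow field.
    """
    reid = 1.0 - F.cosine_similarity(appearance, pseudo_appearance, dim=-1).mean()
    flow_l1 = F.l1_loss(flow, pseudo_flow)
    return reid + flow_l1
```

In a setup like this, the motion stream would serve as the conditioning signal for the latent diffusion model, while the appearance stream exists only to absorb static content so the motion features stay disentangled; the exact conditioning mechanism is described in the paper, not here.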