In this work, we estimate the intrinsic dimension (iD) of the Radio Galaxy Zoo (RGZ) dataset using a score-based diffusion model. We examine how the iD estimates vary as a function of Bayesian neural network (BNN) energy scores, which measure how similar the radio sources are to the MiraBest subset of the RGZ dataset. We find that out-of-distribution sources exhibit higher iD values, and that the overall iD of RGZ exceeds the values typically reported for natural image datasets. Furthermore, we analyse how iD varies across Fanaroff-Riley (FR) morphological classes and as a function of the signal-to-noise ratio (SNR). While we find no relationship between iD and FR class (FR I versus FR II), we observe a weak trend toward higher SNR at lower iD. Future work using the RGZ dataset could make use of the relationship between iD and energy scores to quantitatively study and improve the representations learned by various self-supervised learning algorithms.
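As an illustration of the kind of energy score referred to above, the following is a minimal sketch that assumes the common logit-based definition E(x) = -T log Σ_k exp(f_k(x)/T). The abstract does not state the exact formula used, so the function name `energy_score`, the `temperature` parameter, and the example logits are all hypothetical.

```python
import numpy as np

def energy_score(logits, temperature=1.0):
    """Assumed definition: E(x) = -T * logsumexp(logits / T), one score per row.

    This is the common logit-based energy score; it is not confirmed to be
    the exact quantity used in the work summarised above.
    """
    z = np.asarray(logits, dtype=float) / temperature
    # Subtract the row-wise maximum before exponentiating for numerical stability.
    m = z.max(axis=-1, keepdims=True)
    lse = m.squeeze(-1) + np.log(np.exp(z - m).sum(axis=-1))
    return -temperature * lse

# Hypothetical two-class logits (e.g. FR I vs FR II) for two sources.
logits = np.array([[10.0, 0.0],   # confident prediction
                   [0.0, 0.0]])   # uninformative prediction
scores = energy_score(logits)
```

Under this convention, a more negative energy corresponds to a source the classifier treats as more in-distribution. So the first row receives a lower score than the second.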