3D semantic scene graph prediction aims to detect objects and their semantic relationships in 3D scenes, and has emerged as a crucial technology for robotics and AR/VR applications. While previous research has addressed dataset limitations and explored various approaches, including open-vocabulary settings, existing methods frequently fail to optimize the representational capacity of object and relationship features, and they rely excessively on Graph Neural Networks even though these features are insufficiently discriminative. In this work, we demonstrate through extensive analysis that the quality of object features plays a critical role in determining overall scene graph accuracy. To address this challenge, we design a highly discriminative object feature encoder and employ a contrastive pretraining strategy that decouples object representation learning from scene graph prediction. This design not only enhances object classification accuracy but also yields direct improvements in relationship prediction. Notably, when we plug our pretrained encoder into existing frameworks, we observe substantial performance improvements across all evaluation metrics. Additionally, whereas existing approaches have not fully exploited the integration of relationship information, we combine geometric and semantic features to achieve superior relationship prediction. Comprehensive experiments on the 3DSSG dataset demonstrate that our approach significantly outperforms previous state-of-the-art methods. Our code is publicly available at https://github.com/VisualScienceLab-KHU/OCRL-3DSSG-Codes.