Recent progress in Sign Language Translation (SLT) has focussed primarily on improving the representational capacity of large language models to incorporate sign language features. This work explores an alternative direction: enhancing the geometric properties of the skeletal representations themselves. We propose Geo-Sign, a method that uses hyperbolic geometry to model the hierarchical structure inherent in sign language kinematics. By projecting skeletal features derived from Spatio-Temporal Graph Convolutional Networks (ST-GCNs) into the Poincaré ball model, we aim to create more discriminative embeddings, particularly for fine-grained motions such as finger articulations. We introduce a hyperbolic projection layer, a weighted Fréchet mean aggregation scheme, and a geometric contrastive loss that operates directly in hyperbolic space. These components are integrated into an end-to-end translation framework as a regularisation function that enhances the representations within the language model. This work demonstrates the potential of hyperbolic geometry to improve skeletal representations for SLT, improving on state-of-the-art RGB methods while preserving privacy and improving computational efficiency. Code is available at https://github.com/ed-fish/geo-sign.
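To make the geometric ingredients named in the abstract more concrete, the following is a minimal sketch, not the Geo-Sign implementation. It shows the standard exponential map at the origin of the Poincaré ball (one common way to project Euclidean features into hyperbolic space), the standard Poincaré distance, and an InfoNCE-style contrastive loss that uses negative hyperbolic distance as similarity. The function names (`expmap0`, `poincare_distance`, `contrastive_loss`), the curvature parameter `c`, the temperature value, and the choice of InfoNCE are all assumptions made for illustration; the abstract does not specify these details, and the weighted Fréchet mean is omitted here.

```python
import math


def expmap0(v, c=1.0, eps=1e-9):
    """Map a Euclidean (tangent) vector at the origin into the Poincare ball.

    Assumes a ball of curvature -c; the result always has norm < 1/sqrt(c).
    """
    norm = math.sqrt(sum(x * x for x in v))
    if norm < eps:
        # The origin maps to itself; this also avoids division by zero.
        return [0.0 for _ in v]
    sqrt_c = math.sqrt(c)
    scale = math.tanh(sqrt_c * norm) / (sqrt_c * norm)
    return [scale * x for x in v]


def poincare_distance(x, y, c=1.0):
    """Geodesic distance between two points inside the Poincare ball."""
    sq_x = sum(a * a for a in x)
    sq_y = sum(b * b for b in y)
    sq_diff = sum((a - b) ** 2 for a, b in zip(x, y))
    denom = (1.0 - c * sq_x) * (1.0 - c * sq_y)
    arg = 1.0 + 2.0 * c * sq_diff / denom
    # Clamp to the domain of acosh to absorb floating-point rounding.
    return math.acosh(max(arg, 1.0)) / math.sqrt(c)


def contrastive_loss(anchor, candidates, positive_index, temperature=0.1, c=1.0):
    """InfoNCE-style loss with negative hyperbolic distance as the logit.

    Hypothetical formulation: closer candidates receive higher scores.
    """
    logits = [-poincare_distance(anchor, cand, c) / temperature for cand in candidates]
    # Numerically stable log-sum-exp.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[positive_index]


if __name__ == "__main__":
    # Toy 2-D features standing in for pooled skeletal embeddings (illustrative only).
    anchor = expmap0([0.10, 0.00])
    positive = expmap0([0.12, 0.00])
    negative = expmap0([-0.50, 0.30])
    print(contrastive_loss(anchor, [positive, negative], positive_index=0))
```

Distances in the ball grow rapidly as points approach the boundary, which is the property usually cited as making hyperbolic space suitable for tree-like hierarchies. In a trainable model these operations would normally be written with an autodiff framework rather than plain Python lists.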