Graph convolutional networks (GCNs) achieve promising performance for skeleton-based action recognition. However, in most GCN-based methods, the spatial-temporal graph convolution is strictly restricted by the graph topology and captures only short-term temporal context, and therefore lacks flexibility in feature extraction. In this work, we present a novel architecture, named Graph Convolutional skeleton Transformer (GCsT), which addresses these limitations of GCNs by introducing a Transformer. GCsT employs the benefits of the Transformer (i.e., dynamic attention and global context) while keeping the advantages of GCNs (i.e., hierarchy and local topology structure). In GCsT, the spatial-temporal GCN enforces the capture of local dependencies, while the Transformer dynamically extracts global spatial-temporal relationships. Furthermore, GCsT gains stronger expressive capability by incorporating additional information present in skeleton sequences; the Transformer allows that information to be introduced into the model with little effort. We validate GCsT through extensive experiments, in which it achieves state-of-the-art performance on the NTU RGB+D, NTU RGB+D 120, and Northwestern-UCLA datasets.
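As an illustration only, the division of labour described above (a graph convolution for local, topology-bound dependencies and self-attention for global, topology-free relationships) can be sketched in a few lines of plain Python. This is not the authors' implementation. The single attention head, the identity query/key/value projections, the residual combination, the restriction to the joints of a single frame, and the toy three-joint chain are all assumptions made for brevity.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalize_adjacency(adj):
    """Add self-loops, then row-normalize so each row sums to 1."""
    n = len(adj)
    with_loops = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
                  for i in range(n)]
    return [[v / sum(row) for v in row] for row in with_loops]

def graph_conv(x, adj_norm, weight):
    """Local step: each joint averages features over its skeleton neighbours."""
    return matmul(matmul(adj_norm, x), weight)

def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Global step: every joint attends to every joint, ignoring the graph.

    Assumption: queries, keys and values are the input itself
    (identity projections), with a single head.
    """
    d = len(x[0])
    x_t = [list(col) for col in zip(*x)]
    scores = matmul(x, x_t)
    attn = [softmax([s / math.sqrt(d) for s in row]) for row in scores]
    return matmul(attn, x), attn

# Hypothetical toy skeleton: three joints in a chain 0 - 1 - 2.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
features = [[1.0, 0.0],   # one 2-D feature vector per joint
            [0.0, 1.0],
            [1.0, 1.0]]
weight = [[1.0, 0.0],     # identity weights, chosen only for readability
          [0.0, 1.0]]

adj_norm = normalize_adjacency(adj)
local = graph_conv(features, adj_norm, weight)
global_ctx, attn = self_attention(local)

# Assumed residual combination of the local and global branches.
out = [[l + g for l, g in zip(lr, gr)] for lr, gr in zip(local, global_ctx)]
```

In this sketch, joints 0 and 2 are not adjacent, so `adj_norm[0][2]` is zero and the graph convolution cannot mix them in a single step, whereas `attn[0][2]` is strictly positive because attention weights are computed between every pair of joints.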