Relation-Based Associative Joint Location for Human Pose Estimation in Videos

Video-based human pose estimation (HPE) is a vital yet challenging task. While deep learning methods have made significant progress on HPE, most approaches detect each joint independently, which damages the structural information of the pose. In this paper, unlike prior methods, we propose a Relation-based Pose Semantics Transfer Network (RPSTN) to locate joints associatively. Specifically, we design a lightweight joint relation extractor (JRE) that models the pose structural features and generates joint heatmaps associatively, heuristically modeling the relation between any two joints instead of building each joint heatmap independently. In effect, the JRE module models the spatial configuration of human poses through these pairwise joint relationships. Moreover, given the temporal semantic continuity of videos, the pose semantic information in the current frame is useful for guiding the location of joints in the next frame. We therefore use the idea of knowledge reuse to propagate pose semantic information between consecutive frames, which allows the proposed RPSTN to capture the temporal dynamics of poses. In space, the JRE module can infer invisible joints from their relationships with other, visible joints. In time, the proposed model can transfer pose semantic features from a non-occluded frame to an occluded frame to locate occluded joints. Our method is therefore robust to occlusion and achieves state-of-the-art results on two challenging datasets, which demonstrates its effectiveness for video-based human pose estimation. We will release the code and models publicly.
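The abstract does not specify the internals of the JRE module or of the propagation step, so the following is only an illustrative sketch of the two ideas it names, not the RPSTN implementation. It assumes per-joint feature maps of shape `(J, H, W)`, uses a softmax over pairwise dot-product similarity as a stand-in for the joint relation, and uses a fixed blending weight `alpha` as a stand-in for knowledge reuse between frames. The function names `associative_heatmaps`, `propagate`, and `locate` are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def associative_heatmaps(features):
    """Build each joint's heatmap from all joints' features.

    features: array of shape (J, H, W), one feature map per joint.
    Returns (heatmaps, relation) with shapes (J, H, W) and (J, J).
    ASSUMPTION: dot-product similarity is a placeholder for the
    relation the paper's JRE learns.
    """
    J, H, W = features.shape
    flat = features.reshape(J, H * W)
    # Similarity between every pair of joints, scaled by map size.
    scores = flat @ flat.T / np.sqrt(H * W)
    relation = softmax(scores, axis=1)  # each row sums to 1
    # Each heatmap is a relation-weighted mix of all joint features.
    heatmaps = (relation @ flat).reshape(J, H, W)
    return heatmaps, relation


def propagate(prev_features, curr_features, alpha=0.5):
    """Blend pose features from the previous frame into the current one.

    ASSUMPTION: a fixed linear blend stands in for the paper's
    pose semantics transfer; alpha is a hypothetical parameter.
    """
    return alpha * prev_features + (1.0 - alpha) * curr_features


def locate(heatmaps):
    """Return the (row, col) of the peak of each joint heatmap."""
    J, H, W = heatmaps.shape
    idx = heatmaps.reshape(J, -1).argmax(axis=1)
    return np.stack(np.unravel_index(idx, (H, W)), axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    video = rng.random((3, 13, 16, 16))  # 3 frames, 13 joints, 16x16 maps
    carried = video[0]
    for frame in video:
        carried = propagate(carried, frame)
        heatmaps, relation = associative_heatmaps(carried)
        print(locate(heatmaps).shape, relation.shape)
```

In this toy version, a joint whose own feature map is uninformative (for example, because it is occluded) still receives a heatmap built from related joints and from the previous frame, which is the intuition behind the robustness to occlusion described above.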
