Temporal-Channel Transformer for 3D Lidar-Based Video Object Detection in Autonomous Driving

The strong industrial demand for autonomous driving has led to strong interest in 3D object detection and has produced many excellent 3D object detection algorithms. However, the vast majority of these algorithms model only single-frame data and ignore the temporal information in the data sequence. In this work, we propose a new transformer, called the Temporal-Channel Transformer, to model the spatial-temporal domain and channel domain relationships for video object detection from Lidar data. As a special design of this transformer, the information encoded in the encoder differs from that in the decoder: the encoder encodes temporal-channel information from multiple frames, while the decoder decodes spatial-channel information for the current frame in a voxel-wise manner. Specifically, the temporal-channel encoder is designed to encode the information of different channels and frames by exploiting the correlation among features from different channels and frames. The spatial decoder, in turn, decodes the information for each location of the current frame. Before object detection is performed by the detection head, a gate mechanism re-calibrates the features of the current frame, filtering out object-irrelevant information by repeatedly refining the representation of the target frame during the up-sampling process. Experimental results show that we achieve state-of-the-art performance in grid voxel-based 3D object detection on the nuScenes benchmark.
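
The data flow described in the abstract (temporal-channel encoding over several frames, voxel-wise spatial decoding for the current frame, then gating) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It is a toy version under several assumptions that the abstract does not state: there are no learned projections, each (frame, channel) feature map is treated as one encoder token, the decoder attends over frames independently at each location, and the gate is a plain sigmoid of the decoded features. All function names and tensor shapes are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def temporal_channel_encode(frames):
    """Self-attention over (frame, channel) tokens.

    frames: array of shape (T, C, H, W). Each of the T*C feature maps is
    flattened into one token, so the attention weights capture correlation
    among channels and frames (assumed token layout).
    """
    T, C, H, W = frames.shape
    tokens = frames.reshape(T * C, H * W)            # (T*C, H*W)
    scores = tokens @ tokens.T / np.sqrt(H * W)      # (T*C, T*C)
    attn = softmax(scores, axis=-1)
    return (attn @ tokens).reshape(T, C, H, W), attn

def spatial_decode(current, memory):
    """Voxel-wise cross-attention: one query per location of the current frame.

    current: (C, H, W) features of the current frame.
    memory:  (T, C, H, W) encoder output.
    At each location the query attends over the T encoded frames (assumption).
    """
    T, C, H, W = memory.shape
    queries = current.reshape(C, H * W).T                    # (P, C), P = H*W
    keys = memory.reshape(T, C, H * W).transpose(2, 0, 1)    # (P, T, C)
    scores = np.einsum("pc,ptc->pt", queries, keys) / np.sqrt(C)
    attn = softmax(scores, axis=-1)                          # (P, T)
    decoded = np.einsum("pt,ptc->pc", attn, keys)            # (P, C)
    return decoded.T.reshape(C, H, W)

def gate(current, decoded):
    """Re-calibrate current-frame features with a sigmoid gate in (0, 1)."""
    g = 1.0 / (1.0 + np.exp(-decoded))
    return g * current

# Toy input: 3 frames, 4 channels, a 5 x 6 grid; the last frame is "current".
rng = np.random.default_rng(0)
frames = rng.standard_normal((3, 4, 5, 6))
memory, enc_attn = temporal_channel_encode(frames)
current = frames[-1]
decoded = spatial_decode(current, memory)
refined = gate(current, decoded)
```

Because the gate values lie strictly between 0 and 1, the refined features can only be attenuated relative to the current-frame features, which is one simple way to read "filtering out object-irrelevant information". A real model would learn the gate and the attention projections and would apply the gating repeatedly alongside up-sampling, which this sketch omits.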
