LidarCLIP or: How I Learned to Talk to Point Clouds

Research connecting text and images has recently seen several breakthroughs, with models like CLIP, DALL-E 2, and Stable Diffusion. However, the connection between text and other visual modalities, such as lidar data, has received less attention, prohibited by the lack of text-lidar datasets. In this work, we propose LidarCLIP, a mapping from automotive point clouds to a pre-existing CLIP embedding space. Using image-lidar pairs, we supervise a point cloud encoder with the image CLIP embeddings, effectively relating text and lidar data with the image domain as an intermediary. We show the effectiveness of LidarCLIP by demonstrating that lidar-based retrieval is generally on par with image-based retrieval, but with complementary strengths and weaknesses. By combining image and lidar features, we improve upon both single-modality methods and enable a targeted search for challenging detection scenarios under adverse sensor conditions. We also explore zero-shot classification and show that LidarCLIP outperforms existing attempts to use CLIP for point clouds by a large margin. Finally, we leverage our compatibility with CLIP to explore a range of applications, such as point cloud captioning and lidar-to-image generation, without any additional training. Code and pre-trained models are available at https://github.com/atonderski/lidarclip.
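The pipeline described in the abstract can be illustrated with a toy sketch: a lidar encoder is trained so that its output matches the frozen CLIP image embedding of the paired camera frame, after which text queries can be compared against lidar embeddings directly. The abstract does not say which loss is used or how image and lidar features are combined, so the cosine-based loss and the averaging in `fuse` below are assumptions. All function names are hypothetical, and the short hand-written vectors stand in for real CLIP embeddings.

```python
import math


def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return sum(x * y for x, y in zip(normalize(a), normalize(b)))


def distillation_loss(lidar_emb, image_emb):
    """Assumed training objective: 1 - cosine similarity between the lidar
    encoder output and the frozen CLIP image embedding of the paired frame.
    The abstract does not name the actual loss."""
    return 1.0 - cosine(lidar_emb, image_emb)


def fuse(image_emb, lidar_emb):
    """Assumed fusion rule: average of the unit-normalized features."""
    a, b = normalize(image_emb), normalize(lidar_emb)
    return [(x + y) / 2 for x, y in zip(a, b)]


def retrieve(text_emb, scene_embs, top_k=1):
    """Return indices of the scenes most similar to a text query embedding."""
    scores = [(cosine(text_emb, emb), i) for i, emb in enumerate(scene_embs)]
    return [i for _, i in sorted(scores, reverse=True)[:top_k]]


# Toy 3-dimensional stand-ins for embeddings in a shared CLIP space.
text_query = [1.0, 0.0, 0.0]
lidar_scenes = [[0.1, 0.9, 0.0], [0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]
image_scenes = [[0.2, 0.8, 0.0], [0.8, 0.0, 0.2], [0.1, 0.0, 0.9]]

# Lidar-only retrieval, then retrieval on fused image + lidar features.
best_lidar = retrieve(text_query, lidar_scenes)
fused_scenes = [fuse(i, l) for i, l in zip(image_scenes, lidar_scenes)]
best_fused = retrieve(text_query, fused_scenes, top_k=2)
```

Because the lidar embeddings live in the same space as CLIP's image and text embeddings, the same `retrieve` call works for lidar, image, or fused features. Zero-shot classification follows the same pattern, with one text embedding per class name in place of a single query.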

