Convolutional neural networks (CNNs) are usually built by stacking convolutional operations layer-by-layer. Although CNNs have shown strong capability to extract semantics from raw pixels, their capacity to capture spatial relationships of pixels across rows and columns of an image is not fully explored. These relationships are important for learning semantic objects with strong shape priors but weak appearance coherence, such as traffic lanes, which are often occluded or not even painted on the road surface, as shown in Fig. 1 (a). In this paper, we propose Spatial CNN (SCNN), which generalizes traditional deep layer-by-layer convolutions to slice-by-slice convolutions within feature maps, thus enabling message passing between pixels across rows and columns within a layer. SCNN is particularly suitable for long continuous shape structures or large objects with strong spatial relationships but few appearance clues, such as traffic lanes, poles, and walls. We apply SCNN to a newly released, very challenging traffic lane detection dataset and to the Cityscapes dataset. The results show that SCNN can learn the spatial relationships needed for structured output and significantly improves performance. We show that SCNN outperforms the recurrent neural network (RNN) based ReNet and MRF+CNN (MRFNet) on the lane detection dataset by 8.7% and 4.6% respectively. Moreover, our SCNN won the 1st place on the TuSimple Benchmark Lane Detection Challenge, with an accuracy of 96.53%.