RangeLVDet: Boosting 3D Object Detection in LIDAR With Range Image and RGB Image

Abstract

Camera and LIDAR are both important sensor modalities for real-world applications, especially autonomous driving. The two sensors provide complementary information, which makes sensor fusion possible. However, progress on early fusion of the two modalities has been slow due to viewpoint misalignment, feature misalignment, and data-volume misalignment, and its detection performance has consequently remained low. In this work, we propose a novel fusion pipeline: an early-fusion method that combines the range image and the RGB image to enhance 3D object detection. It takes full advantage of LIDAR’s range view, point view, and bird’s-eye view, together with the camera’s RGB view: First, it pre-trains a 2D detection network on a large-scale 2D detection dataset to extract high-level semantic features from the RGB image; Second, it uses a specially designed 2D convolution network to extract high-level geometric features from the range image; Third, it fuses the semantic and geometric features through the point view of the point cloud; Finally, the point view is transferred to the bird’s-eye view for 3D object detection. Because both the range image and the RGB image are front views, the multi-modal features are better aligned. Experiments show that the proposed method yields large improvements over the point-cloud-only baseline and outperforms state-of-the-art methods on a self-built real-world dataset. In ablation studies, we examine the fusion method’s dependence on 2D features and the effect of different fusion positions on performance.
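The fusion steps in the abstract (gather per-point features from the two front-view feature maps, concatenate them in the point view, then scatter to a bird's-eye-view grid) can be sketched as follows. This is a minimal NumPy sketch under assumed interfaces, not the paper's implementation: the per-point pixel coordinates, grid size, and function names are illustrative, and the actual method uses learned 2D backbones rather than raw feature maps.

```python
import numpy as np

def fuse_point_features(range_feat, rgb_feat, range_uv, rgb_uv):
    """Gather and concatenate multi-modal features in the point view.

    range_feat: (C1, Hr, Wr) feature map from the range-image branch
    rgb_feat:   (C2, Hc, Wc) feature map from the RGB branch
    range_uv, rgb_uv: (N, 2) integer (row, col) projections of each
    LIDAR point into the two front-view images (assumed precomputed).
    Returns a (C1 + C2, N) per-point fused feature matrix.
    """
    f_range = range_feat[:, range_uv[:, 0], range_uv[:, 1]]  # (C1, N)
    f_rgb = rgb_feat[:, rgb_uv[:, 0], rgb_uv[:, 1]]          # (C2, N)
    return np.concatenate([f_range, f_rgb], axis=0)

def scatter_to_bev(point_feat, xy, grid=(8, 8), cell=1.0):
    """Transfer point-view features to a bird's-eye-view grid.

    point_feat: (C, N) fused per-point features
    xy: (N, 2) ground-plane coordinates of the points in meters
    Points landing in the same BEV cell are max-pooled.
    """
    C, _ = point_feat.shape
    bev = np.zeros((C, grid[0], grid[1]))
    ix = np.clip((xy[:, 0] / cell).astype(int), 0, grid[0] - 1)
    iy = np.clip((xy[:, 1] / cell).astype(int), 0, grid[1] - 1)
    for n in range(point_feat.shape[1]):
        bev[:, ix[n], iy[n]] = np.maximum(bev[:, ix[n], iy[n]],
                                          point_feat[:, n])
    return bev
```

In this sketch the point view acts purely as a routing mechanism: each 3D point carries the gathered front-view features to its BEV cell, which is what lets the two front-view modalities be combined before any BEV detection head runs.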

Publication
IEEE Sensors Journal
