3D Point Cloud Reconstruction Technique from 2D Image Using Efficient Feature Map Extraction Network
Keywords:
Point cloud, Feature map, Reconstruction, Reparameterization trick, Latent vector, Deep learning

Abstract
In this paper, we propose a technique for reconstructing a 3D point cloud from a 2D image using an efficient feature map extraction network. The contributions of the proposed method are as follows. First, we use a new feature map extraction network that is about 27% more memory-efficient than existing techniques. The proposed network does not downsample the input image until mid-way through the deep learning network, so the important information required for 3D point cloud reconstruction is preserved; the increase in memory caused by processing full-sized images is mitigated by keeping the network shallow. Second, preserving the high-resolution features of the 2D image further improves accuracy over conventional techniques: feature maps extracted from the full-size image contain more detailed information than those of existing methods, which improves the accuracy of the reconstructed 3D point cloud. Third, we use a divergence loss that does not require shooting (camera viewpoint) information. Requiring not only 2D images but also their shooting angles for training can make dataset collection difficult; the proposed method instead increases the diversity of the latent information through randomness, improving reconstruction accuracy without any additional shooting information. To evaluate the proposed method objectively, experiments were conducted on the ShapeNet dataset following the same procedures as previous studies. The proposed method achieves a Chamfer distance (CD) of 5.87, an Earth mover's distance (EMD) of 5.81, and 2.9G FLOPs. Lower CD and EMD values indicate more accurate 3D point cloud reconstruction, while a lower FLOPs value indicates a lighter deep learning network. These CD, EMD, and FLOPs results demonstrate approximately 27% better memory efficiency and approximately 6.3% better accuracy than competing methods, objectively validating the performance of the proposed method.
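To make the architectural idea concrete, the following is a minimal sketch under our own assumptions (the class name FullResFeatureExtractor, the layer counts, and the channel widths are hypothetical, not the authors' exact configuration): early stride-1 convolutions keep the full input resolution so fine detail is preserved, downsampling is deferred to the later stages, and the network is kept shallow to bound memory.

```python
import torch
import torch.nn as nn

class FullResFeatureExtractor(nn.Module):
    """Illustrative feature extractor that defers downsampling.

    The early stride-1 convolutions preserve the full input resolution,
    and the image is only downsampled from mid-way through the network;
    keeping the stack shallow limits the memory cost of full-size maps.
    """
    def __init__(self, in_ch: int = 3, width: int = 32):
        super().__init__()
        # Early stage: stride-1 convolutions, no loss of resolution.
        self.full_res = nn.Sequential(
            nn.Conv2d(in_ch, width, 3, stride=1, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, stride=1, padding=1), nn.ReLU(inplace=True),
        )
        # Late stage: strided convolutions downsample only from here on.
        self.downsample = nn.Sequential(
            nn.Conv2d(width, 2 * width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(2 * width, 4 * width, 3, stride=2, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.downsample(self.full_res(x))

# Example: a 128x128 RGB image yields a 32x32 feature map with 128 channels.
feats = FullResFeatureExtractor()(torch.randn(1, 3, 128, 128))
print(feats.shape)  # torch.Size([1, 128, 32, 32])
```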
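The divergence loss and the "diversity through randomness" described above follow the spirit of the reparameterization trick of Kingma and Welling (2014): the encoder predicts a mean and log-variance, a latent vector is sampled by injecting Gaussian noise, and a KL-divergence term regularizes the latent distribution toward a standard normal prior, so no shooting metadata is needed. A minimal sketch under those assumptions (the function names are ours, not the paper's):

```python
import torch

def reparameterize(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """Sample z = mu + sigma * eps with eps ~ N(0, I) (reparameterization trick)."""
    eps = torch.randn_like(mu)
    return mu + torch.exp(0.5 * logvar) * eps

def kl_divergence(mu: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL(N(mu, sigma^2) || N(0, I)) per sample, averaged over the batch."""
    return (-0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(dim=1)).mean()

# Example: a 128-dimensional latent for a batch of 4 images.
mu, logvar = torch.zeros(4, 128), torch.zeros(4, 128)
z = reparameterize(mu, logvar)       # stochastic latent vector, shape (4, 128)
loss_kl = kl_divergence(mu, logvar)  # tensor(0.) when the latent matches the prior
```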
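For reference, the two accuracy metrics are the standard ones used in prior single-image reconstruction work (e.g., Fan et al.); the definitions below are the common forms, and the paper's exact scaling or normalization may differ. With ground-truth point set $S$ and reconstruction $\hat{S}$, the Chamfer distance sums squared nearest-neighbor distances in both directions, while the Earth mover's distance is the cost of an optimal one-to-one matching $\phi$:

$$ d_{CD}(S,\hat{S}) = \sum_{x \in S} \min_{y \in \hat{S}} \lVert x - y \rVert_2^2 \;+\; \sum_{y \in \hat{S}} \min_{x \in S} \lVert x - y \rVert_2^2 $$

$$ d_{EMD}(S,\hat{S}) = \min_{\phi : S \to \hat{S}} \sum_{x \in S} \lVert x - \phi(x) \rVert_2 $$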
References
Choy CB, Xu D, Gwak J, et al., 2016, 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction. Proceedings of the European Conference on Computer Vision, 628–644.
Mandikal P, Navaneet KL, Agrawal M, et al., 2018, 3D-LMNet: Latent Embedding Matching for Accurate and Diverse 3D Point Cloud Reconstruction from a Single Image. Proceedings of the British Machine Vision Conference (BMVC). https://doi.org/10.48550/arXiv.1807.07796
Fan H, Su H, Guibas L, 2017, A Point Set Generation Network for 3D Object Reconstruction from a Single Image. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 605–613. https://doi.org/10.48550/arXiv.1612.00603
Kingma DP, Welling M, 2014, Auto-Encoding Variational Bayes. Proceedings of the International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.1312.6114
Li B, Zhang Y, Zhao B, et al., 2020, A Single-View 3D-Object Point Cloud Reconstruction Network, IEEE Access, 8: 83782–83790. https://doi.org/10.1109/ACCESS.2020.2992554
He K, Zhang X, Ren S, et al., 2016, Deep Residual Learning for Image Recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778. https://doi.org/10.48550/arXiv.1512.03385
Higgins I, Matthey L, Pal A, et al., 2017, beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. Proceedings of the International Conference on Learning Representations (ICLR).
Chang AX, Funkhouser T, Guibas L, et al., 2015, ShapeNet: An Information-Rich 3D Model Repository. https://doi.org/10.48550/arXiv.1512.03012