Towards Scalable Deep 3D Perception and Generation

Abstract
Scaling up 3D deep learning systems emerges as a paramount issue, comprising two primary facets: (1) Model scalability that designs a 3D network that is scalefriendly, i.e. model archives improving performance with increasing parameters and can run efficiently. Unlike 2D convolutional networks, 3D networks have to accommodate the irregularities of 3D data, such as respecting permutation invariance in point clouds. (2) Data scalability: high-quality 3D data is conspicuously scarce in the 3D field. 3D data acquisition and annotations are both complex and costly, hampering the development of scalable 3D deep learning.

This dissertation delves into 3D deep learning including both perception and generation, addressing the scalability challenges. To address model scalability in 3D perception, I introduce ASSANet which outlines an approach for efficient 3D point cloud representation learning, allowing the model to scale up with a low cost of computation, and notably achieving substantial accuracy gains. I further introduce the PointNeXt framework, focusing on data augmentation and scalability of the architecture, that outperforms state-of-the-art 3D point cloud perception networks. To address data scalability, I present Pix4Point which explores the utilization of abundant 2D images to enhance 3D understanding. For scalable 3D generation, I propose Magic123 which leverages a joint 2D and 3D diffusion prior for zero-shot image-to-3D content generation without the necessity of 3D supervision. These collective efforts provide pivotal solutions to model and data scalability in 3D deep learning.

DOI
10.25781/KAUST-8H30E


Abstract
Scaling up 3D deep learning systems emerges as a paramount issue, comprising two primary facets: (1) Model scalability that designs a 3D network that is scalefriendly, i.e. model archives improving performance with increasing parameters and can run efficiently. Unlike 2D convolutional networks, 3D networks have to accommodate the irregularities of 3D data, such as respecting permutation invariance in point clouds. (2) Data scalability: high-quality 3D data is conspicuously scarce in the 3D field. 3D data acquisition and annotations are both complex and costly, hampering the development of scalable 3D deep learning.

This dissertation delves into 3D deep learning including both perception and generation, addressing the scalability challenges. To address model scalability in 3D perception, I introduce ASSANet which outlines an approach for efficient 3D point cloud representation learning, allowing the model to scale up with a low cost of computation, and notably achieving substantial accuracy gains. I further introduce the PointNeXt framework, focusing on data augmentation and scalability of the architecture, that outperforms state-of-the-art 3D point cloud perception networks. To address data scalability, I present Pix4Point which explores the utilization of abundant 2D images to enhance 3D understanding. For scalable 3D generation, I propose Magic123 which leverages a joint 2D and 3D diffusion prior for zero-shot image-to-3D content generation without the necessity of 3D supervision. These collective efforts provide pivotal solutions to model and data scalability in 3D deep learning.

DOI
10.25781/KAUST-8H30E

Permanent link to this record