Extracting Semantic and Geometric Information in Images and Videos using GANs

Name: Rameen_thesis_2023 (1).pdf
Size: 484.4 MB
Format: PDF
Description: PhD Dissertation
Type: Dissertation
Authors: Abdal, Rameen
Advisors: Wonka, Peter
Committee members: Ghanem, Bernard; Hadwiger, Markus; Huang, Jia-Bin
Program: Computer Science
KAUST Department: Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
Date: 2023-03
Permanent link to this record: http://hdl.handle.net/10754/690240
    
Abstract
The success of Generative Adversarial Networks (GANs) has resulted in unprecedented quality for both image generation and manipulation. Recent state-of-the-art GANs (e.g., the StyleGAN series) have demonstrated outstanding results in photo-realistic image generation. In this dissertation, we explore the properties of their latent spaces, including image manipulation, extraction of 3D properties, and various weakly supervised and unsupervised downstream tasks, using StyleGAN and its derivative architectures.

First, we study the projection of images into StyleGAN's latent space and analyze the properties of the embedded images in a proposed extended $W+$ latent space. Second, we demonstrate rich semantic interpretations of images in the latent space, which indirectly yields a compelling semantic understanding of the underlying latent space. Specifically, we combine $W+$ space with noise space optimization and tensor manipulations to enable high-quality reconstruction and local editing. For example, we can perform image inpainting, where these regularized latent spaces reconstruct the image's content and the GAN prior fills in the details of the missing regions.

Next, we study whether a 2D image-based GAN learns a meaningful semantic model and the 3D properties of an image. Using our analysis, we can extract a plausible interpretation of the 3D geometry, lighting, materials, and other semantic attributes of a source image by modeling the latent space with conditional continuous normalizing flows. As a result, we can perform non-linear sequential edits on the source image without affecting its quality or identity. Furthermore, we propose an unsupervised technique to extract the underlying latent space properties, generalizing our analysis to unseen datasets where human knowledge is limited. Specifically, we use an information-rich visual-linguistic model, CLIP, trained on internet-scale image-text pairs. The proposed framework extracts, labels, and projects important directions into the GAN latent space without human supervision.

Finally, inspired by the findings of our analysis, we investigate additional, previously unexplored questions: Can we perform foreground object segmentation? Can an image-based GAN be used to edit videos? Can we generate view-consistent, editable 3D animations? Investigating these research questions lets us use GANs for a spectrum of tasks beyond the usual image generation task. Specifically, we propose a technique to segment foreground objects in generated images using the information stored in StyleGAN feature maps; this framework can create synthetic datasets for training existing supervised segmentation networks. Then, we study the regularized $W+$, activation $S$, and Fourier feature $F_f$ spaces to embed and edit videos in the image-based StyleGAN3, a variant of StyleGAN, and can generate high-quality videos at $1024^2$ resolution from a single image and driving videos. Finally, we propose a framework for domain adaptation in 3D-GANs that links the latent spaces of different models. We build upon EG3D, a 3D-GAN derived from StyleGAN, to enable the generation, editing, and animation of personalized 3D avatars. Technically, we propose a method to align the camera distributions of two domains, i.e., faces and avatars; a method for domain adaptation in 3D-GANs using texture, geometry, and depth regularization, with an option to model more exaggerated geometries; and a method to link real faces to the 3D artistic domain and project them into it. These frameworks allow us to develop tools, distilled from an unconditional GAN, for unsupervised image segmentation, video editing, and personalized 3D animation generation and manipulation with state-of-the-art performance. We create these tools without needing extra annotated object segmentation, video, or 3D data.
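The embedding analysis described in the abstract rests on projecting a real image into the extended $W+$ space, which at its core is an optimization over one latent code per synthesis layer. The following is a minimal PyTorch sketch under stated assumptions, not the dissertation's exact procedure: `generator` is a hypothetical wrapper around a pretrained StyleGAN synthesis network that maps a (1, n_layers, 512) latent to an image in [-1, 1], and `target` is the preprocessed image to embed. The full method additionally uses a VGG perceptual loss and noise-space optimization, both omitted here.

```python
import torch
import torch.nn.functional as F

def project_to_w_plus(generator, target, n_layers=18, dim=512,
                      steps=1000, lr=0.01):
    """Optimize a separate 512-d latent per layer (the W+ space) so the
    generated image reconstructs `target`."""
    device = target.device
    # Start from one random w repeated across layers; practical pipelines
    # usually initialize from the mapping network's mean w instead.
    w_plus = torch.randn(1, 1, dim, device=device).repeat(1, n_layers, 1)
    w_plus.requires_grad_(True)
    opt = torch.optim.Adam([w_plus], lr=lr)

    for _ in range(steps):
        img = generator(w_plus)          # (1, 3, H, W) in [-1, 1]
        # Pixel-wise reconstruction only; the full method also uses a
        # perceptual loss and optimizes the per-layer noise inputs.
        loss = F.mse_loss(img, target)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w_plus.detach()
```

For the inpainting use mentioned in the abstract, the reconstruction loss would be restricted to the known pixels (e.g., `F.mse_loss(img * mask, target * mask)`), leaving the GAN prior to synthesize the masked region.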
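The CLIP-based framework in the abstract extracts and labels latent directions without supervision; that full pipeline is too involved for a short example, so the sketch below shows the simpler, related idea of steering a $W+$ code toward a text prompt with CLIP (a StyleCLIP-style optimization, not the dissertation's unsupervised extraction method). It assumes OpenAI's `clip` package and the same hypothetical `generator` as above.

```python
import clip
import torch
import torch.nn.functional as F

def clip_guided_edit(generator, w_plus, prompt, steps=200, lr=0.01,
                     device="cuda"):
    model, _ = clip.load("ViT-B/32", device=device)
    model = model.float()           # CLIP loads in fp16 on GPU; keep fp32 here
    model.requires_grad_(False)     # only the latent offset is optimized

    with torch.no_grad():
        text_feat = model.encode_text(clip.tokenize([prompt]).to(device))
        text_feat = text_feat / text_feat.norm(dim=-1, keepdim=True)

    delta = torch.zeros_like(w_plus, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        img = generator(w_plus + delta)
        # Map generator output from [-1, 1] to [0, 1] and resize for CLIP;
        # a faithful version would also apply CLIP's channel normalization.
        img = F.interpolate((img + 1) / 2, size=224, mode="bilinear",
                            align_corners=False)
        img_feat = model.encode_image(img)
        img_feat = img_feat / img_feat.norm(dim=-1, keepdim=True)
        loss = 1.0 - (img_feat * text_feat).sum()  # cosine distance to prompt
        opt.zero_grad()
        loss.backward()
        opt.step()
    return (w_plus + delta).detach()
```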
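The abstract also mentions segmenting foreground objects using the information stored in StyleGAN feature maps. As a rough illustration of why those features carry segmentation signal (the dissertation trains a dedicated network rather than clustering), intermediate activations can be upsampled to a common resolution and clustered per pixel; `features`, a list of activations from a hypothetical synthesis pass, is an assumption here.

```python
import torch
import torch.nn.functional as F
from sklearn.cluster import KMeans

def cluster_feature_maps(features, out_size=256, k=2):
    """`features`: list of (1, C_i, H_i, W_i) StyleGAN activations.
    Returns an (out_size, out_size) array of cluster labels, one of
    which typically covers the foreground object."""
    # Upsample every activation map to a shared spatial resolution.
    ups = [F.interpolate(f, size=out_size, mode="bilinear",
                         align_corners=False) for f in features]
    stacked = torch.cat(ups, dim=1)[0]               # (sum C_i, H, W)
    # One feature vector per pixel, then unsupervised clustering.
    pixels = stacked.permute(1, 2, 0).reshape(-1, stacked.shape[0])
    labels = KMeans(n_clusters=k, n_init=10).fit_predict(
        pixels.detach().cpu().numpy())
    return labels.reshape(out_size, out_size)
```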
Citation: Abdal, R. (2023). Extracting Semantic and Geometric Information in Images and Videos using GANs [KAUST Research Repository]. https://doi.org/10.25781/KAUST-X2Z79
DOI: 10.25781/KAUST-X2Z79
Collections: PhD Dissertations; Computer Science Program; Computer, Electrical and Mathematical Science and Engineering (CEMSE) Division
