Stylistic and Spatial Disentanglement in GANs

dc.contributor.advisorWonka, Peter
dc.contributor.authorAlharbi, Yazeed
dc.contributor.committeememberMichels, Dominik
dc.contributor.committeememberGhanem, Bernard
dc.contributor.committeememberYang, Ming-Hsuan
dc.contributor.departmentComputer, Electrical and Mathematical Science and Engineering (CEMSE) Division
dc.date.accessioned2021-08-17T10:58:08Z
dc.date.available2021-08-17T10:58:08Z
dc.date.issued2021-08-17
dc.description.abstractThis dissertation tackles the problem of entanglement in Generative Adversarial Networks (GANs). The key insight is that disentanglement in GANs can be improved by differentiating between the content and the operations performed on that content. For example, the identity of a generated face can be thought of as the content, while the lighting conditions can be thought of as the operations. We examine disentanglement in several kinds of deep networks: image-to-image translation GANs, unconditional GANs, and sketch extraction networks. The task in image-to-image translation GANs is to translate images from one domain to another, so disentanglement is clearly necessary: the network must maintain the core content of the image while changing its stylistic appearance to match the target domain. We propose latent filter scaling to achieve multimodality and disentanglement. Whereas previous methods require complicated network architectures to enforce that disentanglement, our approach maintains the traditional GAN loss with only a minor change in architecture. Unlike image-to-image translation GANs, unconditional GANs are generally entangled: the only way to change the generated output is to change the input noise code, so it is very difficult to resample only some parts of a generated image. We propose structured noise injection to achieve disentanglement in unconditional GANs, using two input codes: one that specifies spatially-variable details and one that specifies spatially-invariable details. In addition to allowing content and style to be changed independently, this lets users change the content only at certain locations. Combining our previous findings, we improve the performance of sketch-to-image translation networks. A crucial problem is how to correct input sketches before feeding them to the generator. By extracting sketches in an unsupervised way from only the spatially-variable branch of the image, we are able to produce sketches that show the same content in many different styles. Those sketches can serve as a dataset to train a sketch-to-image translation GAN.
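As a rough illustration of the two-code idea described in the abstract (structured noise injection with a spatially-variable code and a spatially-invariable code), the following is a minimal PyTorch sketch. The module names, tensor shapes, and the toy decoder are assumptions made for this example only; they are not the dissertation's actual architecture or training setup.

```python
# Minimal sketch, assuming a toy generator: a grid of local (spatially-variable)
# noise vectors sets the content at each location, while a single global
# (spatially-invariable) vector modulates every location to set the style.
import torch
import torch.nn as nn


class TwoCodeGenerator(nn.Module):
    def __init__(self, local_dim=8, global_dim=64, channels=64):
        super().__init__()
        # Global (style) code -> per-channel scale and shift, applied uniformly.
        self.to_scale = nn.Linear(global_dim, channels)
        self.to_shift = nn.Linear(global_dim, channels)
        # Each local code in the grid is projected to feature channels.
        self.local_proj = nn.Conv2d(local_dim, channels, kernel_size=1)
        # Tiny decoder that upsamples the grid of features to an image.
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="nearest"),
            nn.Conv2d(channels, 3, 3, padding=1),
            nn.Tanh(),
        )

    def forward(self, local_code, global_code):
        # local_code:  (B, local_dim, H, W) grid -> spatially-variable content
        # global_code: (B, global_dim) vector    -> spatially-invariable style
        feats = self.local_proj(local_code)
        scale = self.to_scale(global_code).unsqueeze(-1).unsqueeze(-1)
        shift = self.to_shift(global_code).unsqueeze(-1).unsqueeze(-1)
        feats = feats * (1 + scale) + shift  # style modulates every location equally
        return self.decoder(feats)


if __name__ == "__main__":
    gen = TwoCodeGenerator()
    local = torch.randn(1, 8, 4, 4)    # 4x4 grid of local codes
    style = torch.randn(1, 64)         # one global style code
    img = gen(local, style)            # (1, 3, 64, 64)
    # Resample only one cell of the local grid: content changes at that
    # location while the global style stays fixed.
    local[:, :, 0, 0] = torch.randn(8)
    img_edited = gen(local, style)
    print(img.shape, img_edited.shape)
```

Resampling one cell of the local grid while keeping the global code fixed corresponds to the localized content edits the abstract describes; swapping the global code instead changes the overall style everywhere.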
dc.identifier.citationAlharbi, Y. (2021). Stylistic and Spatial Disentanglement in GANs. KAUST Research Repository. https://doi.org/10.25781/KAUST-4R0H7
dc.identifier.doi10.25781/KAUST-4R0H7
dc.identifier.urihttp://hdl.handle.net/10754/670641
dc.language.isoen
dc.person.id133407
dc.subjectGAN
dc.subjectsketch
dc.subjectstyle
dc.subjecttranslation
dc.subjectImage synthesis
dc.subjectDisentanglement
dc.titleStylistic and Spatial Disentanglement in GANs
dc.typeDissertation
kaust.request.doiyes
orcid.id0000-0002-5534-587X
orcid.id0000-0003-0627-9746
orcid.id0000-0002-8073-1959
refterms.dateFOA2021-08-17T10:58:09Z
thesis.degree.disciplineComputer Science
thesis.degree.grantorKing Abdullah University of Science and Technology
thesis.degree.nameDoctor of Philosophy
Files
Original bundle (3 files)
Name: Yazeed Final Approval of Thesis - signed 16 Aug 2021 (1).pdf
Size: 477.64 KB
Format: Adobe Portable Document Format
Name: SignedCopyrightForm.pdf
Size: 540.65 KB
Format: Adobe Portable Document Format
Name: FinalThesisYazeedAlharbi.pdf
Size: 40.83 MB
Format: Adobe Portable Document Format