Assignment 7: GANs

Discussion: June 15th

Guess what, we’re doing GANs this time.

Basic Setup

Implementing the basic GAN logic isn’t too difficult. You will likely want to write low-level training loops (i.e. with tf.GradientTape) because of the non-trivial control flow: each step updates the discriminator and the generator with different losses. There are many examples around the web that can help you get started.
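One possible shape for such a training step is sketched below. This is only an illustration, not a reference solution: the toy 2-D models, layer sizes, and learning rates are placeholders you should replace with ones suited to your data.

```python
import tensorflow as tf

latent_dim = 16

# Placeholder toy models for 2-D data; swap in your own architectures.
generator = tf.keras.Sequential([
    tf.keras.Input(shape=(latent_dim,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(2),
])
discriminator = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(1),  # raw logit
])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)
g_opt = tf.keras.optimizers.Adam(1e-4)
d_opt = tf.keras.optimizers.Adam(1e-4)

@tf.function
def train_step(real_batch):
    noise = tf.random.normal([tf.shape(real_batch)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_batch = generator(noise, training=True)
        real_logits = discriminator(real_batch, training=True)
        fake_logits = discriminator(fake_batch, training=True)
        # Discriminator: push real logits toward 1, fake logits toward 0.
        d_loss = (bce(tf.ones_like(real_logits), real_logits)
                  + bce(tf.zeros_like(fake_logits), fake_logits))
        # Generator: non-saturating loss, i.e. push fake logits toward 1.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(
        zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
            discriminator.trainable_variables))
    g_opt.apply_gradients(
        zip(g_tape.gradient(g_loss, generator.trainable_variables),
            generator.trainable_variables))
    return d_loss, g_loss
```

You would call train_step in a loop over batches of your dataset; the two tapes keep the discriminator and generator gradients separate so each optimizer only updates its own variables.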

This should be enough for a functional training procedure. Train some models and generate samples for evaluation. They will most likely be terrible.

Take note: Evaluating whether GAN training is progressing (“working”) is difficult, because the loss values are not very informative. You will most likely want to draw some samples and plot them periodically during training to get an impression of the current state. However, even this can be misleading: you might run into mode collapse early on, take it as evidence that training is not working, and stop the process prematurely. In fact, mode collapse can “magically” resolve itself over the course of a few training iterations, after which diverse samples are produced again. For this reason, consider always training for a large number of steps (larger than for, e.g., VAEs) and just see what happens.
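The periodic plotting described above could look something like this. It is a sketch assuming a toy 2-D dataset and a Keras-style generator; for image data you would replace the scatter plot with e.g. plt.imshow on a grid of samples.

```python
import matplotlib
matplotlib.use("Agg")  # write files; no display needed on a server
import matplotlib.pyplot as plt
import tensorflow as tf

def plot_samples(generator, latent_dim, step, n=500):
    """Draw n samples from the generator and save them as a scatter plot."""
    noise = tf.random.normal([n, latent_dim])
    samples = generator(noise, training=False).numpy()
    plt.figure(figsize=(4, 4))
    plt.scatter(samples[:, 0], samples[:, 1], s=4)
    plt.title(f"step {step}")
    plt.savefig(f"samples_{step:06d}.png")
    plt.close()
```

Calling this every few hundred steps gives you a filmstrip of PNGs to judge progress by, which is usually far more informative than the loss curves.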

Improving GAN training

GANs are notoriously difficult to train. In the rest of this assignment, you are asked to try out various ways to improve the basic procedure. There are countless advanced GAN variants, but for now you may focus on “tricks” to make the original formulation more stable. Here are some leads:

Incorporate as many of these methods as you want or need into your model and try to achieve some nice samples!
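As one concrete example of such a trick, here is a sketch of one-sided label smoothing: the “real” target in the discriminator loss is set to 0.9 instead of 1.0, while fake targets stay at 0.0. The function and variable names are placeholders; it would replace the discriminator loss in your training step.

```python
import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

def d_loss_smoothed(real_logits, fake_logits, smooth=0.9):
    # One-sided: only the real targets are smoothed (to 0.9);
    # fake targets remain exactly 0.0.
    real_loss = bce(tf.fill(tf.shape(real_logits), smooth), real_logits)
    fake_loss = bce(tf.zeros_like(fake_logits), fake_logits)
    return real_loss + fake_loss
```

The idea is to stop the discriminator from becoming overconfident on real data, which tends to keep its gradients for the generator more useful; smoothing the fake targets as well is generally not recommended.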