Assignment 4: Variational Autoencoders

Discussion: May 13th
Deadline: May 12th, 23:59

In this assignment, we will implement a variational autoencoder (VAE).

Implementing a VAE

From an implementation standpoint, a VAE is pretty much just an autoencoder with a stochastic encoder and a regularized latent space. As such, you might want to proceed as follows:

1. Build an autoencoder. Use any dataset/model of your choice.
2. Add stochasticity in the last encoder layer. With the common choice of a Gaussian distribution, this just means splitting the layer into two parts, one generating means and the other generating variances (or standard deviations). You then use these values to draw Gaussian samples. You can use tf.random.normal for this: take samples from a standard normal distribution, multiply by the standard deviation and add the mean. Be careful with the layer that generates standard deviations/variances; think about what value range these can be in and what value range your layer returns. That is, choose a sensible activation function!
3. Add a regularizer term to the reconstruction loss, corresponding to the KL divergence. The exact form of this for the Gaussian case can be found in many available tutorials and implementations, as well as in the original paper.
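
The sampling step and the KL term described above can be sketched as follows. This is only a minimal sketch under stated assumptions, not the required solution: it assumes the last encoder layer outputs means and log variances (one of several possible parameterizations), and the names sample_latent, kl_to_standard_normal and encoder_head are hypothetical.

```python
import tensorflow as tf


def sample_latent(mean, log_var):
    """Draw z ~ N(mean, exp(log_var)) from standard normal noise."""
    eps = tf.random.normal(tf.shape(mean))  # samples from N(0, 1)
    std = tf.exp(0.5 * log_var)             # log variance -> standard deviation
    return mean + std * eps                 # scale by std, then shift by mean


def kl_to_standard_normal(mean, log_var):
    """KL(N(mean, exp(log_var)) || N(0, I)), summed over latent dimensions.

    Closed form per dimension: 0.5 * (var + mean^2 - 1 - log_var).
    Returns one value per example in the batch.
    """
    per_dim = 0.5 * (tf.exp(log_var) + tf.square(mean) - 1.0 - log_var)
    return tf.reduce_sum(per_dim, axis=-1)


# Assumption: one Dense layer with 2 * latent_dim outputs and no activation,
# split into a mean half and a log-variance half.
latent_dim = 8
encoder_head = tf.keras.layers.Dense(2 * latent_dim)
features = tf.random.normal([4, 32])  # stand-in for real encoder features
mean, log_var = tf.split(encoder_head(features), 2, axis=-1)

z = sample_latent(mean, log_var)            # shape (4, latent_dim)
kl = kl_to_standard_normal(mean, log_var)   # shape (4,)
```

If your layer outputs standard deviations or variances directly instead, both the sampling line and the KL formula need the corresponding conversions.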

You will likely find many VAE implementations around the web. Feel free to use these for “inspiration”, but make sure you understand what you are doing! In particular, here are some technicalities to pay special attention to:

- Choose your reconstruction loss carefully. Recall the discussion from the first exercise (second part of Assignment 0). The loss needs to correspond to the negative log-likelihood of the data conditioned on the latents, which will depend on how you choose to parameterize it. If you just pick “some” loss, you might not have an actual variational autoencoder.
  - One particularly devious issue comes up when using Keras losses like BinaryCrossentropy or MeanSquaredError. By default, these compute a per-pixel loss and then average over all pixels. However, in this case we should sum over pixels, since:
    - We make the assumption that pixels are independent given the latents.
    - With independence, the image probability becomes the product of pixel probabilities.
    - Since we use log probabilities, this becomes the sum of log probabilities.
  - Normally it doesn’t matter much whether we sum or average over pixels: if the number of pixels is always the same, the two differ only by a constant factor. Here, however, it matters a lot, since averaging over pixels makes the reconstruction loss much smaller relative to the KL divergence term, so the regularizer dominates and learning suffers.
- Regarding the above issue: depending on the details of the data, loss function, etc., you might need to scale down the regularizer significantly (by multiplying it with a number much smaller than 1) to achieve any learning at all. A typical sign of “overregularization” is when all reconstructions look the same (often like some kind of average of all images). On the other hand, scaling it up can also help to achieve better samples. Strictly speaking, adding a scaling parameter implements a beta-VAE.
- In your KL loss (and throughout the rest of your program), pay special attention to where you need the variance, the log variance, the standard deviation, the log standard deviation, etc. Depending on how you parameterize these in your model, you will need different conversions.
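
To make the technicalities above concrete, here is a hedged sketch of a loss that sums the reconstruction term over pixels and averages only over the batch. It assumes a Bernoulli likelihood with a decoder that outputs logits, images shaped (batch, height, width, channels), and an encoder that outputs log variances; the name vae_loss and the beta argument are illustrative, not a prescribed interface.

```python
import tensorflow as tf


def vae_loss(x, logits, mean, log_var, beta=1.0):
    """Negative ELBO for a Bernoulli decoder; beta != 1 gives a beta-VAE."""
    # Per-pixel negative log-likelihood, same shape as x.
    per_pixel = tf.nn.sigmoid_cross_entropy_with_logits(labels=x, logits=logits)
    # Sum over pixels: independence turns the product of pixel
    # probabilities into a sum of log-probabilities.
    reconstruction = tf.reduce_sum(per_pixel, axis=[1, 2, 3])
    # KL to a standard normal, summed over latent dimensions.
    kl = 0.5 * tf.reduce_sum(
        tf.exp(log_var) + tf.square(mean) - 1.0 - log_var, axis=-1)
    # Average over the batch only.
    total = tf.reduce_mean(reconstruction + beta * kl)
    return total, reconstruction, kl


# Toy batch: 4 binary images of 28x28x1 pixels, 8 latent dimensions.
x = tf.cast(tf.random.uniform([4, 28, 28, 1]) > 0.5, tf.float32)
logits = tf.random.normal([4, 28, 28, 1])
mean = tf.random.normal([4, 8])
log_var = tf.random.normal([4, 8])

loss, reconstruction, kl = vae_loss(x, logits, mean, log_var)

# For comparison: the default Keras loss averages over all pixels.
keras_bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)(x, logits)
num_pixels = 28 * 28 * 1
print(float(tf.reduce_mean(reconstruction)), float(keras_bce) * num_pixels)
```

For these shapes, the averaged Keras value is smaller than the summed reconstruction term by a factor of 784 (the number of pixels), which is why the unscaled KL term can dominate when the mean is used.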

Train your VAE and generate some samples, perhaps trying out multiple architectures and datasets. Think about the following issues:

- In case you ran into the aforementioned problem and your VAE refused to reconstruct anything, forcing you to tone down the regularizer: Why do you think this happens? Even if you don’t run into this issue, think about why the VAE regularization might be a particularly troublesome one.
- How can you check whether the regularization was “successful”? Try your method of choice on your own model(s).
- Compare VAE reconstructions with those of a normal autoencoder. They will likely be significantly more blurry. Why does this happen? Aside from that, why does blurriness tend to already be an issue in “normal” AEs?