Discussion: November 14th
Deadline: November 13th, 18:00
This week kicks off our “marathon” of implementing various kinds of generative models. We start with Variational Autoencoders (VAEs), a classic model whose prominence has faded somewhat, but which remains important in various contexts and also forms the basis of other models we will see later.
These assignments will usually follow a pattern:
Of course, you can also do both. :)
For each assignment, we will give you significant help via the course repository, in particular the lgm module.
This will take care of boilerplate as well as some of the more complex issues, or those easy to get wrong.
However, there will generally be details missing that you have to fill in.
This is usually about understanding the theory and being able to put it into practice.
These are crucial skills if you want to work on more recent state-of-the-art models, so you have to practice them!
This is why we heavily recommend against using generative AI or complete tutorials for help, as you will not build self-sufficiency this way.
Please also review the general notes in Assignment 0.
Note that the code in our repository is exempt from our usual plagiarism rules: it’s there for you to use!

In general, you could ignore this help and do everything yourself. You will learn much more this way, but it will also be much more work. As such, make sure you have gotten access to the course repository as well as sufficient compute resources (we are offering GPU access as detailed on Mattermost).
For the submission you should generally submit your code (e.g. implementation of the missing parts from the repository) plus the usual notebooks showcasing your experiments.
In principle you should be able to create a new branch in the repository for yourself. If you do that, you could just note that in your submission and link to that code. Then you don’t have to upload it on Moodle. But please do not commit large notebooks with outputs to the repository! Notebooks should still be submitted via E-Learning!
As the name implies, VAEs are very similar to standard Autoencoders.
As such, the code for Assignment 0, as well as lgm/autoencoder.py should be helpful here.
Make sure you understand that code and try it out.
Then, we can consider how to implement VAEs just in terms of the difference to standard AEs.
There is also already a file lgm/vae.py with just a few pieces missing.
These are marked by throwing a NotImplementedError.
We further have a notebook 04_vae_starter.ipynb that sets up a basic VAE model, again with some parts missing.
The main changes to AEs you need to consider are as follows.
The “encoder” in a VAE actually represents the variational posterior q(z|x). This means we need a conditional distribution over z given an input x. Thus, we need to decide what form our distributions should take. The easiest choice by far is a Gaussian. Such a distribution is fully characterized by just two parameters, namely the mean and the variance. This means our encoder has to return two values for each code dimension, which we can interpret as the two parameters, respectively.
Useful functions here are torch.split and torch.exp.

NOTE: Recall the discussion in the exercise. We are technically free to choose whether our model returns variances, standard deviations, precisions, log-variances… There is no “correct” choice. We just have to be consistent and convert to whatever values we need: for example, the sampler (see below) requires standard deviations, while the KL-divergence requires variances and log-variances.
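To make the parameter-splitting concrete, here is a minimal sketch. It assumes the (hypothetical) convention that the encoder outputs 2 * latent_dim values per example, with the first half interpreted as means and the second half as log-variances; your own encoder may use a different layout.

```python
import torch

latent_dim = 4
# Stand-in for an encoder output: batch of 8 examples,
# 2 * latent_dim values each (this layout is an assumption).
enc_out = torch.randn(8, 2 * latent_dim)

# First half -> means, second half -> log-variances.
mean, logvar = torch.split(enc_out, latent_dim, dim=1)

# Convert log-variance to standard deviation where needed (e.g. for sampling):
# sigma = exp(0.5 * log sigma^2).
std = torch.exp(0.5 * logvar)
```

Returning log-variances is a common choice because exp() then guarantees a positive standard deviation without any extra constraints on the network output.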
We need a method to sample values for z from the distribution q(z|x). Recall the “reparameterization trick” from the lecture. This module should be given means and standard deviations as input (as returned by the encoder), draw a random sample from a Standard Gaussian distribution of the same size as the input, multiply by the per-dimension standard deviations, and add the means. Thus we have sampled from q(z|x) with the correct parameters, while keeping the gradients with respect to those parameters (recall these are the encoder outputs) intact.
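The sampling step above can be sketched as follows; the function name is made up for illustration, not taken from the repository.

```python
import torch

def reparameterize(mean: torch.Tensor, std: torch.Tensor) -> torch.Tensor:
    """Sample z ~ N(mean, std^2) via the reparameterization trick.

    All randomness comes from a standard Gaussian sample, so gradients
    flow through mean and std (the encoder outputs).
    """
    eps = torch.randn_like(std)  # standard Gaussian, same shape as std
    return mean + eps * std

# Gradients w.r.t. the distribution parameters remain intact:
mean = torch.zeros(8, 4, requires_grad=True)
std = torch.ones(8, 4, requires_grad=True)
z = reparameterize(mean, std)
z.sum().backward()  # populates mean.grad and std.grad
```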
One reason to choose a Gaussian variational posterior, and a Gaussian prior as well, is that the KL-divergence between the two can be computed easily in closed form.
The formula has already been implemented in vae.py!
You just have to figure out how to get the correct values from the encoder to put into the function. :)
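For reference, a standalone version of the closed-form Gaussian KL might look like the sketch below (the actual implementation in vae.py may differ in interface and reduction; this is only meant to show which quantities go in).

```python
import torch

def gaussian_kl(mean: torch.Tensor, logvar: torch.Tensor) -> torch.Tensor:
    """KL( N(mean, exp(logvar)) || N(0, I) ), summed over latent
    dimensions and averaged over the batch.

    Closed form per dimension: 0.5 * (mu^2 + sigma^2 - log sigma^2 - 1).
    """
    per_example = 0.5 * torch.sum(
        mean ** 2 + torch.exp(logvar) - logvar - 1.0, dim=1
    )
    return per_example.mean()

# Sanity check: if the posterior equals the prior, the KL is zero.
kl_zero = gaussian_kl(torch.zeros(8, 4), torch.zeros(8, 4))
```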
Implementing the above three points should give you a functioning VAE model.
Try training one and see if it works.
The code is already set up to regularly plot reconstructions as well as random generations.
Make sure that both aspects look reasonable!
It’s possible to have good reconstructions, but bad generations (the other way around is unlikely).
See the AE code for Assignment 0 on how to choose a reconstruction loss corresponding to a data likelihood.
You can go with "gaussian_fixed_sigma" as a reasonable starting point.
You will also have to implement the actual neural network architectures; again the basic autoencoder code can serve as inspiration here.
Finally, note that you also have to use some kind of reconstruction loss corresponding to the decoder likelihood. If you are confused about this topic, we wrote a blog post that might help.
After you have a basic model going, we recommend that you try it out further. Some ideas for experiments:
Try a beta-VAE: weight the KL term in the loss with a factor beta, usually larger than 1.
This is trivial to implement and has already been done in the code.
Try different values for beta and observe how the model changes (be sure to create a new model and train from scratch each time).
What do you think the “optimal” beta is? Is it close to 1? Larger? Smaller? How do you even evaluate this? When experimenting with different likelihoods, revisit beta as well!
Different likelihoods will have different optimal beta values.
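Putting the pieces together, the beta-weighted objective is just a weighted sum of the two loss terms. The names below are placeholders for illustration, not the repository's API.

```python
import torch

def beta_vae_loss(recon_loss: torch.Tensor, kl: torch.Tensor,
                  beta: float = 1.0) -> torch.Tensor:
    """Beta-VAE objective: reconstruction loss plus beta times the KL term.

    beta = 1 recovers the standard (negative) ELBO; beta > 1 trades
    reconstruction quality for a more regularized latent space.
    """
    return recon_loss + beta * kl

loss = beta_vae_loss(torch.tensor(1.0), torch.tensor(0.5), beta=4.0)  # 3.0
```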