Discussion: May 16th
Deadline: May 15th, 20:00
In this assignment, we will implement (and experiment with) a variational autoencoder (VAE).
The choice is left to you whether you want to focus on the implementation or
on experimenting with the model (or both – you have two weeks!). Implementing such models yourself, i.e. putting theory into
practice, is a great exercise. We would recommend at least trying to do this and/or
spending some time thinking about how the different parts could work, before falling
back on ready-made implementations.
However, if you would rather spend time on
working with VAEs than on implementation details, there is an abundance of
VAE code on the web, for example on the Tensorflow website!
Just make sure that, if you are using other people's code, you clearly mark this in
your submission! Otherwise, you are plagiarizing!
The rest of this is split into two parts – first some notes on implementing VAEs yourself, then some ideas for experimentation.
From an implementation standpoint, a VAE is pretty much just an autoencoder with a stochastic encoder and a regularized latent space. As such, you might want to proceed as follows:
To sample the latent code, you can use tf.random.normal
– take samples from a standard normal distribution, multiply
them with the standard deviation and add the mean (this implements the "reparameterization trick").
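As a rough sketch, the trick described above could look like this (assuming the encoder outputs the mean and the log-variance of the latent distribution – the exact parameterization is up to you):

```python
import tensorflow as tf

def reparameterize(mean, log_var):
    # eps ~ N(0, I); shift and scale it so that z ~ N(mean, exp(log_var)).
    # Working with the log-variance is an assumption here;
    # exp(0.5 * log_var) is then the standard deviation.
    eps = tf.random.normal(shape=tf.shape(mean))
    return mean + tf.exp(0.5 * log_var) * eps
```

Note that the randomness is confined to eps, so gradients can flow through mean and log_var into the encoder.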
Be careful with the layer that
generates standard deviations/variances; think about what value range these can be in and what
value range your layer returns. That is, choose a sensible activation function!
A common choice here is tf.exp. This is appropriate as it always
returns values > 0. However, it is often unstable (values/gradients tend to explode).
If you are struggling with nan losses, try the following:
- Use tf.nn.softplus instead. This does not explode like exp.
However, it seems to lead to worse results empirically.
- Keep exp, but initialize the weights of the layer producing the log-variance to zero, so that it initially outputs exp(0) = 1. Empirically, this seems to prevent nan
issues due to unstable gradients.

If you want, you can use the autoencoder code from Assignment 0 (uploaded to gitlab) as a base to start with. Also, you will likely find many VAE implementations around the web. Feel free to use these for "inspiration", but make sure you understand what you are doing! In particular, here are some technicalities to pay special attention to:
For the reconstruction loss, you can use e.g. BinaryCrossentropy
or MeanSquaredError.
By default, these compute a per-pixel loss and then average over all pixels.
However, in this case we should sum over pixels, since the reconstruction term of the ELBO is the log-likelihood of the whole image – a sum over pixels. Averaging instead would shrink this term relative to the KL term and effectively change the weighting of the two losses.
Train your VAE and generate some samples, perhaps trying out multiple architectures and datasets. As usual, you can try any experiments that interest you. Here is one proposal:
In the beta-VAE, the KL-term is
multiplied by a hand-picked hyperparameter beta, where usually beta > 1.
Implementing this model on top of your VAE should be trivial. Now, run several
trials with the same dataset/architecture, but varying beta
(you have to train a new model each time). You can both let
beta go toward 0 and increase it to larger values.
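A sketch of the resulting loss, assuming the encoder outputs mean and log-variance and using the closed-form KL divergence between a diagonal Gaussian and a standard normal:

```python
import tensorflow as tf

def vae_loss(x, x_hat, mean, log_var, beta=1.0):
    # Reconstruction term: per-pixel BCE, summed over pixels,
    # averaged over the batch.
    per_pixel = tf.keras.losses.binary_crossentropy(x, x_hat)
    recon = tf.reduce_mean(tf.reduce_sum(per_pixel, axis=[1, 2]))
    # KL(N(mean, exp(log_var)) || N(0, I)) in closed form,
    # summed over latent dimensions, averaged over the batch.
    kl = -0.5 * tf.reduce_sum(
        1.0 + log_var - tf.square(mean) - tf.exp(log_var), axis=1)
    kl = tf.reduce_mean(kl)
    # beta-VAE: the KL term is scaled by the hyperparameter beta;
    # beta = 1 recovers the plain VAE objective.
    return recon + beta * kl
```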
Some aspects you could investigate:
- How does reconstruction quality change with beta? What about sample quality?
- Is there a range of beta where you get the best samples?
Is this close to 1, smaller, bigger?
How large is this zone, i.e. how sensitive is performance to small changes in
beta?

Alternatively, you could also try to introspect the latent space – for example, what happens to the decoder output when you vary individual latent dimensions?
All of the above may also interact with the beta term. According to theory,
higher beta should lead to better disentanglement in the latent space, and thus
more interpretable dimensions in the code.
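Such a latent traversal could be sketched as follows (traverse_dimension and its parameters are made up for illustration; decoder is assumed to map latent codes to images):

```python
import tensorflow as tf

def traverse_dimension(decoder, dim, latent_dim, steps=7, span=3.0):
    # Hypothetical helper: decode a sweep along one latent dimension,
    # keeping all other dimensions fixed at 0. With a disentangled
    # code, each sweep should change one interpretable factor.
    values = tf.linspace(-span, span, steps)  # [steps]
    z = tf.zeros([steps, latent_dim])
    z = tf.tensor_scatter_nd_update(
        z,
        indices=[[i, dim] for i in range(steps)],
        updates=values,
    )
    return decoder(z)  # [steps, ...] decoded images
```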