Assignment 6: Variational Autoencoders & Parzen Windows
Discussion: June 8th
In this assignment, we will implement a variational autoencoder (VAE). Furthermore,
we will have a look at Parzen windows, which are a popular (albeit flawed)
method of evaluating generative models.
Implementing a VAE
From an implementation standpoint, a VAE is pretty much just an autoencoder with
a stochastic encoder and a regularized latent space. As such, you might want to
proceed as follows:
- Build an autoencoder. Use any dataset/model of your choice.
- Add stochasticity in the last encoder layer. With the common choice of a
Gaussian distribution, this just means splitting the layer into two parts, one
of which generates means, the other variances. Then you use these values to take
Gaussian samples. You can use
tf.random.normal
for this – take samples from a standard normal distribution, multiply by
the standard deviation and add the mean (a sketch follows this list). Be careful with the layer that
generates stds/variances; think about what value range these can be in and what
value range your layer returns. That is, choose a sensible activation function!
- Add a regularizer term to the reconstruction loss.
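To make these steps concrete, here is a minimal TensorFlow sketch of the sampling step (the “reparameterization trick”) and the KL regularizer for a Gaussian latent space. All names and values here (mean_head, logvar_head, encoder_net, decoder_net, latent_dim, beta) are illustrative placeholders, not part of the course code; adapt them to whatever autoencoder you built.

```python
import tensorflow as tf

latent_dim = 32   # size of the latent space; pick your own
beta = 1e-3       # weight of the KL regularizer; often needs to be well below 1

# Two heads on top of your existing encoder: one produces means, the other
# log-variances. Predicting log-variances sidesteps the positivity constraint,
# so a plain linear activation is fine here.
mean_head = tf.keras.layers.Dense(latent_dim)
logvar_head = tf.keras.layers.Dense(latent_dim)

def reparameterize(mean, logvar):
    """Draw z ~ N(mean, diag(exp(logvar))) via the reparameterization trick."""
    eps = tf.random.normal(tf.shape(mean))
    return mean + tf.exp(0.5 * logvar) * eps

def kl_to_standard_normal(mean, logvar):
    """KL( N(mean, diag(exp(logvar))) || N(0, I) ), one value per example."""
    return -0.5 * tf.reduce_sum(
        1.0 + logvar - tf.square(mean) - tf.exp(logvar), axis=-1)

# Inside your training step (encoder_net/decoder_net are your own models):
#   h = encoder_net(x)
#   mean, logvar = mean_head(h), logvar_head(h)
#   z = reparameterize(mean, logvar)
#   x_recon = decoder_net(z)
#   recon_loss = ...  # whatever reconstruction loss your autoencoder already uses
#   loss = recon_loss + beta * tf.reduce_mean(kl_to_standard_normal(mean, logvar))
```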
You will likely find many VAE implementations around the web. Feel free to use
these for “inspiration”, but make sure you understand what you are doing!
Depending on the details of the data, loss function etc.,
you might need to scale down the regularizer significantly (by multiplying it
by a number much smaller than 1) to achieve any learning at all. A typical
sign of “overregularization” is that all reconstructions look the same.
Train your VAE and generate some samples, perhaps trying out multiple
architectures and datasets (a short sketch of prior sampling follows the list
below). Think about the following issues:
- In case you run into the aforementioned problem and your VAE refuses to
reconstruct anything, forcing you to tone down the regularizer: why do you think
this happens? Even if you don’t run into this issue, think about why the VAE
regularization might be a particularly troublesome one.
- How can you check whether the regularization was “successful”? Try your method
of choice on your own model(s).
- Compare VAE reconstructions with those of a normal autoencoder. They will
likely be significantly blurrier. Why does this happen? Aside from that, why
is blurriness an issue even in “normal” AEs?
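Regarding sample generation: once training is done, generating new data just means decoding draws from the standard normal prior. A minimal sketch, reusing the hypothetical names from above (the toy decoder_net is only a stand-in for your own trained decoder):

```python
import tensorflow as tf

latent_dim = 32  # must match the latent size you trained with

# Stand-in for your trained decoder; replace this with your own model.
decoder_net = tf.keras.Sequential([
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape((28, 28)),
])

# New samples are just decoded draws from the prior, z ~ N(0, I).
z = tf.random.normal([16, latent_dim])
generated = decoder_net(z)
```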
Density Estimation Using Parzen Windows
Sometimes you might be wondering how to evaluate a generative model. Evaluating
whatever loss function you trained with might be a start, but it is often not
very informative. Also, this precludes comparison between different frameworks
(e.g. RBMs vs VAEs). Looking at samples is cumbersome and highly subjective.
An objective measure of the “goodness of fit” of the model would be nice. One
attempt at such a measure uses kernel density estimation, in particular the
method of Parzen Windows. Read
this short doc
for a simple explanation of the method. If you prefer a more detailed explanation,
check this one.
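The basic idea is simple: center one kernel on each model sample and average. With a Gaussian kernel of width sigma (the usual choice) and model samples s_1, ..., s_N in d dimensions, the estimated density at a test point x is

$$\hat{p}(x) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{(2\pi\sigma^2)^{d/2}} \exp\!\left(-\frac{\lVert x - s_i \rVert^2}{2\sigma^2}\right).$$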
You can use Parzen windows to evaluate your models as follows:
- The quantity we want to estimate is the likelihood of the test set given the
model. The higher this is, the better we assume our model to be.
- Take an arbitrary number of samples from your model (a common number is 10000,
though considerably more is recommended; start with fewer for debugging).
- Choose some kernel. A Gaussian kernel is the standard choice. You can start
with a uniform one (a “hypercube”), but you are likely to get rather useless
results, in particular on MNIST. Why do you think this is?
- For each element of the test set (of whichever dataset you are using), compute
the Parzen estimate of its probability, using the aforementioned samples to
provide the kernel centers. Since the data is very high-dimensional, you will
likely run into numerical issues even when computing a single probability. In
this case, work with log probabilities instead. You should try to implement this
yourself (a sketch is also given at the end of this sheet), but if you get stuck
there might be something in
utils.math
in the
course repo to help you.
- The test set likelihood would be the product of all individual probabilities.
Since this will definitely cause numerical issues, it is customary to compute
the log-likelihood instead by summing over all the log probabilities.
- You will usually need to choose some width parameter for the kernel (e.g. the
variance of the Gaussian kernel). This is usually chosen such that the
log-likelihood is maximized. If you want to do this properly, you should set
aside a separate validation set, choose the variance that maximizes the
likelihood on it, and then report the value on the test set with this variance.
If you’re short on time, just maximize it on the test set directly, but don’t
tell anyone you did this.
- Evaluate at least two different models and compare results. These could be
two VAEs, or an RBM and a VAE, or any other generative model. You could also
construct one of the models to perform badly on purpose and see whether the
Parzen estimate agrees.
- Have a look at this paper and
immediately forget all of this.
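In case you get stuck on the numerics, here is a minimal NumPy sketch of the log-probability and log-likelihood computation described above, assuming a Gaussian kernel. The function names are made up for this sheet; they are not the helpers from the course repo.

```python
import numpy as np
from scipy.special import logsumexp

def parzen_log_prob(test_batch, model_samples, sigma):
    """Log Parzen density estimate for each test point, Gaussian kernel.

    test_batch:    (B, d) array of test points
    model_samples: (N, d) array of samples drawn from the model
    sigma:         kernel width (standard deviation)
    """
    n, d = model_samples.shape
    # squared distances between every test point and every model sample, shape (B, N)
    diffs = test_batch[:, None, :] - model_samples[None, :, :]
    sq_dists = np.sum(diffs ** 2, axis=-1)
    # log N(x | s_i, sigma^2 * I) for every pair ...
    log_kernels = -0.5 * sq_dists / sigma ** 2 - 0.5 * d * np.log(2 * np.pi * sigma ** 2)
    # ... averaged over the model samples, in log space
    return logsumexp(log_kernels, axis=1) - np.log(n)

def parzen_log_likelihood(data, model_samples, sigma, batch_size=100):
    """Sum of log probabilities over a whole dataset, computed in batches
    to keep the (B, N, d) difference tensor small."""
    total = 0.0
    for start in range(0, len(data), batch_size):
        total += parzen_log_prob(data[start:start + batch_size],
                                 model_samples, sigma).sum()
    return total

# Bandwidth selection: pick sigma on a validation set, report the test value.
# best_sigma = max(sigmas, key=lambda s: parzen_log_likelihood(valid_set, samples, s))
# test_ll = parzen_log_likelihood(test_set, samples, best_sigma)
```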