Discussion: December 19th
Deadline: December 18th, 20:00
This week, we will get to know Flow-based generative models. These are a bit special: the "classic" variants we cover do not really provide state-of-the-art results, yet other model families built on the idea of flows are currently among the top performers. As such, you should likely scale back your expectations compared to autoregressive models. On the other hand, training tends to be faster and more straightforward, as we only have to train a single model, rather than the two-step process of VQVAE -> Autoregressor.
Unfortunately, the relatively simple "classic" Flow models really don't perform well.
On the other hand, more complex extensions tend to be very involved in terms of both mathematics and implementation.
As such, we will have to settle for a compromise and explore a middle ground.
The code can be found in the repository as usual, in lgm.flow and associated notebooks.
The OG deep Flow model is NICE. We include this in the repository for reference, but this model does not perform on par with pretty much any other framework we have seen so far in the class. Still, it is comparatively simple and can be implemented without too many issues, so it serves as a good benchmark for your understanding of the general flow paradigm.
As such, your first task is to implement the missing pieces of the model. This consists of two parts:
This allows us to transform inputs from a complex distribution (i.e. the data) into a simple prior, like a standard Gaussian. Training is then simple: We minimize the negative log-likelihood, which is easy to compute thanks to the flow framework. Some relevant sections in the paper include:
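To make the training objective concrete, here is a minimal NumPy sketch of the core idea, not taken from the course code: an additive coupling layer in the spirit of NICE, which is exactly invertible and volume-preserving, so the negative log-likelihood reduces to evaluating the Gaussian prior at the transformed point. The coupling network m and all shapes are arbitrary assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coupling network m; the layer is invertible no matter
# what m is, since m only ever sees the untouched half of the input.
W = rng.normal(size=(2, 2))

def m(x1):
    return np.tanh(x1 @ W)

def forward(x):
    # split the input, shift the second half by m(first half);
    # additive couplings have log|det J| = 0
    x1, x2 = x[:, :2], x[:, 2:]
    return np.concatenate([x1, x2 + m(x1)], axis=1)

def inverse(y):
    y1, y2 = y[:, :2], y[:, 2:]
    return np.concatenate([y1, y2 - m(y1)], axis=1)

def nll(x):
    # volume-preserving transform: the negative log-likelihood is just
    # the standard Gaussian prior evaluated at z = forward(x)
    z = forward(x)
    return 0.5 * np.sum(z**2, axis=1) + 0.5 * z.shape[1] * np.log(2 * np.pi)

x = rng.normal(size=(8, 4))
print(np.allclose(inverse(forward(x)), x))  # exact invertibility
```

In the real NICE model, m is a trained MLP and several such layers are stacked with alternating splits, plus a final diagonal scaling so the flow is not forced to be volume-preserving.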
We include a notebook with some simple 2D toy data you can use to test your implementation. A correct setup should be able to fit this easily. You can also move on to “real” datasets, but don’t expect good results. The paper shows some samples: Even MNIST doesn’t work too well, and anything beyond that is pretty much hopeless.
To let you experience a Flow model that works somewhat better, we also implement the Glow architecture. This is essentially an extension of RealNVP, which in turn is an improvement over NICE. The main differences to NICE are:
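One ingredient commonly attributed to Glow is the invertible 1x1 convolution, which for a flattened spatial grid is just one shared invertible channel-mixing matrix applied at every pixel. The sketch below (shapes and initialization are assumptions, not the repository's code) shows the inverse and the log-determinant bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(1)
c, h, w = 3, 4, 4  # channels and spatial size, chosen arbitrarily

# One c-by-c matrix shared across all pixels; a rotation (|det| = 1)
# keeps the example's log-det easy to check.
W = np.linalg.qr(rng.normal(size=(c, c)))[0]

x = rng.normal(size=(h * w, c))   # pixels as rows, channels as columns
y = x @ W.T                       # the "1x1 convolution"
x_rec = y @ np.linalg.inv(W).T    # exact inverse

# Each pixel contributes log|det W|, so the total is h*w*log|det W|
sign, logabsdet = np.linalg.slogdet(W)
total_logdet = h * w * logabsdet

print(np.allclose(x_rec, x), np.isclose(total_logdet, 0.0))
```

In practice the matrix is learned (often via an LU parameterization to make the determinant cheap), and it generalizes the fixed channel permutations used in earlier coupling-based flows.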
In terms of implementation, this is likely the most difficult model we will consider in this class. As such, this has been fully implemented already, allowing you to focus on experimentation. You should try to simply get a strong generative model on a dataset of your choice. Some issues to tackle include:
Changing the scale_fn to something besides the classic exp.
For example, the official OpenAI code has actually swapped this out for a sigmoid, directly contradicting the paper… You can be bold here; if many people try many different things, we might end up with a pretty decent model in the end. :) We have been able to achieve FashionMNIST FIDs below 30, although inconsistently, which would be comparable to the conditional VAEs from earlier in the class. Or you can read the Glow paper and try to stick as close as possible to all their choices. Up to you!
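To make the scale_fn discussion concrete, here is a hypothetical NumPy sketch of an affine coupling update under two choices of scale function; the +2 shift inside the sigmoid variant is one common stabilization trick, not necessarily what the repository uses:

```python
import numpy as np

rng = np.random.default_rng(2)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Candidate scale functions for an affine coupling y2 = x2 * scale + t.
# exp is the "classic" choice; a (shifted) sigmoid keeps scales bounded,
# which can stabilize training.
scale_fns = {
    "exp": np.exp,
    "sigmoid": lambda s: sigmoid(s + 2.0),  # +2 biases scales toward 1
}

x2 = rng.normal(size=(5, 4))
s = rng.normal(size=(5, 4))   # raw scale logits from the coupling net
t = rng.normal(size=(5, 4))   # translation from the coupling net

for name, fn in scale_fns.items():
    scale = fn(s)
    y2 = x2 * scale + t
    x2_rec = (y2 - t) / scale                  # exact inverse
    logdet = np.sum(np.log(np.abs(scale)), 1)  # per-sample log|det J|
    print(name, np.allclose(x2_rec, x2))
```

Any strictly positive, differentiable scale function keeps the coupling invertible; what changes is the range of achievable scales and how well-behaved the gradients are, which is exactly the knob worth experimenting with here.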