Assignment 7: The NICEst Assignment

Discussion: June 6th
Deadline: June 5th, 20:00

In this assignment, we want to implement some simple flow models on toy datasets, as well as attempt to fit simple “real” datasets like MNIST.

NOTE: On Gitlab, you can find assignment07_template.ipynb. This is a full NICE implementation with just some (key) steps missing. Thus, if you want, you can approach this by filling in the gaps (look for NotImplementedError, or for SyntaxError caused by ...). This should reduce the workload in terms of code design etc. If you want, you can of course also do everything from the ground up yourself, or make other changes to the template as desired.

NICE

The OG flow model is the NICE model from this paper. It is also relatively simple, making it a good candidate for first experiences with these models. Recall that one of the readings from the lecture gives code examples for implementing flows in TensorFlow Probability. However, in this assignment you should implement the model yourself.

But first, a note on terminology: In principle, it doesn’t matter which direction of the flow you call forward or backward, in which direction a function f is applied and in which its inverse, etc. However, it’s easy to get confused here because people use different conventions. I will strictly stick to the convention from the NICE paper, which is: forward refers to the direction from data space to latent space, i.e. h = f(x), and backward/inverse refers to the generating direction, i.e. x = f⁻¹(h), which is used for sampling.

You might want to proceed as follows:

NICE Coupling layer

Implement a single NICE coupling layer. Note that this is not a single neural network layer (but inheriting from tf.keras.layers.Layer is still useful)!
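As a sketch (not the template’s code), the math of a single additive coupling layer can be written in plain NumPy. `AdditiveCoupling` and the coupling function `m` are hypothetical names; in the assignment you would wrap this logic in a `tf.keras.layers.Layer` with `m` as a small trainable network:

```python
import numpy as np

class AdditiveCoupling:
    """Sketch of a NICE additive coupling layer (NumPy, not the TF version).

    The input is split into two halves (x1, x2); the coupling function m
    shifts only the second half:
        forward:  y1 = x1,  y2 = x2 + m(x1)
        inverse:  x1 = y1,  x2 = y2 - m(y1)
    The Jacobian is triangular with unit diagonal, so its log-det is 0.
    """

    def __init__(self, m):
        self.m = m  # coupling function; any function of the first half

    def forward(self, x):
        x1, x2 = np.split(x, 2, axis=-1)
        return np.concatenate([x1, x2 + self.m(x1)], axis=-1)

    def inverse(self, y):
        y1, y2 = np.split(y, 2, axis=-1)
        return np.concatenate([y1, y2 - self.m(y1)], axis=-1)
```

Note that the inverse only requires evaluating m in the forward direction, which is why m itself does not need to be invertible.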

NICE Full Model

The full NICE model simply stacks an arbitrary number of such coupling layers.

As a first sanity check, you should set up a very simple model on some toy data and check that the forward and backward functions are actually inverses of each other. That is, check that the difference between data and backward(forward(data)) (and/or the other way around) is near 0 (small differences due to numerical reasons are okay).
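A minimal NumPy sketch of such a stack, including the inverse sanity check. `NICEStack` and the fixed coupling functions are hypothetical stand-ins for the template’s trainable networks; alternating which half gets modified ensures every dimension is eventually transformed:

```python
import numpy as np

class NICEStack:
    """Stack of additive coupling layers with alternating partitions (sketch)."""

    def __init__(self, ms):
        self.ms = ms  # one coupling function per layer

    def _couple(self, x, m, swap, sign):
        x1, x2 = np.split(x, 2, axis=-1)
        if swap:                      # alternate which half is modified
            x1, x2 = x2, x1
        x2 = x2 + sign * m(x1)        # sign=+1 forward, sign=-1 inverse
        if swap:
            x1, x2 = x2, x1
        return np.concatenate([x1, x2], axis=-1)

    def forward(self, x):
        for i, m in enumerate(self.ms):
            x = self._couple(x, m, swap=bool(i % 2), sign=+1.0)
        return x

    def inverse(self, y):
        # Invert the layers in reverse order
        for i, m in reversed(list(enumerate(self.ms))):
            y = self._couple(y, m, swap=bool(i % 2), sign=-1.0)
        return y

# Sanity check: backward(forward(data)) should equal data (up to float error)
rng = np.random.RandomState(0)
model = NICEStack([lambda h, W=rng.randn(3, 3): np.tanh(h @ W)
                   for _ in range(4)])
x = rng.randn(8, 6)
recon = model.inverse(model.forward(x))
max_err = np.max(np.abs(recon - x))   # should be ~0
```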

Training

That takes care of the model itself. Once this works, setting up training is very simple!

With all this taken care of, your model is ready to train. First, try it on simple toy data. See the notebook for a sample dataset (parabola). Training proceeds as usual, by gradient descent. You can use the negative log likelihood as a loss function and use standard TF optimizers. Feel free to try other toy datasets as well. Make sure you can successfully fit such datasets before moving on! If training fails, there are likely problems with your implementation.
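Since the additive couplings have zero log-determinant, the negative log-likelihood reduces to the prior term plus the log-determinant of the final scaling layer. A sketch of that computation in NumPy, assuming a standard Gaussian prior and an elementwise exp(log_scale) scaling layer (the paper and the template use a logistic prior, so adapt accordingly); `nice_nll` and its arguments are hypothetical names:

```python
import numpy as np

def nice_nll(x, forward, log_scale):
    """Average negative log-likelihood of a NICE-style model (sketch).

    Assumes: `forward` maps data to latent codes via additive couplings
    (log-det 0), followed by elementwise scaling with exp(log_scale),
    and a standard Gaussian prior on the latent space.
    """
    h = forward(x) * np.exp(log_scale)                     # scaling layer
    log_prior = -0.5 * (h ** 2 + np.log(2 * np.pi)).sum(axis=-1)
    log_det = log_scale.sum()                              # sum_i log s_i
    return -(log_prior + log_det).mean()
```

In TensorFlow you would compute the same quantity inside a `tf.GradientTape` and minimize it with a standard optimizer such as Adam.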

Do not expect great results on datasets like MNIST. At the end of the day, this is not such a nice model. Haha.

Applications

Here are a few things you can do with your trained model.

Inpainting

See Section 5.2 of the NICE paper. You can fix a part of the input, and have the model generate the rest. This is also possible with e.g. autoregressive models, but they are limited due to their fixed generation order. With flows, any dimensions can be inpainted given any others.

The method simply uses gradient ascent to find an input that maximizes the likelihood, optimizing only the “unknown” pixels while keeping the known ones fixed.
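A sketch of that loop, assuming a toy model where the log-likelihood gradient is available in closed form (identity flow with standard Gaussian prior, so grad log p(x) = -x); with a real trained flow you would obtain the gradient via `tf.GradientTape` instead. `inpaint`, `known_mask`, and `grad_loglik` are hypothetical names:

```python
import numpy as np

def inpaint(x, known_mask, grad_loglik, steps=200, lr=0.1):
    """Gradient-ascent inpainting sketch.

    Only entries where known_mask is False are updated; the known
    pixels stay fixed, as in Section 5.2 of the NICE paper.
    `grad_loglik` returns d log p(x) / dx.
    """
    x = x.copy()
    for _ in range(steps):
        g = grad_loglik(x)
        x[~known_mask] += lr * g[~known_mask]   # update unknown dims only
    return x

# Toy example (assumption): identity flow + standard Gaussian prior,
# so grad log p(x) = -x and the unknown entries are driven toward 0.
x0 = np.array([1.0, 5.0, -3.0, 2.0])
mask = np.array([True, False, False, True])     # dims 0 and 3 are observed
out = inpaint(x0, mask, lambda x: -x)
```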

Density Estimation/Outlier detection

A trained flow should be able to detect “atypical” inputs. A simple experiment could go like this:

  1. Define a “corruption process” to turn data points into “atypical” ones. For example:
    • Adding increasing amounts of random noise
    • Rotating images by increasing angles
    • Other transformations such as shearing, contrast, etc.
    • Slowly morphing/interpolating inputs into ones from a different dataset (e.g. MNIST -> FashionMNIST)
  2. Compute likelihoods, using your flow model, on the original data and on increasingly “corrupted” transformations of the data. Can you see a difference? Ideally, the corrupted data should receive a lower likelihood, and this should become more visible as you increase the degree of corruption.
  3. Can you find a threshold for likelihood below which data is likely to be an outlier? What kind of precision/recall can you achieve with such a simple method?
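Steps 1 and 2 can be sketched as follows, using additive Gaussian noise as the corruption process and, purely as a stand-in for a trained flow, a closed-form Gaussian log-likelihood (assumption: the model was fit to roughly standard-normal data):

```python
import numpy as np

def avg_loglik(x):
    """Stand-in log-likelihood: standard Gaussian density
    (assumption: identity flow trained on N(0, I) data)."""
    return (-0.5 * (x ** 2 + np.log(2 * np.pi))).sum(axis=-1).mean()

rng = np.random.RandomState(0)
clean = rng.randn(1000, 2)

logliks = []
for sigma in [0.0, 0.5, 1.0, 2.0]:          # increasing corruption
    corrupted = clean + sigma * rng.randn(*clean.shape)
    logliks.append(avg_loglik(corrupted))
# The average log-likelihood should drop as the corruption grows;
# a threshold between the clean and corrupted values would then
# serve as a simple outlier detector.
```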