Assignment 8: Autoregressive Models

Discussion: June 10th. Deadline: June 9th, 23:59.

In this assignment, we will try to implement simple versions of autoregressive models like PixelCNN.

Autoregressive Modeling

It is possible to view an image as a sequence of pixel values and set up a generative process accordingly. The inductive bias of this model is questionable, but the practical results are rather strong. In the course repository, you can find a notebook that applies this idea in a very naive fashion to the MNIST dataset. That model ignores the fact that images have rows and columns, so it tends to generate outputs in the wrong place. It also means that, for example, the pixel directly above another one is treated as farther away than the one directly to the left. Overall, it's just not a good model! Instead, let's use ideas from the PixelRNN paper, in particular PixelCNN.
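To see the distance distortion concretely, here is a minimal sketch (using NumPy, with standard 28x28 MNIST dimensions assumed) of what happens when an image is flattened into a row-major pixel sequence:

```python
import numpy as np

# A 28x28 image viewed as a flat sequence of 784 pixel values.
img = np.arange(28 * 28).reshape(28, 28)
seq = img.reshape(-1)

# In row-major order, the pixel to the left of position (r, c) is one step
# back in the sequence, but the pixel directly above is a full row (28
# steps) back -- the sequence view distorts spatial distances.
r, c = 10, 10
idx = r * 28 + c
assert seq[idx - 1] == img[r, c - 1]    # left neighbour: 1 step away
assert seq[idx - 28] == img[r - 1, c]   # neighbour above: 28 steps away
```

A sequence model therefore has to bridge 28 steps to use the closest vertical context, which is part of why the naive approach struggles.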

PixelCNN is relatively simple to implement using a masked convolutional layer, i.e. a convolution with certain kernel weights zeroed out. We need this because the network must not look into the future: it cannot make use of pixels that have not been generated yet (according to whatever generation order is being used). See figure 4 in the paper, or figure 1 in the follow-up. You can find sample code in the course repository. Use this (or your own implementation) to build a stack of convolutional layers, being mindful of the differences between type A and type B masks. The final layer should have 256 output units and act as the softmax predictor, just like in the RNN code. Training is straightforward – the target is equal to the input (but note that you may want to scale inputs to [0, 1], whereas targets should be category indices from 0 to 255)! Because the layers are masked such that the central pixel is not looked at, this works out just right – we are predicting the central pixel from those above and to the left of it.
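As a rough illustration of the masking itself (not the repo's sample code), the following sketch builds the binary mask you would multiply into a convolution kernel; the function name and interface are my own choices:

```python
import numpy as np

def conv_mask(kernel_size, mask_type):
    """Binary mask for a PixelCNN masked convolution kernel.

    Zeroes out every kernel position below the centre row, and the
    positions to the right of the centre within the centre row. A type 'A'
    mask (used in the first layer) also zeroes the centre itself, so the
    prediction for a pixel never sees that pixel; type 'B' masks (all
    later layers) keep the centre, since by then it only carries
    information derived from already-visible pixels.
    """
    k = kernel_size
    mask = np.ones((k, k), dtype=np.float32)
    centre = k // 2
    mask[centre, centre + 1:] = 0.0   # right of centre, same row
    mask[centre + 1:, :] = 0.0        # all rows below centre
    if mask_type == "A":
        mask[centre, centre] = 0.0    # the centre pixel itself
    return mask
```

In a Keras-style layer you would broadcast this spatial mask over the input/output channel dimensions and multiply it into the kernel on every forward pass, so the zeroed weights stay zero during training.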

With the model trained (which can take a while), you can use it for generation! This proceeds sequentially, one pixel after the other, and is very slow. There is sample code in the repo, but this is not optimized very well, so you may be able to improve on it. Hopefully, your results are better than with the RNN!
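The generation loop can be sketched as follows. This is not the repo's sample code; `predict_logits` is a hypothetical stand-in for your trained network, and the dummy version below just returns random logits so the sketch runs on its own:

```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 28

def predict_logits(img):
    # Stand-in for the trained PixelCNN: returns per-pixel logits over the
    # 256 intensity values. With a real model this would be something like
    # model(img[None])[0] instead of random numbers.
    return rng.normal(size=(H, W, 256))

# Generate one image pixel by pixel, top-left to bottom-right. Each step
# re-runs the network on the partial image and samples the next pixel
# from the softmax at that position; all later pixels are still ignored
# thanks to the masks.
img = np.zeros((H, W), dtype=np.float32)
for r in range(H):
    for c in range(W):
        logits = predict_logits(img / 255.0)[r, c]
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        img[r, c] = rng.choice(256, p=probs)
```

Note that this runs the full network once per pixel (784 forward passes for MNIST), which is exactly why sampling is so slow; caching activations for the already-fixed region is one way to speed it up.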

If you get bad results:

Also: transferring autoregressive models to color datasets is a bit more complicated because of how the masks interact with the color channels. In the PixelRNN paper, every color value is generated one by one, which complicates the masks. Alternatively, it should be possible to sample all color values of one pixel in parallel, although technically this weakens the model, since the channels of a pixel are then not conditioned on each other. The easiest option is to stick with grayscale datasets – you can also convert color datasets such as Flickr Faces to grayscale using tf.image.rgb_to_grayscale.
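For reference, the conversion is just a weighted sum over the RGB channels; the sketch below mirrors what tf.image.rgb_to_grayscale computes (to my understanding it uses the ITU-R BT.601 luma weights) in plain NumPy:

```python
import numpy as np

def rgb_to_grayscale(rgb):
    # Weighted channel sum with BT.601 luma weights, keeping a trailing
    # channel axis of size 1, analogous to tf.image.rgb_to_grayscale.
    weights = np.array([0.2989, 0.5870, 0.1140], dtype=np.float32)
    return (rgb * weights).sum(axis=-1, keepdims=True)

rgb = np.ones((2, 4, 4, 3), dtype=np.float32)  # a batch of white images
gray = rgb_to_grayscale(rgb)                   # shape (2, 4, 4, 1)
```

In practice you would simply call tf.image.rgb_to_grayscale inside your input pipeline and then proceed exactly as in the grayscale case.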