Assignment 3: Keras & CNNs

Deadline: November 13th, 9am

In this assignment, you will get to know Keras, the high-level API in TensorFlow. You will also create a better model for the MNIST dataset using convolutional neural networks.

Keras

The low-level TF functions we have used so far are useful when we want full control over everything that is happening, but they are cumbersome when we just need “everyday” neural network functionality. For such cases, TensorFlow has integrated Keras to provide abstractions over many common workflows. Keras covers a lot of ground; we will only look at some of it for this assignment and get to know more over the course of the semester. In particular:

Unfortunately, none of the TF tutorials is quite what we would like here, so you’ll have to mix and match a little:

Later, we will see how to wrap entire model definitions, training loops and evaluations in a handful of lines of code. For now, you might want to rewrite your MLP code with these Keras functions and make sure it still works as before.
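A minimal sketch of what such a rewrite might look like, assuming the Keras pieces in question are layers, a loss object, an optimizer and a metric, while the training loop itself stays hand-written. The layer sizes, learning rate, batch size and the random stand-in data (used in place of real MNIST so the snippet runs without a download) are illustrative assumptions, not values prescribed by the assignment.

```python
import numpy as np
import tensorflow as tf

# Stand-in data with the shapes of flattened MNIST (random values, NOT the real dataset).
rng = np.random.default_rng(0)
images = rng.random((256, 784), dtype=np.float32)
labels = rng.integers(0, 10, size=(256,))

# Keras layers replace hand-made weight/bias variables and matmuls.
# The hidden size of 256 is an arbitrary choice.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(10),  # raw logits, no softmax here
])

# Keras loss, optimizer and metric objects replace the manual versions.
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
accuracy = tf.keras.metrics.SparseCategoricalAccuracy()


def train_step(x, y):
    """One gradient step; the loop structure is the same as in low-level TF."""
    with tf.GradientTape() as tape:
        logits = model(x, training=True)
        loss = loss_fn(y, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    accuracy.update_state(y, logits)  # running accuracy over all batches seen
    return loss


dataset = tf.data.Dataset.from_tensor_slices((images, labels)).batch(64)
losses = [float(train_step(x, y)) for x, y in dataset]
print("last loss:", losses[-1], "running accuracy:", float(accuracy.result()))
```

With real data, only the two lines creating `images` and `labels` would need to change; on random labels the accuracy is expected to stay near chance.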

An example notebook can be found here.

CNN for MNIST

You should have seen that (with Keras) modifying layer sizes, changing activation functions etc. is simple: You can generally change parts of the model without affecting the rest of the program (training loop etc.). In fact, you can change the full pipeline from input to model output without having to change anything else (restrictions apply).

Replace your MNIST MLP with a CNN. The tutorials linked above might give you some ideas for architectures. Generally:
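Purely as an illustration of the swap, a small CNN for 28x28 grayscale inputs could be sketched as follows. The number of layers, filter counts, kernel sizes and the helper name `build_cnn` are assumptions made for this example, not the architecture the assignment asks for.

```python
import numpy as np
import tensorflow as tf


def build_cnn(num_classes=10):
    """Hypothetical small CNN; all sizes are illustrative choices."""
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        tf.keras.layers.Conv2D(32, kernel_size=3, activation="relu"),  # 28 -> 26
        tf.keras.layers.MaxPooling2D(pool_size=2),                     # 26 -> 13
        tf.keras.layers.Conv2D(64, kernel_size=3, activation="relu"),  # 13 -> 11
        tf.keras.layers.MaxPooling2D(pool_size=2),                     # 11 -> 5
        tf.keras.layers.Flatten(),                                     # 5*5*64 = 1600
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(num_classes),  # raw logits
    ])


cnn = build_cnn()

# MNIST images are loaded with shape (n, 28, 28); Conv2D expects an explicit
# channel axis, so a trailing dimension of size 1 is added. Zeros stand in
# for real images here.
batch = np.zeros((8, 28, 28), dtype=np.float32)[..., np.newaxis]
logits = cnn(batch)
print(logits.shape)
```

Because the model still maps a batch of inputs to a batch of 10 logits, a training loop written for the MLP should work unchanged, apart from feeding images with shape (28, 28, 1) instead of flattened vectors.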

Note: Depending on your machine, training a CNN may take much longer than the MLPs we’ve seen so far. Here, using Colab’s GPU support could be useful.
Also, processing the full test set in one go for evaluation might be too much for your RAM. In that case, you could break up the test set into smaller chunks and average the results (easy using Keras metrics), or just make the model smaller.
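One way the chunked evaluation could look, as a hedged sketch: a Keras metric object accumulates correct and total counts across chunks, so the final result is the accuracy over the whole set. The helper name `evaluate_in_chunks`, the chunk size, the tiny stand-in model and the random stand-in data are all assumptions for this example.

```python
import numpy as np
import tensorflow as tf

# Random stand-in for a test set (NOT real MNIST), just to make the sketch runnable.
rng = np.random.default_rng(0)
test_images = rng.random((1000, 28, 28, 1), dtype=np.float32)
test_labels = rng.integers(0, 10, size=(1000,))

# Tiny stand-in model; in practice this would be your trained CNN.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(28, 28, 1)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10),
])


def evaluate_in_chunks(model, images, labels, chunk_size=256):
    """Accuracy over the full set, computed one chunk at a time."""
    metric = tf.keras.metrics.SparseCategoricalAccuracy()
    for start in range(0, len(images), chunk_size):
        x = images[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        # update_state adds this chunk's counts to the running totals.
        metric.update_state(y, model(x, training=False))
    return float(metric.result())


chunked_acc = evaluate_in_chunks(model, test_images, test_labels)
print("accuracy:", chunked_acc)
```

Letting the metric accumulate matters when the last chunk is smaller than the others (here 1000 = 3 * 256 + 232): a plain mean of per-chunk accuracies would give that last chunk too much weight.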

You should consider using a better optimization algorithm than basic SGD. One option is to use adaptive algorithms, the most popular of which is Adam. Check out tf.optimizers.Adam. This will usually lead to much faster learning without manual tuning of the learning rate or other parameters. We will discuss advanced optimization strategies later in the class, but the basic idea behind Adam is that it automatically adapts a per-parameter learning rate and also incorporates momentum. Using Adam, your CNN should beat your MLP after only a few hundred steps of training. The general consensus is that well-tuned gradient descent with momentum and learning rate decay will outperform adaptive methods, but you will need to invest some time into finding a good parameter setting. We will look into these topics later.
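To make the "per-parameter learning rate plus momentum" idea concrete, here is a simplified NumPy sketch of the standard Adam update rule. It is not the TensorFlow implementation; the function name `adam_step`, the toy objective and the step size are assumptions chosen for illustration.

```python
import numpy as np


def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter array; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad       # momentum: running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2  # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)             # bias correction for the zero init
    v_hat = v / (1 - beta2 ** t)
    # Dividing by sqrt(v_hat) rescales the step separately for every parameter.
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v


# Toy objective f(w) = sum(scale * w**2): the two gradients differ by a factor of 1000.
scale = np.array([1.0, 1000.0])
w = np.array([1.0, 1.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
first_update = None
for t in range(1, 201):
    grad = 2 * scale * w
    new_w, m, v = adam_step(w, grad, m, v, t, lr=0.01)
    if t == 1:
        first_update = w - new_w  # size of the very first step per parameter
    w = new_w

print("first update:", first_update)
print("w after 200 steps:", w)
```

Despite the very different gradient magnitudes, both parameters move at essentially the same pace, which is why Adam often needs less learning rate tuning than plain SGD. In the actual assignment code, the only change needed should be constructing a `tf.optimizers.Adam` object in place of the SGD optimizer.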

If your CNN is set up well, you should reach extremely high accuracy. This is arguably where MNIST stops being interesting. If you haven’t done so, consider working with Fashion-MNIST instead (see Assignment 1). This should present more of a challenge and make improvements due to hyperparameter tuning more obvious and meaningful. You could even try CIFAR10 or CIFAR100 as in one of the tutorials linked above. They have 32x32 3-channel color images with much more variation. These datasets are also available in tf.keras.datasets.
Note: For some reason, the CIFAR labels are organized somewhat differently – shaped (n, 1) instead of just (n,). You should do something like labels = labels.reshape((-1,)) or this will mess up the loss function.
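A small sketch of the reshape, using a hand-made array as a stand-in for the CIFAR labels. The equality comparison illustrates one way the extra dimension can go wrong through broadcasting; the exact failure inside a given loss function may differ, so treat this as an analogy rather than a description of what the loss does.

```python
import numpy as np

# Stand-in for CIFAR labels, which come with shape (n, 1).
labels = np.array([[3], [8], [0], [6]])
predictions = np.array([3, 1, 0, 6])  # hypothetical predicted classes, shape (n,)

flat_labels = labels.reshape((-1,))   # shape (n,)

# With shape (n, 1), comparing against (n,) broadcasts to an (n, n) matrix
# instead of an element-wise comparison.
wrong = predictions == labels
right = predictions == flat_labels

print(labels.shape, flat_labels.shape)  # (4, 1) (4,)
print(wrong.shape, right.shape)         # (4, 4) (4,)
print(right.mean())                     # 0.75
```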

What to Hand In