Assignment 12: Introspection Part 1

Deadline: January 24th, 9am

In this assignment, you will implement gradient-based model analysis, both for creating saliency maps (local) and for feature visualization (global). You can adapt your implementation of adversarial examples from Assignment 10 and also take inspiration from the DeepDream tutorial. It is recommended that you work on image data, as this makes visual inspection of the results simple and intuitive.

You are welcome to use pre-trained ImageNet models from the tf.keras.applications module. The tutorial linked above uses an Inception model, for example. You can also train your own models on CIFAR or something similar.
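
If you go the pre-trained route, loading a model is only a few lines. Here is a minimal sketch; InceptionV3 is used purely as an example, and any other model from tf.keras.applications works just as well:

```python
import tensorflow as tf

# Any classifier from tf.keras.applications works; InceptionV3 is just one option.
model = tf.keras.applications.InceptionV3(weights="imagenet")

# Each application ships matching preprocessing utilities: InceptionV3 expects
# 299x299 inputs scaled to [-1, 1] by preprocess_input.
preprocess = tf.keras.applications.inception_v3.preprocess_input
decode = tf.keras.applications.inception_v3.decode_predictions
```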

Gradient-based saliency map (sensitivity analysis)

Run a batch of inputs through the trained model. Wrap this in a GradientTape where you watch the input batch (batch size can be 1 if you’d like to produce just a single saliency map) and compute the gradient of a particular logit, or its softmax output, with respect to the input. This tells us how a change in each input pixel would affect the class output. This already gives you a batch of gradient-based saliency maps! Plot each saliency map next to the original image or superimpose it. Do the saliency maps seem to make sense? How would you interpret them?
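
A minimal sketch of this step, assuming a Keras `model` and an input batch `images` of shape (B, H, W, 3); the function name and the channel reduction at the end are my own choices, the latter being just one common way to turn the 3-channel gradient into a plottable heat map:

```python
import tensorflow as tf

def saliency_maps(model, images, class_index):
    """Gradient of one class score w.r.t. the input pixels (vanilla sensitivity analysis)."""
    images = tf.convert_to_tensor(images)
    with tf.GradientTape() as tape:
        tape.watch(images)                      # inputs are not Variables, so watch them explicitly
        scores = model(images, training=False)  # logits or softmax outputs, depending on the model
        score = scores[:, class_index]          # pick one class score per example
    grads = tape.gradient(score, images)        # same shape as the input batch: (B, H, W, 3)
    # Reduce over the colour channels and take magnitudes for plotting.
    return tf.reduce_max(tf.abs(grads), axis=-1)
```

matplotlib’s imshow can then show each map next to (or, with some transparency, on top of) the corresponding input image.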

Activation Maximization

Extend the code from the previous part to create an optimal input for a particular class.

Start with a random image, not one from the dataset (although you could also use a dataset image as a starting point). Multiply the gradients by a small constant (like a learning rate) and add them to the input. Repeat this multiple times, computing new gradients with respect to the input each time. Essentially, you are writing a “training loop” for producing an optimal input for a certain class (do not train the model weights!).
Note: You need to take care that the optimized inputs actually stay valid images throughout the process, e.g. by clipping to [0, 1] after each gradient step, or by using a sigmoid function to produce the images.
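
A sketch of such a loop; the step size, iteration count, input shape, and the [0, 1] pixel range are arbitrary assumptions here, so adjust them to your model and its preprocessing:

```python
import tensorflow as tf

def maximize_class(model, class_index, steps=200, step_size=0.01,
                   image_shape=(1, 299, 299, 3)):
    """Gradient *ascent* on the input to maximize one class score."""
    # Start from random noise in [0, 1]; a dataset image would work as well.
    image = tf.Variable(tf.random.uniform(image_shape, minval=0.0, maxval=1.0))
    for _ in range(steps):
        with tf.GradientTape() as tape:
            scores = model(image, training=False)   # the tape tracks the Variable automatically
            score = scores[0, class_index]
        grads = tape.gradient(score, image)
        image.assign_add(step_size * grads)              # move towards higher class scores
        image.assign(tf.clip_by_value(image, 0.0, 1.0))  # keep the input a valid image
    return image.numpy()[0]
```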

Does the resulting input look natural? How do the inputs change as you apply more optimization steps? How do the optimal inputs differ when you initialize the optimization with random noise instead of real examples? Can you see differences between optimizing a logit and optimizing a softmax probability?

Bonus: Apply regularization strategies to make the optimal input more natural-looking. You can also optimize for hidden features of the network (instead of outputs), assuming you can “extract” them from the model you built. Distill has an article that can provide some inspiration.
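
For the hidden-feature variant, a Keras sub-model is one convenient way to “extract” an intermediate activation. The layer name below is only an example; check model.summary() for the real names in your network:

```python
import tensorflow as tf

# Expose an intermediate activation as a model output.
feature_model = tf.keras.Model(inputs=model.input,
                               outputs=model.get_layer("mixed5").output)

# In the optimization loop, maximize e.g. the mean activation of one channel
# instead of a class score:
#   activations = feature_model(image)                # shape (1, H, W, C)
#   objective = tf.reduce_mean(activations[..., 42])  # channel index chosen arbitrarily
```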

Bonus Assignment: Introspection Part 2

This was the final assignment from last year, which “fell off the end” this year due to scheduling reasons. It is included here for those who want to do it. In this assignment, you will try to detect misbehavior of models and explain errors using Introspection methods.

Unmasking Clever Hans Predictors

We will start with a synthetic example in which we purposefully make it easy for the model to cheat, e.g. by inserting extra information (a spurious cue) into the images of one class. Afterwards, we will apply Introspection methods to detect this and explain how the model cheated.

Instead of only altering one class, you could also apply different kinds of cheating opportunities to different classes.

Note: Make sure that any regularization or preprocessing you add to your model does not remove the information we have inserted into the class. For example, if your “extra information” is that the images of one class are highly saturated, preprocessing that normalizes image saturation, or applies random saturation to all images, would destroy that extra information.
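
One simple way to plant such a cheating opportunity, sketched here for CIFAR-10; the class index, patch size, and position are arbitrary choices:

```python
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0

CHEAT_CLASS = 3                       # arbitrary choice of class to "help"
mask = y_train[:, 0] == CHEAT_CLASS

# Paint a small white patch into the top-left corner of every image of that class.
x_train[mask, :4, :4, :] = 1.0
```

After training a classifier on this data, the saliency maps for the cheated class should (ideally) light up on the patch, exposing the Clever Hans behaviour.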

Contrastive Explanations

This is a more realistic scenario in which you can also try out more advanced methods to create saliency maps.
You will work with a pre-trained network and try to explain its wrong decisions using different Introspection techniques and contrastive explanations.

If you use tf-explain (or something similar), you can easily try out different saliency map methods and compare which one helps you most in explaining classification errors.
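
Independently of tf-explain, one simple hand-rolled way to make a vanilla gradient map contrastive is to differentiate the difference of two class scores (“why class A and not class B?”). Whether this matches the contrastive-explanation method intended here is my assumption, so treat the sketch below as one possible starting point:

```python
import tensorflow as tf

def contrastive_saliency(model, image, predicted_class, expected_class):
    """Which pixels pushed the model towards its (wrong) prediction
    and away from the class we expected?"""
    image = tf.convert_to_tensor(image)[tf.newaxis, ...]  # add a batch dimension
    with tf.GradientTape() as tape:
        tape.watch(image)
        scores = model(image, training=False)
        contrast = scores[0, predicted_class] - scores[0, expected_class]
    grads = tape.gradient(contrast, image)
    return tf.reduce_max(tf.abs(grads), axis=-1)[0]
```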

Bonus: An experiment: Explanation by input optimization

Let’s use our feature visualization technique from the last assignment in a different way.

Note from Andreas: I have no idea whether this approach works. But I would be very excited to see some experiments :)