Assignment 13: Introspection Part 2

Deadline: January 23rd, 9am
This is the final assignment for this class.

In this assignment, you will try to detect model misbehavior and explain errors using Introspection methods.

Unmasking Clever Hans Predictors

We will start with a synthetic example, where we purposefully make it easy for the model to cheat. Afterwards, we will apply Introspection to detect this cheating and explain how the model exploited it.

Instead of altering only one class, you could also insert different kinds of cheating opportunities into different classes.

Note: If you add regularization or preprocessing to your model, make sure it does not remove the information that we have inserted into the class. For example, if your “extra information” is that the images of one class are highly saturated, then preprocessing that normalizes image saturation, or applies random saturation to all images, would destroy the extra information.
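One simple way to create such a cheating opportunity is to paint a fixed artifact into every training image of one class. The sketch below is only an illustration: CIFAR-10, the class index, and the white corner patch are assumptions, and any dataset and artifact of your choice works the same way, as long as your preprocessing keeps the artifact intact.

```python
# Minimal sketch of inserting a "cheating opportunity" into one class.
# Dataset, class index, and patch design are illustrative assumptions.
import numpy as np
import tensorflow as tf

CHEAT_CLASS = 3   # assumed: the class that receives the artifact
PATCH_SIZE = 4    # assumed: size of the tell-tale patch in pixels

def insert_cheat_patch(images, labels, cheat_class=CHEAT_CLASS):
    """Paint a bright patch into the top-left corner of every image of
    `cheat_class`, giving the model an easy shortcut feature."""
    images = images.copy()
    mask = (labels.flatten() == cheat_class)
    images[mask, :PATCH_SIZE, :PATCH_SIZE, :] = 255  # saturated white patch
    return images

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train_cheat = insert_cheat_patch(x_train, y_train)
# Train on x_train_cheat, then evaluate on the unmodified x_test to see
# whether accuracy on CHEAT_CLASS collapses once the shortcut is gone.
```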

Contrastive Explanations

This is a more realistic scenario in which you can also try out more advanced methods to create saliency maps.
You will work with a pre-trained network and try to explain its wrong decisions with different Introspection techniques and contrastive explanations.
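One way to read “contrastive explanation” is to ask why the network preferred the wrong class over the expected one. The sketch below is one possible implementation, not a prescribed method: it takes the gradient of the logit difference between the predicted and the expected class with respect to the input. The names `model`, `image`, and the class indices are assumed to come from your own error analysis.

```python
# Sketch of a contrastive saliency map: "why class p rather than class t?"
import numpy as np
import tensorflow as tf

def contrastive_saliency(model, image, predicted_class, target_class):
    """Gradient of (score[predicted] - score[target]) w.r.t. the input.
    Large values mark pixels that pushed the model towards the (wrong)
    prediction and away from the expected class."""
    x = tf.convert_to_tensor(image[np.newaxis], dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(x)
        scores = model(x, training=False)
        contrast = scores[0, predicted_class] - scores[0, target_class]
    grads = tape.gradient(contrast, x)[0]
    return tf.reduce_max(tf.abs(grads), axis=-1).numpy()  # collapse channels

# Usage (assuming `img` is a misclassified image from your evaluation):
# saliency = contrastive_saliency(model, img, predicted_class=5, target_class=3)
```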

If you use tf-explain (or something similar), you can easily try out different saliency map methods and compare which one helps you most in explaining classification errors.
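For a side-by-side comparison, a small loop over tf-explain’s core explainers is enough. The sketch below assumes a Keras `model`, a misclassified image `wrong_img`, and its `predicted_class`; the `explain` and `save` signatures follow tf-explain’s documented interface, but double-check them against the version you have installed.

```python
# Compare several tf-explain saliency methods on one misclassified image.
import numpy as np
from tf_explain.core.grad_cam import GradCAM
from tf_explain.core.smoothgrad import SmoothGrad
from tf_explain.core.vanilla_gradients import VanillaGradients

EXPLAINERS = {
    "grad_cam": GradCAM(),
    "smoothgrad": SmoothGrad(),
    "vanilla_gradients": VanillaGradients(),
}

def compare_saliency_maps(model, wrong_img, predicted_class, out_dir="."):
    """Run each saliency method on the same image and save the heatmaps
    so they can be compared visually."""
    data = (np.expand_dims(wrong_img, axis=0), None)  # tf-explain expects (images, labels)
    for name, explainer in EXPLAINERS.items():
        grid = explainer.explain(data, model, class_index=predicted_class)
        explainer.save(grid, out_dir, f"{name}_class{predicted_class}.png")
```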

Bonus experiment: Explanation by input optimization

Let’s use our feature visualization technique from the last assignment in a different way.

Note from Valerie: I have no idea whether this approach works. But I would be very excited to see some experiments :)
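One way to set up such an experiment is to start from a misclassified image and optimize the input towards the class you expected, then inspect what had to change to convince the network. The sketch below is exactly that kind of experiment, with several assumptions baked in (a differentiable Keras `model`, unnormalized pixel values in [0, 255], a hand-picked step size); treat it as a starting point rather than a finished method.

```python
# Exploratory sketch: gradient-ascent a misclassified image towards the
# expected class and look at the resulting pixel changes.
import tensorflow as tf

def optimize_towards_class(model, image, target_class, steps=100, lr=0.5):
    """Nudge the input so that the score for `target_class` increases.
    Returns the optimized image and its difference to the original."""
    image = tf.cast(image, tf.float32)
    x = tf.Variable(image[tf.newaxis, ...])
    for _ in range(steps):
        with tf.GradientTape() as tape:
            score = model(x, training=False)[0, target_class]
        grads = tape.gradient(score, x)
        x.assign_add(lr * grads)                   # move towards the class
        x.assign(tf.clip_by_value(x, 0.0, 255.0))  # assumes [0, 255] pixel range
    optimized = x[0].numpy()
    return optimized, optimized - image.numpy()

# The difference image shows which regions had to change before the network
# accepted the expected class -- a possible hint at what the model was missing.
```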