Assignment 10: Adversarial Examples & Training

Deadline: January 10th, 9am

In this assignment, we will explore the phenomenon of adversarial examples and how to defend against them. This is an active research field and somewhat poorly understood, but we will look at some basic examples.

Creating Adversarial Examples

Train a model of your choice on your favorite dataset. It should be a classification task (e.g. MNIST… although a more complex dataset such as CIFAR should work even better!). Next, let’s create some adversarial examples.
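One standard attack you could start with (an assumption on my part, not something this assignment prescribes) is the Fast Gradient Sign Method (FGSM): perturb the input by a small step eps in the direction of the sign of the loss gradient with respect to the input. The sketch below uses a toy logistic-regression “model” so the input gradient is available in closed form; for a real network you would get the same gradient from your framework’s autograd instead.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """FGSM attack on a binary logistic-regression classifier.

    For binary cross-entropy loss, the gradient of the loss with
    respect to the input x is (p - y) * w, so the attack moves x
    by eps in the sign of that gradient.
    """
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w              # dL/dx in closed form
    return x + eps * np.sign(grad_x)

# Hypothetical fixed weights that classify x correctly (no training here).
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([1.0, 0.5])              # w @ x + b = 1.5 > 0, i.e. class 1
y = 1.0

x_adv = fgsm(x, y, w, b, eps=0.9)
print(sigmoid(w @ x + b) > 0.5)       # clean prediction
print(sigmoid(w @ x_adv + b) > 0.5)   # adversarial prediction
```

Note that the perturbation is bounded: every coordinate of x_adv differs from x by at most eps, which is what makes the attack “imperceptible” for small eps on image data.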

Hopefully, you are able to “break” your models somewhat reliably, but your attacks don’t need to achieve a 100% success rate.

Adversarial Training

Depending on your viewpoint, adversarial examples are either really amazing or really depressing (or both). Either way, it would be desirable if our models weren’t quite as susceptible to them as they are right now. One such “defense method” is called adversarial training: explicitly train your models to classify adversarial examples correctly. The procedure is quite simple: