Assignment 11: What’s Wrong With Our Data?

Deadline: January 17th, 9am

Following this link, you will find a selection of CIFAR10 datasets. The archive contains triplets of training, validation and test sets. For each triplet, train a model on the training set and make sure it “works” by checking its performance on the validation set. Feel free to fine-tune your models to get the best validation set performance you can. When you’re satisfied, check performance on the test set. Does it still work? That is, is the performance on the test set close to that on the validation set? You should find that it is usually either significantly worse (which makes us sad) or too good (which is suspicious). For each triplet, find out what’s going wrong, e.g. through inspection or by computing statistics of the dataset. Typical things to watch out for include:

If you’re unfamiliar with .npz files, see here for reference. Basically, you can use np.load to load the file, then use list(<objectname>.keys()) to check the available fields, and access them the same way you would access entries of a dictionary.
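As a minimal sketch of that loading pattern: the snippet below first writes a tiny stand-in archive so that it runs on its own. The file name and the field names "x" and "y" are assumptions made for illustration; check the output of keys() to see what the provided files actually contain.

```python
import os
import tempfile

import numpy as np

# Create a small stand-in .npz file so the example is self-contained.
# The field names "x" and "y" are hypothetical; the real archives may differ.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "example_train.npz")
np.savez(
    path,
    x=np.zeros((4, 32, 32, 3), dtype=np.uint8),
    y=np.array([0, 1, 2, 3]),
)

# np.load returns a dictionary-like object for .npz files.
with np.load(path) as data:
    fields = list(data.keys())  # names of the arrays stored in the file
    print(fields)
    images = data["x"]          # dictionary-style access loads the array
    labels = data["y"]

print(images.shape, images.dtype, labels.shape)
```

For the assignment files, replace `path` with the path to one of the downloaded .npz files and use whatever field names keys() reports.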


Each of the problems in these datasets is artificially constructed and somewhat exaggerated. As a next step, think about which real-world dataset problem each example might represent, and provide examples.
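For the inspection step (“inspection or computing statistics of the dataset”), a few generic checks can serve as a starting point. The sketch below compares two splits by label distribution, pixel statistics and exact duplicate images. The helper names are hypothetical, the demo data is synthetic, and these checks are only candidates: they are not guaranteed to reveal the problem in any particular triplet.

```python
import hashlib

import numpy as np


def label_histogram(labels, num_classes=10):
    """Fraction of examples per class (CIFAR10 has 10 classes)."""
    flat = np.asarray(labels).astype(np.int64).ravel()
    counts = np.bincount(flat, minlength=num_classes)
    return counts / counts.sum()


def pixel_summary(images):
    """Global min, max, mean and standard deviation of the pixel values."""
    values = np.asarray(images, dtype=np.float64)
    return values.min(), values.max(), values.mean(), values.std()


def _digest(image):
    # Hash the raw bytes; this only matches images with identical dtype and values.
    return hashlib.sha1(image.tobytes()).hexdigest()


def count_shared_images(images_a, images_b):
    """Number of images in images_b that also occur, byte for byte, in images_a."""
    seen = {_digest(img) for img in images_a}
    return sum(_digest(img) in seen for img in images_b)


# Synthetic demo data (NOT the assignment data): the "test" split reuses
# five training images and contains only one class.
rng = np.random.default_rng(0)
train_x = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
train_y = rng.integers(0, 10, size=100)
new_x = rng.integers(0, 256, size=(15, 32, 32, 3), dtype=np.uint8)
test_x = np.concatenate([train_x[:5], new_x])
test_y = np.full(20, 3)

print("train label fractions:", label_histogram(train_y))
print("test label fractions: ", label_histogram(test_y))
print("train pixel summary:", pixel_summary(train_x))
print("test pixel summary: ", pixel_summary(test_x))
print("shared images:", count_shared_images(train_x, test_x))
```

Looking at a handful of images and their labels by eye, for example with matplotlib, complements these numbers, since some problems are easier to see than to measure.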

Bonus: Practical Methodology

If you have spare time, you can try to put some of the insights you’ve gained from the lecture on “practical methodology” into practice. You can use a dataset such as CIFAR as a testing ground for your models. Some possibilities:

See whether, by following somewhat principled procedures, you have an easier time building high-performing models compared to “just trying stuff”. It’s always a good idea to document your stepwise changes when improving your model in such a structured way. This allows you to easily revert changes. Also, when coming back to the code after some time, you would otherwise have a hard time remembering which changes were successful.
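One lightweight way to document stepwise changes is an append-only log with one record per experiment. The following is a sketch under assumptions: the function names, the JSON-lines file format and the record fields are all made up for illustration, and the accuracy values in the demo are placeholders rather than real results.

```python
import json
import os
import tempfile
from datetime import datetime, timezone


def log_experiment(log_path, change, config, val_accuracy):
    """Append one experiment record as a JSON line and return it."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "change": change,            # what was modified relative to the last run
        "config": config,            # hyperparameters needed to reproduce the run
        "val_accuracy": val_accuracy,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")
    return record


def best_experiment(log_path):
    """Return the logged record with the highest validation accuracy."""
    with open(log_path, encoding="utf-8") as f:
        records = [json.loads(line) for line in f if line.strip()]
    return max(records, key=lambda r: r["val_accuracy"])


# Demo with placeholder numbers (not real results).
log_path = os.path.join(tempfile.mkdtemp(), "experiments.jsonl")
log_experiment(log_path, "baseline", {"lr": 0.01, "layers": 3}, 0.5)
log_experiment(log_path, "add batch normalization", {"lr": 0.01, "layers": 3}, 0.6)

best = best_experiment(log_path)
print(best["change"], best["config"])
```

Because each record stores the full configuration, reverting a change amounts to rerunning with the configuration of an earlier record. Keeping the code under version control serves a similar purpose.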