Assignment 5: Object Detection

Deadline: November 14th, 9am

Send your colab link and group member info to jens.johannsmeier@ovgu.de

While MNIST is a convenient test bed for deep learning models due to its simplicity, real-world problems generally involve far more complex data and also have more complex requirements. One very popular application is object detection: Given an image, the model should determine both where in the image objects of interest are located and what these objects are.

Because

we generally want to be able to detect multiple objects in a single image,
we need to localize the objects,
and the images are far larger and more complex than MNIST,

solving this task yourself from the ground up would be “a little bit” too much at this point. Luckily, the TensorFlow Object Detection API already implements this functionality. In this assignment, we ask you to familiarize yourself with this API.
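To make the "localize the objects" requirement a bit more concrete, here is a minimal sketch of what a single detection might look like as data. It assumes that boxes come as normalized [ymin, xmin, ymax, xmax] values between 0 and 1, which is the convention the API's example outputs appear to follow; the helper name `to_pixel_box` and the example detection are made up for illustration.

```python
def to_pixel_box(box, image_width, image_height):
    """Convert a normalized [ymin, xmin, ymax, xmax] box to pixel coordinates.

    Returns (left, top, right, bottom) as integers.
    Assumption: all four values are fractions of the image size in [0, 1].
    """
    ymin, xmin, ymax, xmax = box
    return (
        round(xmin * image_width),
        round(ymin * image_height),
        round(xmax * image_width),
        round(ymax * image_height),
    )


# Hypothetical detection in a 640x480 image: where (box), what (class_id),
# and how confident the model is (score).
detection = {"box": [0.25, 0.5, 0.75, 1.0], "class_id": 1, "score": 0.9}
print(to_pixel_box(detection["box"], 640, 480))  # (320, 120, 640, 360)
```

Each object in an image gets its own box, class and score, so the model's output for one image is a list of such entries rather than a single label as in MNIST.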

Articles on various topics are linked on the GitHub site, but they might be a bit overwhelming. Instead, check out this Medium article with a link to a Colab notebook. Open this notebook in “playground mode” (top left) to be able to run and modify it. Run the notebook and observe the output. Note that the first cell installs packages in the virtual environment, which you might need to reinstall if you come back at a later point. Try to understand each cell on its own. Here you can find another blog post on “frozen” graphs in case the third cell confuses you. Note that the notebook sometimes seems to spontaneously combust, in which case you will have to run everything again. Sorry! You might get around this by inserting tf.reset_default_graph() at the beginning of step 3.

Next, we want to modify the example somewhat. At the bottom of the notebook you can find some pointers on how to upload your own images and detect objects in them. Pick some images that you like (e.g. holiday pictures, or just download some from somewhere) and perform detection on those images instead. Furthermore, there are many pretrained models available for object detection. You can find these here. Use at least 3 other models to detect objects in the pictures you picked. Comment on any differences in the outputs, such as detection quality and time taken. Try to make sure that the detection outputs are visible in the notebook so that we can have a look at them!
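One possible way to compare models on "quality and time taken" is sketched below. It is not part of the notebook: `time_detection`, `count_confident` and `fake_detector` are hypothetical names, and the sketch assumes you have wrapped each model in a function that takes one image and returns a dictionary with a "detection_scores" entry, similar to the output dictionary in the example notebook.

```python
import time


def time_detection(detect_fn, images, warmup=1):
    """Run detect_fn on every image; return (mean seconds per image, outputs).

    The first `warmup` images are run once beforehand and not timed, since
    the first call of a model often includes one-off setup cost.
    """
    if not images:
        raise ValueError("need at least one image")
    for image in images[:warmup]:
        detect_fn(image)
    outputs = []
    start = time.perf_counter()
    for image in images:
        outputs.append(detect_fn(image))
    elapsed = time.perf_counter() - start
    return elapsed / len(images), outputs


def count_confident(output, threshold=0.5):
    """Count detections whose score is at least `threshold`."""
    return sum(1 for score in output["detection_scores"] if score >= threshold)


# Stand-in for a real model so that the sketch runs on its own.
def fake_detector(image):
    return {"detection_scores": [0.9, 0.6, 0.2]}


mean_seconds, outputs = time_detection(fake_detector, ["img1", "img2"])
print(f"{mean_seconds:.6f} s per image")
print([count_confident(out) for out in outputs])  # [2, 2]
```

Counting confident detections is only a rough proxy for quality; looking at the drawn boxes is still the best check. Timing the same images with the same warm-up for every model keeps the comparison reasonably fair.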

Aside from this, there are numerous other things you could try. For example, while most models are trained on the COCO dataset, there are also models trained on other datasets. Try out some of these other models, potentially with inappropriate data (e.g. the KITTI dataset is mainly intended for self-driving cars…). Make sure to update the label path accordingly!
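To see why the label path matters, consider the following sketch. It parses two label maps given as strings and shows that the same class id can mean different things depending on the dataset. The two snippets are illustrative excerpts written in the style of the COCO and KITTI label map files (check the actual .pbtxt files for the real contents), and `parse_label_map` is a hypothetical helper; in the notebook you would normally keep using the API's own label map utilities.

```python
import re


def parse_label_map(text):
    """Return {class id: name} for a .pbtxt-style label map given as a string.

    Prefers `display_name` when an item has one, otherwise falls back to `name`.
    This is a simplified parser for illustration only.
    """
    labels = {}
    for block in re.findall(r"item\s*\{(.*?)\}", text, flags=re.DOTALL):
        id_match = re.search(r"\bid:\s*(\d+)", block)
        name_match = (
            re.search(r"display_name:\s*['\"](.*?)['\"]", block)
            or re.search(r"\bname:\s*['\"](.*?)['\"]", block)
        )
        if id_match and name_match:
            labels[int(id_match.group(1))] = name_match.group(1)
    return labels


# Illustrative excerpts, not the complete files.
COCO_STYLE = """
item {
  name: "/m/01g317"
  id: 1
  display_name: "person"
}
"""

KITTI_STYLE = """
item {
  id: 1
  name: 'car'
}
item {
  id: 2
  name: 'pedestrian'
}
"""

print(parse_label_map(COCO_STYLE))   # {1: 'person'}
print(parse_label_map(KITTI_STYLE))  # {1: 'car', 2: 'pedestrian'}
```

With the wrong label map, a model would still produce sensible boxes, but the names drawn next to them would belong to a different dataset.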
Also, while the example model draws bounding boxes, there are also models available that draw masks on the data. Use one such model and try drawing the masks. This blog post should be helpful here (in particular the linked GitHub code).
There are many other things to try linked on the API website (see above), e.g. you could try transfer learning (retraining an existing model for a different task) or even training your own model from scratch (although this will likely take too long – maybe try MNIST ;)).