Deadline: November 14th, 9am
Send your Colab link and group member info to jens.johannsmeier@ovgu.de
While MNIST is a convenient test bed for deep learning models due to its simplicity, real-world problems generally involve far more complex data and also have more complex requirements. One very popular application is object detection: Given an image, the model should determine both where in the image objects of interest are located and what these objects are.
Solving this task yourself from the ground up would be “a little bit” too much at this point. Luckily, there is a TensorFlow Object Detection API that already implements this functionality. In this assignment, we ask you to familiarize yourself with this API.
There are articles on different topics linked on the GitHub site, but those might be a bit overwhelming. Check out this Medium article with a link to a Colab notebook. Open this notebook in “playground mode” (top left) to be able to run and modify it. Run the notebook (the first cell installs packages on the virtual environment, which you might need to reinstall if you come back at a later point) and observe the output. Try to understand each cell on its own. Here you can find another blog post on “frozen” graphs in case the third cell confuses you. Note that the notebook sometimes seems to spontaneously combust, in which case you will have to run everything again. Sorry! You might get around this by inserting tf.reset_default_graph() at the beginning of step 3.
Next, we want to modify the example somewhat. At the bottom of the notebook you can find some pointers on how to upload your own images and detect objects in them. Pick some images that you like (e.g. holiday pictures, or just download some from somewhere) and perform detection on those images instead. Furthermore, there are many pretrained models available for object detection. You can find these here. Use at least three other models to detect objects in the pictures you picked. Comment on any differences in the outputs, such as detection quality and time taken. Try to make sure that the detection outputs are visible in the notebook so that we can have a look at them!
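To compare the time taken, one option is a small timing helper like the one below. This is only a sketch: `detect_fn` is a hypothetical stand-in for whatever function runs one of your loaded models on a single image, and neither `time_detector` nor `compare_detectors` is part of the API. The first call to a model often includes one-off setup cost, so the helper runs an untimed warm-up call first.

```python
import time
from statistics import mean


def time_detector(detect_fn, images, warmup=1):
    """Return the mean wall-clock seconds per image for one detector.

    detect_fn is a hypothetical callable that takes one image and returns
    detections. images must be a non-empty list.
    """
    # Untimed warm-up calls, so one-off setup cost does not skew the result.
    for image in images[:warmup]:
        detect_fn(image)

    durations = []
    for image in images:
        start = time.perf_counter()
        detect_fn(image)
        durations.append(time.perf_counter() - start)
    return mean(durations)


def compare_detectors(detectors, images):
    """Map each model name to its mean seconds per image."""
    return {name: time_detector(fn, images) for name, fn in detectors.items()}


if __name__ == "__main__":
    # Stand-in "models" for illustration only; replace them with functions
    # that actually run your detection models.
    demo = {
        "fast_model": lambda image: [],
        "slow_model": lambda image: time.sleep(0.01),
    }
    for name, seconds in compare_detectors(demo, ["img1", "img2"]).items():
        print(f"{name}: {seconds:.4f} s per image")
```

Keep in mind that timings on Colab can vary between runs, so it may be worth repeating the measurement before drawing conclusions.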
Aside from this, there are numerous other things you could try. For example,
while most models are trained on the COCO dataset, there are also other datasets
available. Try out some of the other models, potentially with inappropriate data
(e.g. the KITTI dataset is mainly intended for self-driving cars…). Make sure
to update the label path accordingly!
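One way to avoid forgetting the label path is to derive it from the model name. The sketch below assumes that the dataset name appears in the model name and that the label map files are called `mscoco_label_map.pbtxt` and `kitti_label_map.pbtxt` inside `object_detection/data`; please verify both assumptions against your own copy of the API. The helper `label_path_for` is made up for this illustration.

```python
import os

# Assumed label map file names; check them in your copy of the API.
LABEL_MAPS = {
    "coco": "mscoco_label_map.pbtxt",
    "kitti": "kitti_label_map.pbtxt",
}


def label_path_for(model_name, data_dir=os.path.join("object_detection", "data")):
    """Return the label map path for the dataset named in model_name."""
    lowered = model_name.lower()
    for dataset, filename in LABEL_MAPS.items():
        if dataset in lowered:
            return os.path.join(data_dir, filename)
    # Fail loudly rather than silently falling back to the wrong labels.
    raise ValueError("No known label map for model: " + model_name)


if __name__ == "__main__":
    # Example model names; use the names of the models you downloaded.
    print(label_path_for("ssd_mobilenet_v1_coco_2017_11_17"))
    print(label_path_for("faster_rcnn_resnet101_kitti_2018_01_28"))
```

Raising an error for unknown models is deliberate: with the wrong label map, the detections still show up, just with misleading class names.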
Also, while the example model draws bounding boxes, there are also models
available that draw masks on the data. Use one such model and try drawing the
masks. This blog post
should be helpful here (in particular the linked GitHub code).
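To get a feeling for what “drawing a mask” means, here is a minimal NumPy sketch that blends a color into the masked pixels of an image. It assumes you already have one binary mask per object with the same height and width as the image; how the API delivers its masks, and how to bring them into that shape, is not covered here. The function `overlay_mask` is our own illustration and not the API's visualization code.

```python
import numpy as np


def overlay_mask(image, mask, color=(255, 0, 0), alpha=0.5):
    """Blend color into image wherever mask is nonzero.

    image: uint8 array of shape (H, W, 3).
    mask:  array of shape (H, W); nonzero entries mark the object.
    Returns a new uint8 array; the input image is left unchanged.
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must match the image height and width")

    # Work in float to avoid uint8 overflow while blending.
    result = image.astype(np.float32)
    selected = mask.astype(bool)
    tint = np.array(color, dtype=np.float32)
    result[selected] = (1.0 - alpha) * result[selected] + alpha * tint
    return np.round(result).astype(np.uint8)


if __name__ == "__main__":
    # Tiny synthetic example: a black 4x4 image with a 2x2 object mask.
    image = np.zeros((4, 4, 3), dtype=np.uint8)
    mask = np.zeros((4, 4), dtype=np.uint8)
    mask[1:3, 1:3] = 1
    print(overlay_mask(image, mask)[:, :, 0])
```

With several objects in one image, you could call this once per mask with a different color each time.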
There are many other things to try linked on the API website
(see above), e.g. you could try transfer learning (retraining an existing model
for a different task) or even training your own model from scratch (although
this will likely take too long – maybe try MNIST ;)).