Deadline: April 29th, 20:00
Visualizing the learning progress as well as the behavior of a deep model is extremely useful for troubleshooting unexpected outcomes or simply bad results. In this assignment, you will get to know TensorBoard, a visualization suite originally developed for TensorFlow that has since been integrated into PyTorch as well. You will also use it to diagnose some common problems with training deep models.
There is a tutorial on the PyTorch website as well as complete API documentation for all the functionality. There is also an example notebook on E-Learning showing a few more use cases (e.g. tracking gradient norms). Finally, there is a README on GitHub that is more concerned with the actual app itself. The basic steps are usually:
SummaryWriter for the log directory of choice
Note that you have to install TensorBoard separately; it does not come with PyTorch! On Colab, it should be installed already.
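The basic workflow around SummaryWriter can be sketched roughly as follows. This is a minimal example under stated assumptions, not the course's reference solution: the log directory, the tag name train/loss and the decaying "loss" values are all made up for illustration.

```python
import math
import tempfile
from pathlib import Path

from torch.utils.tensorboard import SummaryWriter

# Hypothetical log directory; any writable path works.
log_dir = Path(tempfile.mkdtemp()) / "runs" / "demo"

# Create a writer for the chosen log directory.
writer = SummaryWriter(log_dir=str(log_dir))

# Log a made-up, decaying "loss" curve.
# In a real run, this value would come from the training loop.
for step in range(100):
    fake_loss = math.exp(-step / 30)
    writer.add_scalar("train/loss", fake_loss, global_step=step)

# Make sure everything is written to disk before inspecting it.
writer.flush()
writer.close()

event_files = sorted(log_dir.glob("events.out.tfevents.*"))
print(f"wrote {len(event_files)} event file(s) to {log_dir}")
```

To look at the result, start the app with `tensorboard --logdir <log_dir>` from a shell, or in a notebook run `%load_ext tensorboard` followed by `%tensorboard --logdir <log_dir>`.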
Download the “deep learning fails” from E-Learning. This .zip archive contains several attempts at training MLPs on MNIST. While they should all run without errors, none of them should reach satisfactory model performance (>90%). For each example, find out why this is, and try to propose fixes for the respective issues. Use TensorBoard and/or your own visualizations to help!
Please don’t mess with the parameters of the network or learning algorithm before “experiencing” the original. You can of course use any oddities you notice as clues as to what might be going wrong. In fact, you might be able to completely diagnose the issue just by looking at differences in the code, without visualizations! But please try to use visualizations, as that’s the point of this exercise. At the very least, you can make hypotheses based on the code, and then confirm them experimentally via visualization. Here are some tips:
nan values appearing. In this case, you should remove the histograms and use other means to find out what is going wrong.
torch.linalg.norm(g); feel free to add scalar summaries of these values to TensorBoard.
You can pass a tag to the variables when defining them and use this to give descriptive names to your summaries.
add_images helps here. Note that there is also add_image, with image singular; these are different!
For the diagnosis/solution proposal, it is sufficient to write some text (Markdown cells). When handing in your fixes, you do not need to also submit the original failure, since we have access to that anyway. You can submit multiple notebooks (e.g. one per fail) on E-Learning to keep things more neatly separated.
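As an illustration of the tip about torch.linalg.norm(g) and scalar summaries, the sketch below logs one gradient norm per parameter and reports non-finite values instead of logging them. The small MLP, the random MNIST-shaped inputs and the grad_norm/ tag prefix are placeholders chosen for this example; they are not taken from the "fails" archive.

```python
import math
import tempfile

import torch
from torch import nn
from torch.utils.tensorboard import SummaryWriter

torch.manual_seed(0)

# Stand-in MLP and random MNIST-shaped data (both are placeholders).
model = nn.Sequential(nn.Flatten(), nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
inputs = torch.randn(16, 1, 28, 28)
targets = torch.randint(0, 10, (16,))

writer = SummaryWriter(log_dir=tempfile.mkdtemp())

# One forward/backward pass so that every parameter has a gradient.
loss = nn.functional.cross_entropy(model(inputs), targets)
loss.backward()

grad_norms = {}
for name, param in model.named_parameters():
    if param.grad is None:
        continue
    g = param.grad
    norm = torch.linalg.norm(g).item()
    grad_norms[name] = norm
    if math.isfinite(norm):
        # The parameter name serves as a descriptive tag in TensorBoard.
        writer.add_scalar(f"grad_norm/{name}", norm, global_step=0)
    else:
        # Report nan/inf on stdout rather than writing it to the log.
        print(f"non-finite gradient norm in {name}: {norm}")

writer.close()
```

In a real training loop, the same logging code would run after every (or every n-th) backward pass, with global_step set to the current training step.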
If you want, you can use your newfound visualization powers to further investigate hyperparameter choices. For example, how do the network gradients differ between saturating activations like tanh vs non-saturating ones like relu?
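One possible starting point for the tanh vs relu question is sketched below. It builds the same MLP twice, once per activation, and compares the gradient norm at the first layer after a single backward pass. The helper first_layer_grad_norm, the depth and width, and the random data are assumptions made for this example. The numbers it prints depend on initialization and depth, so treat them as something to investigate rather than as a result.

```python
import torch
from torch import nn


def first_layer_grad_norm(activation: nn.Module, depth: int = 8, seed: int = 0) -> float:
    """Gradient norm of the first linear layer after one backward pass."""
    torch.manual_seed(seed)  # same seed for every activation, for a like-for-like setup
    layers = [nn.Flatten()]
    width_in = 784
    for _ in range(depth):
        # The activation module is stateless, so reusing one instance is fine.
        layers += [nn.Linear(width_in, 64), activation]
        width_in = 64
    layers.append(nn.Linear(width_in, 10))
    model = nn.Sequential(*layers)

    # Random MNIST-shaped placeholder batch.
    inputs = torch.randn(32, 1, 28, 28)
    targets = torch.randint(0, 10, (32,))

    loss = nn.functional.cross_entropy(model(inputs), targets)
    loss.backward()

    first_linear = model[1]  # index 0 is nn.Flatten
    return torch.linalg.norm(first_linear.weight.grad).item()


norms = {
    name: first_layer_grad_norm(act)
    for name, act in [("tanh", nn.Tanh()), ("relu", nn.ReLU())]
}
for name, value in norms.items():
    print(f"{name}: first-layer gradient norm = {value:.3e}")
```

Logging these values with add_scalar over the course of training, rather than printing them once, would show how the difference develops over time.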