Assignment 7: Attention-based Neural Machine Translation

Deadline: December 6th, 9am

In this task, you will implement a simple NMT model with attention for a language pair of your choice.
We will follow the corresponding TensorFlow tutorial on NMT.

Please do not simply reuse the tutorial's English-Spanish example; choosing a different pair reduces the temptation to copy the tutorial.
You can find data sets here. We recommend picking a language pair where you understand both languages (so if you do speak Spanish… feel free to use it ;)). This makes it easier (and more fun) for you to evaluate the results. However, keep in mind that some language pairs have a very large number of examples, whereas others have only very few, which affects the learning process and the quality of the trained models.

You may run into issues with the code in two places:

  1. Downloading the data inside the notebook might not work (it crashes with a 403 Forbidden error). In that case, you can download and extract the data on your local machine, upload the .txt file to your drive, and then mount the drive and load the file as you’ve done before.
  2. The load_data function might crash. It expects each line to split into a pair of sentences, but there seems to be a third element that contains attribution information for the example. If this happens, you can use line.split('\t')[:-1] in the function to drop this last element.
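The second workaround can be sketched as a small standalone loader. This is only an illustration, not the tutorial's load_data: the function name load_pairs, its num_examples parameter, and the sample lines are made up for this sketch. It keeps the first two tab-separated fields instead of dropping the last one, so it also works on lines that have no attribution field.

```python
import os
import tempfile


def load_pairs(path, num_examples=None):
    """Read tab-separated sentence pairs, ignoring any extra columns.

    Hypothetical helper, not part of the tutorial. `num_examples` caps
    the number of pairs returned (handy for quick test runs).
    """
    pairs = []
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if num_examples is not None and len(pairs) >= num_examples:
                break
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            # Keep only the two sentences; a third (attribution) field is dropped.
            pairs.append((fields[0], fields[1]))
    return pairs


# Tiny made-up file: one line with an attribution field, one without.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.txt")
with open(demo_path, "w", encoding="utf-8") as handle:
    handle.write("Hi.\tHallo!\tsome attribution text\n")
    handle.write("Run!\tLauf!\n")

pairs = load_pairs(demo_path)
print(pairs)
```

In the notebook, the path would point at the uploaded .txt file on your mounted drive instead of a temporary file.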

Recommendation: Start with a small number of training examples. Use one of the training examples to evaluate whether training worked properly. Only switch to the complete data set if you’re sure that your code works, because training is quite slow.

Tasks:

Hint: Take care to save your models or their weights (in the drive) so you do not lose your training progress if the runtime resets!
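
One possible way to follow this hint is TensorFlow's checkpointing API. The sketch below is a minimal example under assumptions: two plain variables stand in for the real model and optimizer, the helper name make_checkpoint_manager is made up, and a temporary directory stands in for a folder on your mounted drive. Whether the tutorial's model objects can be tracked the same way may depend on your TensorFlow/Keras version, so test it on a small run first.

```python
import os
import tempfile

import tensorflow as tf


def make_checkpoint_manager(directory, max_to_keep=3, **tracked):
    """Bundle trackable objects into one checkpoint (hypothetical helper)."""
    checkpoint = tf.train.Checkpoint(**tracked)
    manager = tf.train.CheckpointManager(
        checkpoint, directory, max_to_keep=max_to_keep)
    return checkpoint, manager


# Stand-ins for the real model weights and training step (assumption).
weights = tf.Variable([1.0, 2.0, 3.0])
step = tf.Variable(0, dtype=tf.int64)

# In the notebook this would be a folder on your mounted drive.
ckpt_dir = os.path.join(tempfile.mkdtemp(), "nmt_checkpoints")
checkpoint, manager = make_checkpoint_manager(
    ckpt_dir, weights=weights, step=step)

# Pretend some training happened, then save.
weights.assign([4.0, 5.0, 6.0])
step.assign(100)
save_path = manager.save()

# Simulate a runtime reset by overwriting the values, then restore.
weights.assign([0.0, 0.0, 0.0])
step.assign(0)
checkpoint.restore(manager.latest_checkpoint)
print(step.numpy(), weights.numpy())
```

Calling manager.save() every few hundred steps or once per epoch keeps the amount of lost work small; max_to_keep limits how many old checkpoints pile up on the drive.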

Compare the attention weight plots for some examples between the attention mechanisms.
We recommend adding vmax=1.0 to the ax.matshow(attention, cmap='viridis') call in the plot_attention function, so that the colors correspond to the same attention values across different plots.
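
A fixed color scale could look like the following sketch. It is not the tutorial's plot_attention: the function name plot_attention_fixed_scale, the token lists, and the attention values are made up for illustration, and vmin=0.0 is an extra assumption on top of the recommended vmax=1.0 so that both ends of the scale are pinned.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np


def plot_attention_fixed_scale(attention, source_tokens, target_tokens):
    """Plot attention weights with a color scale fixed to [0, 1]."""
    fig, ax = plt.subplots(figsize=(6, 6))
    # vmin/vmax pin the color scale so separate plots are comparable.
    image = ax.matshow(attention, cmap='viridis', vmin=0.0, vmax=1.0)
    ax.set_xticks(range(len(source_tokens)))
    ax.set_yticks(range(len(target_tokens)))
    ax.set_xticklabels(source_tokens, rotation=90)
    ax.set_yticklabels(target_tokens)
    fig.colorbar(image, ax=ax)
    return fig, image


# Made-up weights: one row per target token, one column per source token.
attention = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.6, 0.3],
                      [0.1, 0.2, 0.7]])
fig, image = plot_attention_fixed_scale(
    attention, ["ich", "bin", "hier"], ["i", "am", "here"])
fig.savefig("attention_example.png")
```

Without a fixed scale, matshow normalizes each plot to its own minimum and maximum, so the same color can mean different weights in two plots.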

Here are a few questions for you to check how well you understood the tutorial.
Please answer them (briefly) in your solution!

Hand in all of your code, i.e. the working tutorial code along with all changes/additions you made. Include outputs which document some of your experiments. Also remember to answer the questions above! Of course you can also write about other observations you made.