For an overview of the topic, start with the blog post “Attention and Memory in Deep Learning and NLP” by Denny Britz and the distill.pub article “Attention and Augmented Recurrent Neural Networks” by Olah & Carter.
Continue with the following articles, which provide details on two recently proposed approaches:
The blog post “The Illustrated Transformer” by Jay Alammar.
Dehghani et al., “Universal Transformers”. This follow-up to the original Transformer paper (see further reading below) is much more accessible and also incorporates the idea of adaptive computation time.
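All of these readings build on the same core operation: scaled dot-product attention from the original Transformer paper. As a minimal, self-contained sketch (not taken from any of the linked posts; shapes and names are illustrative), it can be written in a few lines of NumPy:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Scaled dot-product attention (Vaswani et al., 2017).

    Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v).
    Each output row is a weighted average of the value rows,
    weighted by the similarity of the query to each key.
    """
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # query-key similarities
    weights = softmax(scores, axis=-1)   # rows sum to 1: attention distribution
    return weights @ V                   # (n_queries, d_v)

# Toy example with random vectors; dimensions are arbitrary.
rng = np.random.default_rng(0)
Q = rng.normal(size=(2, 8))   # 2 queries
K = rng.normal(size=(5, 8))   # 5 keys
V = rng.normal(size=(5, 16))  # 5 values
print(scaled_dot_product_attention(Q, K, V).shape)  # (2, 16)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients; the readings above explain the full multi-head variant built on this primitive.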