Reading Assignment 7: Autoregressive Models & LLMs

Autoregressive Models

Chapter 22 of Probabilistic Machine Learning: Advanced Topics gives a good overview of autoregressive models. Since these models are conceptually straightforward, there is not much deep theory to cover here.

(The Bishop Book covers the topic only briefly, in Section 12.2.4.)
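The core idea behind all of these models is the chain-rule factorization p(x_1, ..., x_T) = ∏_t p(x_t | x_1, ..., x_{t-1}): generate one token at a time, each conditioned on the prefix so far. The sketch below illustrates this with a toy bigram model (conditioning on only the single previous token); the tiny corpus and function names are illustrative, not taken from the readings.

```python
import random

def fit_bigram(corpus):
    """Count next-token frequencies conditioned on the previous token.

    This is the simplest possible autoregressive model: the conditional
    p(x_t | x_{t-1}) is estimated from raw bigram counts.
    """
    counts = {}
    for seq in corpus:
        for prev, nxt in zip(seq, seq[1:]):
            counts.setdefault(prev, {}).setdefault(nxt, 0)
            counts[prev][nxt] += 1
    return counts

def sample(counts, start, length, rng):
    """Generate tokens one at a time, each conditioned on the last token."""
    seq = [start]
    for _ in range(length - 1):
        dist = counts.get(seq[-1])
        if not dist:  # no continuation observed for this token
            break
        tokens, freqs = zip(*dist.items())
        seq.append(rng.choices(tokens, weights=freqs)[0])
    return seq

# Toy corpus of character sequences; real LLMs do exactly this loop,
# but with a neural network in place of the count table.
model = fit_bigram([list("abab"), list("abba")])
print(sample(model, "a", 5, random.Random(0)))
```

Swapping the count table for a transformer that maps the whole prefix to a distribution over the next token gives, in essence, the models discussed below.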

Optional Reading: Classic Success Stories

Consider these optional readings if you want more concrete examples. Some of them may be treated in more detail later in the course.

Large Language Models

Many language models are being developed by large companies and research groups, and most of them work similarly. We will look at only a select few. As before, this is a lot to read, so the optional papers below are included mainly as a reference.

Section 12.3.5 of the Bishop Book provides a high-level overview.

For more details, refer to the (very long) overview paper A Survey of Large Language Models.

Finally, Scaling Laws for Neural Language Models is a very important piece of research that provides the justification for this research agenda.
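The key empirical finding of that paper is that language-model loss follows a power law in model size, roughly L(N) = (N_c / N)^α, which becomes a straight line in log-log space and can therefore be fit with ordinary linear regression. The sketch below generates data from such a power law and recovers its parameters; the constants are approximately the values reported in the paper for the model-size law, but treat the whole snippet as illustrative.

```python
import math

def power_law_loss(n, n_c=8.8e13, alpha=0.076):
    """Loss as a power law in parameter count N (constants approximate
    those reported in the scaling-laws paper; illustrative only)."""
    return (n_c / n) ** alpha

def fit_power_law(sizes, losses):
    """Recover alpha and N_c by linear regression in log-log space:
    log L = alpha * (log N_c - log N), a line in log N with slope -alpha."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(l) for l in losses]
    k = len(xs)
    mx, my = sum(xs) / k, sum(ys) / k
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    alpha = -slope
    log_nc = my / alpha + mx  # from my = alpha * (log_nc - mx)
    return alpha, math.exp(log_nc)

sizes = [1e6, 1e7, 1e8, 1e9]
losses = [power_law_loss(n) for n in sizes]
alpha, n_c = fit_power_law(sizes, losses)
print(alpha, n_c)
```

Because the synthetic data lies exactly on the power law, the fit recovers the parameters essentially exactly; on real training runs there is noise, but the same log-log regression is how such laws are estimated.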

Optional: Model Examples

GPT series of papers:

As these are all developed by OpenAI, you could also check out PaLM by Google.

Reinforcement Learning from Human Feedback

An important technique for fine-tuning LLMs, especially for aligning them with human preferences in interactive use.
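One core piece of the RLHF pipeline is training a reward model from human preference pairs (a "chosen" and a "rejected" response to the same prompt) with a Bradley-Terry style loss, -log σ(r(chosen) - r(rejected)). The sketch below computes this loss on scalar stand-ins for a real reward model's outputs; it is a minimal illustration of the objective, not a full RLHF implementation.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def preference_loss(reward_chosen, reward_rejected):
    """Bradley-Terry preference loss: small when the reward model
    scores the human-preferred response higher than the rejected one."""
    return -math.log(sigmoid(reward_chosen - reward_rejected))

# Reward model agrees with the human preference: low loss.
print(preference_loss(2.0, 0.5))
# Reward model violates the preference: higher loss.
print(preference_loss(0.5, 2.0))
```

The trained reward model is then used as the optimization target for the policy (the LLM itself), typically with a reinforcement-learning algorithm such as PPO plus a penalty that keeps the policy close to the pretrained model.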