Assignment 2: Let’s go to Monte Carlo
Discussion: April 25th
Deadline: April 24th, 20:00
NOTE THE CHANGED DEADLINE
In this assignment, we will be investigating Monte Carlo Methods with a few
simple examples. There is a lot of text for explanation, but the actual tasks
are rather compact.
Some starter code can be found on Gitlab!
Markov Chain Monte Carlo
To understand the basics of MCMC methods, we consider the simple example in
section 17.3 of the deep learning book, where we are interested in sampling
single integers x from 0, …, n according to some distribution. Try the
following:
- Define a transition matrix A as in the book. This should be a stochastic matrix,
i.e. each column must sum to 1. You can choose n as you wish; it’s okay to stick
with a small number (say, n=5). You can create the matrix from random values,
or write it down by hand. For theoretical reasons (a strictly positive stochastic
matrix is guaranteed to have a unique stationary distribution), it’s best to have no 0s in the matrix.
- Create an arbitrary initial distribution v(0), i.e. an n-element vector with elements
summing to 1.
- Repeatedly multiply the matrix A with the vector. The result should converge
very quickly (just a handful of steps) to a vector v’. This vector represents
the probability distribution that this Markov chain (represented by matrix A)
will converge to!
- Try different starting values of v(0). How does this influence v’ ?
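The steps above can be sketched in a few lines. This is a minimal NumPy sketch with an arbitrary random matrix (the concrete values of A are assumptions, not the book’s example); it also compares two different starting vectors v(0):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

# Arbitrary positive matrix, normalized so each COLUMN sums to 1 (stochastic).
A = rng.random((n, n)) + 0.1
A = A / A.sum(axis=0, keepdims=True)

# Two different initial distributions v(0).
v1 = np.ones(n) / n   # uniform
v2 = np.eye(n)[0]     # all mass on state 0

# Repeated multiplication converges to the stationary vector v'.
for _ in range(50):
    v1 = A @ v1
    v2 = A @ v2

print(v1)                      # the stationary distribution v'
print(np.abs(v1 - v2).max())   # ~0: the limit does not depend on v(0)
```

Since A has no zero entries, both starting points converge to the same v', which is exactly what the last bullet asks you to observe.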
Now, we run a Markov chain:
- Start off with an arbitrary state x(0).
- Repeatedly take the column from A that corresponds to the current state x(i)
(i.e. if the current state is 3, you take the 3rd column). Sample a new state
x(i+1) from this probability distribution. Do this many times (e.g. 1000s of times)
and collect all samples in a list or something similar.
- At the end, plot a histogram of your samples. This gives you an empirical
distribution over the samples. Does this distribution match the vector v’
computed above? If not, you might need to run the chain for more steps. If yes,
congratulations! By running your chain you are sampling from the distribution
represented by A.
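Running the chain itself might look like the following NumPy sketch (tf.random.categorical or tensorflow-probability would work just as well; the matrix values are again arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5

# Column-stochastic transition matrix, as before.
A = rng.random((n, n)) + 0.1
A = A / A.sum(axis=0, keepdims=True)

# Stationary distribution v' via repeated multiplication, for comparison.
v = np.ones(n) / n
for _ in range(100):
    v = A @ v

# Run the chain: the column of the current state is the distribution
# over the next state.
x = 0
samples = []
for _ in range(20000):
    x = rng.choice(n, p=A[:, x])
    samples.append(x)

empirical = np.bincount(samples, minlength=n) / len(samples)
print(empirical)  # should be close to v'
```

With enough steps, the histogram of the samples should match v' up to sampling noise.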
- Note that you can use
tf.random.categorical for sampling; however, this only
takes logits of a distribution. In this case, you might want to define A in
terms of logits in the first place and apply softmax (per column!) to get
a regular probability matrix. Alternatively, tensorflow-probability supports
categorical distributions based on probabilities as well as logits.
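If you go the logits route, the per-column softmax can be sketched as follows (shown in NumPy for brevity; the TensorFlow version with tf.nn.softmax(logits, axis=0) is analogous):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5

# Define A in terms of unconstrained logits...
logits = rng.normal(size=(n, n))

# ...and apply softmax over each COLUMN (axis=0) to get a stochastic matrix.
# Subtracting the per-column max is the usual numerical-stability trick.
exp = np.exp(logits - logits.max(axis=0, keepdims=True))
A = exp / exp.sum(axis=0, keepdims=True)

print(A.sum(axis=0))  # each column sums to 1
```

A nice side effect: softmax outputs are strictly positive, so A automatically has no 0 entries.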
Gibbs Sampling & Mixing
In most of our use cases, the situation differs from the example above: we are not
given a transition distribution that we could simply run a Markov chain on.
Instead, we have a desired target distribution (our model distribution) and need to figure out how to
sample from it.
Let’s try to sample from a mixture of Gaussians via Gibbs sampling.
- Set up a distribution. In order to be able to do any conditioning, this needs
to be multivariate. For simplicity (and easy plotting), stick to a 2D distribution.
Also, we want at least two components; these can be simple independent Gaussians.
- Start with an arbitrary initial sample (e.g. a vector of 0s). Now, repeatedly
do the following: Sample a new value for x given the value for y. Then, sample
a new value for y given the (new!) value for x. Since we now have sampled new
values for both dimensions, we have essentially taken a new sample of our 2D
distribution.
- You can use
tensorflow-probability (in particular, the distributions module)
to build the distributions. There are tutorials on
the Tensorflow website. You will
likely want to use MixtureSameFamily of Normal distributions.
- The hardest part is to figure out how to take a conditional sample. It turns
out that in this case, where p(x,y) is a mixture of Gaussians, the conditional
distribution p(x|y) (as well as the other way around) is a mixture of Gaussians
as well! You can find a derivation
here.
Being able to translate such formulae into code is important! By using independent
Gaussians as components, the only thing that actually changes in the conditional
distribution are the mixture coefficients!
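Putting this together, a Gibbs sampler for a two-component mixture of independent Gaussians could look like the sketch below. It uses plain NumPy instead of tfp, and all means, scales, and mixture weights are made-up example values; the key step is reweighting the mixture coefficients by the likelihood of the conditioned dimension:

```python
import numpy as np

rng = np.random.default_rng(3)

# Mixture of two independent 2D Gaussians (all parameters are made up).
# The components are deliberately close together so the chain mixes well.
pis = np.array([0.5, 0.5])           # mixture weights
mu = np.array([[-1.0, -1.0],         # component means, shape (K, 2)
               [ 1.0,  1.0]])
sigma = np.ones((2, 2))              # per-component, per-dimension std devs

def normal_pdf(z, m, s):
    return np.exp(-0.5 * ((z - m) / s) ** 2) / (s * np.sqrt(2 * np.pi))

def sample_conditional(dim, other_value):
    """Sample dimension `dim` given the value of the other dimension.

    With independent components, the conditional is again a mixture of
    1D Gaussians with unchanged means/scales; only the mixture
    coefficients are reweighted by each component's likelihood of the
    conditioned value.
    """
    other = 1 - dim
    w = pis * normal_pdf(other_value, mu[:, other], sigma[:, other])
    w = w / w.sum()
    k = rng.choice(len(pis), p=w)                  # pick a component
    return rng.normal(mu[k, dim], sigma[k, dim])   # sample that Gaussian

# Gibbs sampling: alternate p(x|y) and p(y|x), always conditioning on
# the most recent value of the other dimension.
x, y = 0.0, 0.0
samples = []
for _ in range(10000):
    x = sample_conditional(0, y)
    y = sample_conditional(1, x)
    samples.append((x, y))
samples = np.array(samples)
```

If you move the component means further apart (say, ±3 or more), the conditional weight of the far component becomes tiny and the chain tends to get stuck in one mode, which is exactly the mixing problem the last part of the assignment asks you to observe.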
- You can use
tfp to sample from the conditional, one-dimensional, mixtures of
Gaussians as mentioned above. You might say: Why not just sample from a two-dimensional
mixture of Gaussians in the first place? The reason we don’t do this is that
this is a contrived example to show the concept of Gibbs sampling. ;)
- Please note the file
assignment02_starter.ipynb uploaded to the course Gitlab –
this can help with some of the above issues.
You should collect a reasonable number of samples (1000 or more) and plot both the
target distribution (mixture of Gaussians) and your samples. Do the samples
reflect the distribution well? In particular, are both modes of the Gaussian mixture
covered equally? You can do this visually and/or using statistics.
Also, experiment with different locations/scales for the Gaussians.
That is, move the components further apart or closer together and repeat the
sampling process each time. The quality of the samples should vary dramatically
based on the distance between components!