Discussion: May 4th
In this assignment, we will investigate Monte Carlo methods with a few simple examples. There is a lot of explanatory text, but the actual tasks are rather compact.
To understand the basics of MCMC methods, we consider the simple example in section 17.3 of the Deep Learning book, where we are interested in sampling single integers x from {0, …, n} according to some distribution. Try the following:
Now, we run a Markov chain. You can use tf.random.categorical for sampling; however, this only takes the logits of a distribution. In this case, you might want to define the transition matrix A in terms of logits in the first place and apply softmax (per column!) to get a regular probability matrix.
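A minimal sketch of such a chain, with an arbitrary state-space size and random logits (all names and sizes here are illustrative, not prescribed by the task):

```python
import tensorflow as tf

n = 10          # number of states: 0, ..., n-1 (illustrative)
steps = 10_000  # length of the chain

# Transition matrix defined via logits; softmax over axis 0 turns each
# column into a probability distribution, i.e. A[i, j] = p(next=i | current=j).
A_logits = tf.random.normal([n, n])
A = tf.nn.softmax(A_logits, axis=0)

x = 0  # arbitrary initial state
chain = []
for _ in range(steps):
    # tf.random.categorical expects logits of shape [batch, num_classes],
    # so we feed it the logits column of the current state. Passing raw
    # logits is fine: categorical normalizes them internally.
    x = int(tf.random.categorical(A_logits[:, x][tf.newaxis, :], num_samples=1)[0, 0])
    chain.append(x)
```

To sanity-check the chain, you can compare the empirical state frequencies against the stationary distribution of A, e.g. by iterating v ← Av until convergence as in section 17.3.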
In most of our use cases, the situation is not as in the example above: we are not given a transition distribution that we can simply run a Markov chain on. Instead, we have a desired target distribution and need to figure out how to get there.
Let’s try to sample from a mixture of Gaussians via Gibbs sampling.
You can use tensorflow-probability (in particular, the distributions module) to build the distributions. You should collect a reasonable number of samples (1000 or more) and plot both the target distribution (the mixture of Gaussians) and your samples. Do the samples reflect the distribution well? In particular, are both modes of the Gaussian mixture covered equally? You can check this visually and/or using statistics. Also, experiment with different locations/scales for the Gaussians; that is, move the components further apart or closer together and repeat the sampling process each time. The quality of the samples should vary dramatically depending on the distance between the components!
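One way to set this up (a sketch, not the required design; all parameter values below are placeholders) is to run Gibbs sampling on the joint distribution of the sample x and its mixture component z, alternating between p(z | x) and p(x | z):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Two equally weighted 2-D components; move the means closer together or
# further apart to study how the mixing behavior changes.
locs = tf.constant([[-3.0, -3.0], [3.0, 3.0]])
scales = tf.constant([[1.0, 1.0], [1.0, 1.0]])
log_weights = tf.math.log(tf.constant([0.5, 0.5]))

components = tfd.MultivariateNormalDiag(loc=locs, scale_diag=scales)

x = tf.zeros(2)  # arbitrary starting point
samples = []
for _ in range(5_000):
    # Step 1: resample the component indicator z given x, with
    # p(z | x) proportional to weight_z * N(x; mu_z, sigma_z).
    z_logits = log_weights + components.log_prob(x)
    z = tf.random.categorical(z_logits[tf.newaxis, :], num_samples=1)[0, 0]
    # Step 2: resample x given z by drawing from component z.
    x = tfd.MultivariateNormalDiag(loc=locs[z], scale_diag=scales[z]).sample()
    samples.append(x.numpy())
```

With the means this far apart, the indicator z flips only rarely once the chain has settled into one mode; that is exactly the mixing problem the experiments above should expose.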
Finally, let’s give importance sampling a shot. Say we are once again interested in sampling from a mixture of Gaussians as above, but cannot do so directly. Instead, we can only sample from a simple Gaussian distribution with mean 0 and standard deviation 1. Let’s try to estimate the average norm of a sample via importance sampling. In the language of section 17.2: p is the mixture density, q is the standard Gaussian density, and f(x) is the norm of x.
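Concretely, with samples x^(i) drawn from q, the importance sampling estimate takes the form (this should match equation 17.10 up to notation):

$$\hat{s}_q = \frac{1}{n}\sum_{i=1}^{n} \frac{p(x^{(i)})\, f(x^{(i)})}{q(x^{(i)})}, \qquad x^{(i)} \sim q$$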
Take a bunch of samples from q and compute the Monte Carlo estimate (i.e. sample average) as in equation 17.10. You may want to plot how this evolves over time (i.e. with the number of samples). Does it converge? Next, also take samples from the mixture of Gaussians directly (the thing we pretended we couldn’t do) and compute the average norm on those samples. Compare with the importance sampling estimate. Do they converge to the same point? Does this depend on the locations/scales of the mixture components? Experiment with different values!
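A sketch of both estimates, reusing placeholder mixture parameters as above (assumed for illustration, not prescribed by the task):

```python
import tensorflow as tf
import tensorflow_probability as tfp

tfd = tfp.distributions

# Target p: a 2-D mixture of Gaussians we pretend we cannot sample from.
p = tfd.MixtureSameFamily(
    mixture_distribution=tfd.Categorical(probs=[0.5, 0.5]),
    components_distribution=tfd.MultivariateNormalDiag(
        loc=[[-3.0, -3.0], [3.0, 3.0]],
        scale_diag=[[1.0, 1.0], [1.0, 1.0]]))

# Proposal q: the standard Gaussian we actually can sample from.
q = tfd.MultivariateNormalDiag(loc=[0.0, 0.0], scale_diag=[1.0, 1.0])

n = 100_000
xs = q.sample(n)          # shape [n, 2]
f = tf.norm(xs, axis=-1)  # the quantity of interest: the norm of each sample
# Importance weights p(x)/q(x), computed in log space for numerical stability.
w = tf.exp(p.log_prob(xs) - q.log_prob(xs))

# Running importance sampling estimate of the average norm, per sample count.
running = tf.cumsum(w * f) / tf.range(1.0, n + 1.0)

# Direct Monte Carlo estimate (the thing we pretended we couldn't do).
direct = tf.reduce_mean(tf.norm(p.sample(n), axis=-1))
print(float(running[-1]), float(direct))
```

Plotting `running` shows how (and whether) the estimate converges; with the proposal far from the mixture components, the importance weights have very high variance and convergence can be extremely slow.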