Discussion: June 27th
Deadline: June 26th, 20:00
For this assignment, you will need a working diffusion model implementation: either your own (e.g. from assignment09) or the code provided on Gitlab. The latter is recommended as it is self-contained and extendable.
The Gitlab code should work as-is! If it doesn’t, please ask, as it is most likely not your fault! Test it before changing anything!
A conditional model is a model of p(x|c) instead of just p(x), where c is
some conditional information. A very straightforward kind is a class-conditional
model, where you can supply a class as “input” and get a generation of that class.
Let’s start by implementing one of those.
To condition the model, map the class label to a vector (e.g. with a learned Embedding) and then insert it into the hidden layers through operations such as adaptive normalization. Train a conditional model and make sure that it’s generating appropriate samples given class inputs!
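One way this could look in code, as a minimal sketch assuming a Keras-style model with channels-last feature maps (the AdaptiveNorm layer and all names here are illustrative, not part of the provided code):

import tensorflow as tf

class AdaptiveNorm(tf.keras.layers.Layer):
    # normalization whose per-channel scale and shift are predicted from the class embedding
    def __init__(self, channels):
        super().__init__()
        self.norm = tf.keras.layers.LayerNormalization(center=False, scale=False)
        self.to_scale_shift = tf.keras.layers.Dense(2 * channels)

    def call(self, h, class_emb):
        # class_emb could come from tf.keras.layers.Embedding(num_classes, emb_dim)(labels)
        scale, shift = tf.split(self.to_scale_shift(class_emb), 2, axis=-1)
        # broadcast over the spatial dimensions of h ([batch, height, width, channels])
        return self.norm(h) * (1.0 + scale[:, None, None, :]) + shift[:, None, None, :]

Inside a network block, this would be applied as h = AdaptiveNorm(channels)(h, class_emb), replacing a plain normalization layer.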
Guided Diffusion has been shown to provide a quality-variety tradeoff that can
be tuned as desired. There are two kinds of guidance: classifier guidance and
classifier-free guidance. The latter is more popular and easier to implement. The
score changes from ∇ log p(x|c) to (1+w) · ∇ log p(x|c) − w · ∇ log p(x).
Diffusion models don’t output the score directly, but the usual noise-prediction parameterization is proportional to it, so the same combination applies.
In the Langevin sampler, just change model(x, c) to (1 + w) * model(x, c) - w * model(x).
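In code this is a one-liner; a sketch, assuming model(x, c) gives the model output and null_c stands for the empty/dropped conditioning:

def guided_prediction(model, x, c, null_c, w):
    # blend conditional and unconditional outputs (classifier-free guidance)
    return (1.0 + w) * model(x, c) - w * model(x, null_c)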
w is a hyperparameter. Some special cases are:
w = -1 results in purely unconditional sampling.
w = 0 results in the standard conditional model.
w > 0 enables guidance. Normally w >= 1.

We just have one problem: we trained a conditional model, but now we need an unconditional model as well. Luckily, we can get both at the same time: simply drop the conditioning at random during training (e.g. replace it with zeros), so that the single model also learns the unconditional distribution.
A simple way to achieve this dropout could be like this (it’s basically a binary mask):

import tensorflow as tf
import tensorflow_probability as tfp

drop_prob = 0.1  # probability of dropping the conditioning, for example
# choose dtype=tf.float32 for one-hot conditioning or dtype=tf.int32 for index conditioning
drop_dist = tfp.distributions.Bernoulli(probs=1 - drop_prob, dtype=tf.float32)
conditioning = ...  # e.g. class labels as one_hot or indices
drop_sample = drop_dist.sample(tf.shape(conditioning))  # binary mask: 1 keeps, 0 drops
dropped_conditioning = drop_sample * conditioning
# note: with index conditioning, a dropped label becomes 0 and collides with class 0
Otherwise, training doesn’t change from the normal conditional model.
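The training step then simply receives the dropped conditioning instead of the original one; a sketch, assuming the usual noise-prediction loss and a model(x_noisy, conditioning) signature (names are illustrative):

# inside the training step, after computing dropped_conditioning as above
noise_pred = model(x_noisy, dropped_conditioning)
loss = tf.reduce_mean(tf.square(noise - noise_pred))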
Finally, as mentioned above, adapt the Langevin sampler to provide both
conditional and unconditional model outputs (just run the model twice, once
with the conditioning and once with no/dropped conditioning; see the sketch below). Experiment with different values
of w. Common values for good samples are in the region of 1-5 (again depending on the dataset, model etc.). Do you find that
quality increases compared to w = 0 (no guidance)? What happens as you increase
w further, say to 10 or above?
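For reference, a single guided Langevin step could look like this, a sketch using the same guided combination as above (model is assumed to return the score estimate, null_c is the dropped/empty conditioning, and the step-size schedule is up to you):

import tensorflow as tf

def guided_langevin_step(model, x, c, null_c, w, step_size):
    # run the model twice: once with conditioning, once with the dropped conditioning
    guided = (1.0 + w) * model(x, c) - w * model(x, null_c)
    # standard Langevin update with fresh Gaussian noise
    noise = tf.random.normal(tf.shape(x))
    return x + step_size * guided + tf.sqrt(2.0 * step_size) * noise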