Discussion: October 24th
Deadline: October 23rd, 18:00
If you haven’t done so, please see the general assignment notes posted with Assignment 0.
This assignment builds on concepts discussed in the first exercise as well as the reading (Chapter 2 in the Bishop book), so if you struggle, try doing the reading first!
Consider a dangerous and/or common illness for which people are tested in order to recognize it early (e.g. cancer) and/or prevent its spread (e.g. COVID). The test result is either positive or negative. We make the following assumptions:
You take part in a study where a random, representative sample of the population is tested for the illness. Your test result is positive. What is the probability that you have the illness?
For the simulation, note that you can generate uniform random numbers in the range [0, 1] via functions such as np.random.rand, and such a random number will be smaller than another fixed number p with probability p. For example, the chance of the uniform random number being smaller than p = 0.9 is 90%.
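As a minimal sketch of such a simulation: the prevalence, sensitivity, and specificity below are made-up placeholder values (the actual numbers are in the assignment's assumptions above), but the mechanics of turning uniform random numbers into yes/no events are exactly the trick described.

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder parameters -- substitute the values from the assignment.
prevalence = 0.01       # P(sick)
sensitivity = 0.95      # P(positive | sick)
specificity = 0.90      # P(negative | healthy)

n = 1_000_000
# A uniform number is < prevalence with probability prevalence:
sick = rng.random(n) < prevalence
# Sick people test positive with prob. sensitivity,
# healthy people with prob. (1 - specificity):
positive = np.where(sick,
                    rng.random(n) < sensitivity,
                    rng.random(n) < 1 - specificity)

# Empirical P(sick | positive): fraction of positives who are actually sick.
p_sick_given_pos = sick[positive].mean()
print(p_sick_given_pos)
```

Comparing this empirical fraction with the value you compute via Bayes' rule is a good sanity check for both.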
Next (a mathematical solution is sufficient; no need for more simulation):
As we can see, the second test is much more prone to errors than the first.
However, assume that the results of the second test are conditionally independent of the first:
given whether a person is sick or not, whether the second test makes an error does not depend on whether the first test made an error, and vice versa.
Now, both of your tests come back positive.
Given this information, what is the probability that you are indeed sick?
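Under conditional independence, the likelihoods of the two positive results simply multiply within each disease state. A sketch of the resulting Bayes computation, with placeholder error rates standing in for the assignment's actual numbers:

```python
# Placeholder parameters -- substitute the values from the assignment.
prevalence = 0.01            # P(sick)
sens1, spec1 = 0.95, 0.90    # test 1 sensitivity / specificity
sens2, spec2 = 0.80, 0.70    # test 2 (more error-prone)

# Conditional independence given the disease state:
#   P(+,+ | sick)    = sens1 * sens2
#   P(+,+ | healthy) = (1 - spec1) * (1 - spec2)
num = prevalence * sens1 * sens2
den = num + (1 - prevalence) * (1 - spec1) * (1 - spec2)
p_sick_given_two_pos = num / den
print(p_sick_given_two_pos)
```

Note how even an error-prone second test shifts the posterior, because false positives must now happen on both tests at once.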
The purpose of this part is for you to walk through a basic probabilistic modeling task yourself. This can be considered somewhat more advanced, so if you struggle with this, focus on the first part above.
In the exercise, we looked at how one can model a coin toss using a Bernoulli distribution, and using this model and given some data, decide whether a coin is fair or not.
Here we extend this to a slightly more complex scenario: Say I have a coin, and I flip it. If it lands tails up, I flip again. If it lands heads up, I stop. Once it lands heads up, I record the number of times I saw tails before I got heads. This is one trial. Then I repeat this many more times to get more trials. You can find a numpy array with results uploaded on E-Learning. Given these results, do you think this is a fair coin?
You can proceed as follows:
- Look up the geometric distribution with parameter p; this gives the probability of seeing a "success" (heads) after exactly k failures (tails).
- Careful: there is an alternative parameterization in which k counts the trials (i.e. k-1 failures and one success). This is slightly different! The data linked above is the number of failures, without the success! If you mix this up, you will get slightly different results from the intended one (not a big deal).
- Write down the likelihood of the data as a function of p and maximize it with respect to p: either derive a closed-form solution for p, or optimize p iteratively. Either way, this should get you the p that best explains the data according to the maximum likelihood principle.
Is this close to 0.5?
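The maximum-likelihood step above can be sketched as follows. The closed form assumes the failures-before-first-success parameterization described above; the array here is a small made-up placeholder, whereas you would load the actual E-Learning array (e.g. with np.load).

```python
import numpy as np

# Placeholder data: number of tails before the first heads in each trial.
data = np.array([1, 0, 2, 1, 0, 0, 3, 1, 0, 1])

# Likelihood of one trial with k failures: (1 - p)**k * p.
# Summing the log-likelihood over trials and setting its derivative
# w.r.t. p to zero gives the closed form
#   p_hat = n / (n + sum(k)) = 1 / (1 + mean(k))
p_hat = 1.0 / (1.0 + data.mean())
print(p_hat)  # compare against 0.5 for a fair coin
```

As a quick plausibility check: a fair coin gives on average one tail before the first head, so mean(k) far from 1 (equivalently, p_hat far from 0.5) is evidence against fairness.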
You might want to review the derivation for the Bernoulli case in our blog post on this topic.