Question regarding Homework assignment 2.2, subtask 2

Disclaimer: This thread was imported from the old forum, so some formatting may not be displayed correctly. The original thread begins in the second post of this thread.

Hello,

I lost a few points on subtask 2 of the homework assignment mentioned above, so I have the following question:

The solution of subtask 2 says “We normalize to F= Age_25, making …”. Does this have anything to do with the normalization we do in Bayesian networks? The steps in the solution seem intuitive to me: we only look at Age_25 women now, so instead of writing P(Down|F=Age_25), we simply write P(Down) (correct me if this is wrong). What I can't see is how this normalization is the same as the normalization we use in Bayesian networks, so I'm not sure whether I'm missing something. They're different concepts, right?

Thanks in advance and have a nice day!


You are on the right track: whenever P(X) is a probability distribution, so is P(X|Y). Hence you can, for example, write P_Y(X) to denote that probability distribution. In the example, if P(Down) is a probability distribution, so is P_(F=Age_25)(Down).

This is indeed the same normalization technique (which is used in Bayesian networks, among other places). For the vector P_(F=Age_25)(Down) = alpha*[x, y] to be a probability distribution, its entries have to sum to one. Here, by definition, x = P(down /\ F=Age_25) and y = P(not down /\ F=Age_25). Since these two values are only part of the joint probability distribution P(Down, F), they will not sum to 1 on their own. To obtain a probability distribution we therefore have to choose a suitable normalization constant alpha, namely alpha = 1/(x+y) = 1/P(F=Age_25) (the last equality holds by marginalization).
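To make the role of alpha concrete, here is a minimal sketch of the normalization step in Python. The numbers for x and y are made up purely for illustration (they are not the values from the homework), and the helper name `normalize` is my own.

```python
def normalize(unnormalized):
    """Scale non-negative values so that they sum to one.

    Returns the normalized list together with the constant alpha used.
    """
    total = sum(unnormalized)
    if total <= 0:
        raise ValueError("values must have a positive sum")
    alpha = 1.0 / total
    return [alpha * v for v in unnormalized], alpha


# Hypothetical joint probabilities (NOT the homework values):
x = 0.002  # P(down /\ F=Age_25)
y = 0.198  # P(not down /\ F=Age_25)

# x + y = P(F=Age_25) by marginalization, so alpha = 1 / P(F=Age_25).
posterior, alpha = normalize([x, y])
print("alpha =", alpha)
print("P_(F=Age_25)(Down) =", posterior)
```

With these made-up values, x + y is about 0.2, so alpha is about 5 and the normalized vector sums to one.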

In the homework, most of the probabilities you were given were already conditioned on F=Age_25, so normalizing did not require any computation. However, there was one exception which, strictly speaking, one would have to normalize: the false positive rate P(pos|not down) and the false negative rate P(not pos|down). These might depend on the age of the tested person. However, most of you assumed that the false positive/false negative rates do not depend on age (i.e. that the test result is independent of age given the Down status), hence P(pos|not down) = P(pos|not down, F=Age_25), which allows you to use the given values.
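As a rough sketch of how that assumption is used: once everything is conditioned on F=Age_25, the age-independent error rates can be plugged directly into Bayes' rule, and the same normalization appears again in the denominator. The function name and all numbers below are hypothetical and not taken from the homework.

```python
def posterior_down_given_pos(prior_down, false_pos_rate, false_neg_rate):
    """P(down | pos) within the Age_25 group.

    Assumes the error rates do not depend on age, so that
    P(pos | not down, F=Age_25) = P(pos | not down), and likewise
    for the false negative rate.
    """
    p_pos_given_down = 1.0 - false_neg_rate       # P(pos | down)
    x = p_pos_given_down * prior_down             # P(pos /\ down)
    y = false_pos_rate * (1.0 - prior_down)       # P(pos /\ not down)
    alpha = 1.0 / (x + y)                         # 1 / P(pos)
    return alpha * x


# Hypothetical inputs (NOT the homework values):
# prior_down plays the role of P(Down) after normalizing to F=Age_25.
result = posterior_down_given_pos(
    prior_down=0.001, false_pos_rate=0.05, false_neg_rate=0.01
)
print("P(down | pos) =", result)
```

If the error rates did depend on age, you would need the age-specific rates P(pos|not down, F=Age_25) and P(not pos|down, F=Age_25) as inputs instead.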


Thank you very much, that explanation has made me understand it!