
More on Conditioning

#Conditional Probability
2022/10/15

Table of Contents


Conditioning

Continuous Stuff

\[f_{X \vert Y}(x \vert y) = {f_{X,Y}(x,y) \over f_Y(y)}, \\ P(X\in A \vert Y=y) = \int_A f_{X\vert Y}(x \vert y)dx.\]

Conditional CDF

\[F_{X|Y}(x|y) = P(X\le x|Y=y) = \int_{-\infty}^x f_{X|Y}(t|y)dt, \\ f_{X|Y}(x|y) = {\partial F_{X|Y}(x|y) \over \partial x}.\]

Note that everything here is under \(Y=y\)!

But for a continuous r.v., shouldn't \(P(Y=y)\) be equal to \(0\)? Indeed, \(P(Y=y) = 0\); however, what we compute here concerns the r.v. \(X\). That is, we care about the probability that \(X \le x\) given that \(Y=y\). For this reason, \(F_{X\vert Y}(x\vert y)\) is not written as \(P(X\le x, Y=y)/P(Y=y)\).
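The "conditioning on \(Y=y\) despite \(P(Y=y)=0\)" idea can be sanity-checked empirically by conditioning on a narrow band \(y_0 \le Y \le y_0+\delta\). A minimal Monte Carlo sketch, using a toy joint distribution of my own choosing (not from the notes): \(Y \sim \mathrm{Exp}(1)\) and \(X \mid Y=y \sim \mathcal{N}(y, 1)\).

```python
import random

random.seed(0)

# Toy joint distribution (my own example, not from the notes):
#   Y ~ Exponential(1),  X | Y = y ~ Normal(y, 1).
# Approximate the conditional CDF F_{X|Y}(x | y0) by conditioning on a
# narrow band y0 <= Y <= y0 + delta.
y0, delta = 1.0, 0.05
samples = []
for _ in range(1_000_000):
    y = random.expovariate(1.0)
    if y0 <= y <= y0 + delta:
        samples.append(random.gauss(y, 1.0))

# Exact value: X | Y = y0 is Normal(y0, 1), so F_{X|Y}(y0 | y0) = Phi(0) = 0.5.
empirical = sum(s <= y0 for s in samples) / len(samples)
print(empirical, 0.5)
```

As \(\delta \to 0\) the band estimate converges to the conditional CDF, which is exactly why \(f_{X\vert Y}\) is defined through the ratio of densities rather than a ratio of (zero) probabilities.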


Bayes’ Rule revisited

Bayes’ Rule

Given a discrete r.v. \(K\) and a continuous r.v. \(Y\), the joint probability is

\[\begin{align*} &P(K=k, y \le Y \le y+\delta) \\ =\ &P(K=k)P(y \le Y \le y+\delta|K=k) \approx p_K(k)f_{Y|K}(y|k)\delta \\ =\ &P(y\le Y \le y+\delta)P(K=k|y\le Y \le y+\delta) \approx f_Y(y)\delta p_{K|Y}(k|y). \end{align*}\]

Comparing the right-hand sides, we have

\[p_K(k)f_{Y|K}(y|k) = f_Y(y)p_{K|Y}(k|y).\]

From this equation, we can derive two formulae

\[\begin{align*} p_{K|Y}(k|y) = {p_K(k)f_{Y|K}(y|k) \over f_Y(y)}, \tag{1} \\ f_{Y|K}(y|k) = {f_Y(y)p_{K|Y}(k|y) \over p_K(k)}, \tag{2} \end{align*}\]

where

\[f_Y(y) = \sum_{k'} p_K(k')f_{Y|K}(y|k'), \\ p_K(k) = \int f_Y(y')p_{K|Y}(k|y')dy'.\]

Notice that the summand and the integrand are exactly the numerators of \((1)\) and \((2)\), respectively!

When solving \((1)\) and \((2)\), compute the numerator first, then deal with the denominator; work one term at a time.

Here, \(p_K(k)\) is called the prior probability: the belief on \(K\) in the absence of any additional information. \(p_{K\vert Y}(k\vert y)\) is called the posterior probability: the belief updated with the information \(Y=y\).

The subsequent event \(Y=y\) changes our belief about \(K\).
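A small numeric sketch of formula \((1)\), following the "numerator first, then denominator" advice. The model here is my own example, not from the notes: \(K \in \{0,1\}\) with a fair prior and \(Y \mid K=k \sim \mathcal{N}(k,1)\).

```python
import math

# Hypothetical model (my own example): K in {0, 1} with a fair prior,
# and Y | K = k ~ Normal(k, 1).  Posterior via formula (1):
# compute each numerator p_K(k) f_{Y|K}(y|k) first, then normalize by
# f_Y(y) = sum over k' of those numerators.
def normal_pdf(x, mu, sigma=1.0):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

p_K = {0: 0.5, 1: 0.5}                                     # prior p_K(k)
y = 0.8                                                    # observed value of Y

numerators = {k: p_K[k] * normal_pdf(y, k) for k in p_K}   # p_K(k) f_{Y|K}(y|k)
f_Y = sum(numerators.values())                             # denominator f_Y(y)
posterior = {k: numerators[k] / f_Y for k in numerators}   # p_{K|Y}(k|y)

print(posterior)   # mass shifts toward k = 1, since y = 0.8 is closer to 1
```

For this model the posterior has a closed form, \(p_{K\vert Y}(1\vert y) = 1/(1+e^{-(y-1/2)})\), which the normalization above reproduces.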


Conditional Expectation and Variance revisited

Conditioning

Conditional expectation as a r.v.

Define:

\[E[X\vert Y] = g(Y).\]

It is a function of \(Y\).

Law of iterated expectations

\(E\big[E[X\vert Y]\big] = E[X]\).

\[\begin{align*} E\big[E[X\vert Y]\big] &= E[g(Y)] \\ &= \sum_y g(y)p_Y(y) \\ &= \sum_y E[X\vert Y=y]p_Y(y) \\ &= E[X], \text{ by total expectation thm.} \end{align*}\]

When applying this, first substitute a symbolic \(y\) for \(Y\) (i.e., work under \(Y=y\)); after the computation, replace \(y\) back with \(Y\).
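A quick Monte Carlo check of the law of iterated expectations, with an arbitrary example of my own: \(Y\) a fair die and \(X \mid Y=y\) uniform on \(\{1,\dots,y\}\), so \(g(Y) = E[X\vert Y] = (Y+1)/2\).

```python
import random

random.seed(1)

# Toy model (my own example): Y is a fair six-sided die, and
# X | Y = y is uniform on {1, ..., y}.  Then E[X | Y] = (Y + 1) / 2,
# so E[E[X | Y]] = (E[Y] + 1) / 2 = 2.25, which should match E[X].
n = 200_000
total_x = 0.0
total_g = 0.0          # accumulates g(Y) = E[X | Y] = (Y + 1) / 2
for _ in range(n):
    y = random.randint(1, 6)
    x = random.randint(1, y)
    total_x += x
    total_g += (y + 1) / 2

print(total_x / n, total_g / n)   # both near 2.25
```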

Conditional variance as a r.v.

Law of total variance

\[var(X) = E\Big[var(X\vert Y)\Big] + var\Big(E[X\vert Y]\Big).\]

Proof

\[\begin{align*} var(X\vert Y=y) &= E[X^2\vert Y=y] - \Big(E[X\vert Y=y] \Big)^2,\text{ for all } y. \\ var(X\vert Y) &= E[X^2\vert Y] - \Big(E[X\vert Y] \Big)^2 \\ E\Big[var(X\vert Y)\Big] &= E\Big[E[X^2\vert Y]\Big] - E\bigg[\Big(E[X\vert Y] \Big)^2\bigg] \\ &= E[X^2] - E\bigg[\Big(E[X\vert Y] \Big)^2\bigg] \\ var\Big(E[X\vert Y]\Big) &= E\bigg[\Big(E[X\vert Y] \Big)^2\bigg] - \bigg(E\Big[E[X\vert Y] \Big] \bigg)^2 \\ &= E\bigg[\Big(E[X\vert Y] \Big)^2\bigg] - (E[X])^2. \end{align*}\]

Hence

\[E\Big[var(X\vert Y)\Big] + var\Big(E[X\vert Y]\Big) = E[X^2] - (E[X])^2 = var(X). \tag*{$\blacksquare$}\]

\(var(X) =\) (average variability within sections) \(+\) (variability between sections)

A “section” here means each event \(Y=y\).
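The within/between decomposition can be checked numerically. Again a toy model of my own: \(Y\) a fair die, \(X \mid Y=y\) uniform on \(\{1,\dots,y\}\), for which \(var(X\vert Y=y) = (y^2-1)/12\) and \(E[X\vert Y=y] = (y+1)/2\).

```python
import random

random.seed(2)

# Toy model (my own example): Y is a fair die, X | Y = y uniform on {1..y}.
# Estimate var(X) directly and via E[var(X|Y)] + var(E[X|Y]).
n = 300_000
xs, ys = [], []
for _ in range(n):
    y = random.randint(1, 6)
    xs.append(random.randint(1, y))
    ys.append(y)

mean_x = sum(xs) / n
var_x = sum((x - mean_x) ** 2 for x in xs) / n         # direct var(X)

# Exact within/between pieces for this model:
#   var(X | Y = y) = (y^2 - 1) / 12   (discrete uniform on {1..y})
#   E[X | Y = y]   = (y + 1) / 2
within = sum((y * y - 1) / 12 for y in ys) / n         # E[var(X|Y)]
g = [(y + 1) / 2 for y in ys]
mean_g = sum(g) / n
between = sum((v - mean_g) ** 2 for v in g) / n        # var(E[X|Y])

print(var_x, within + between)    # both near 275/144 ≈ 1.91
```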


Problems

Rat in a maze

A rat in a maze has only two paths to choose from. If the rat goes right, it returns to the starting point after \(3\) minutes. If the rat goes left, there is a \(2\over 3\) chance of returning to the starting point after \(5\) minutes and a \(1\over 3\) chance of exiting the maze after \(2\) minutes. Find the average time the rat spends in the maze, given that it is equally likely to go left or right.

Solution

Let \(T\) be the time that the rat is in the maze and

\[X = \begin{cases} 0, &\text{goes right}, \\ 1, &\text{goes left}. \end{cases}\]

Then

\[P(X=0) = P(X=1) = {1\over 2}, \\ E(T|X=0) = 3 + E(T), \\ E(T|X=1) = {2\over 3}(5 + E(T)) + {1\over 3}\cdot 2.\]

Thus we have

\[E[E(T|X)] = {1\over 2}(3+E(T)) + {1\over 2}({2\over 3}(5 + E(T)) + {1\over 3}\cdot 2) = E(T), \\ \therefore E(T) = 21.\]
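The answer \(E(T)=21\) can be confirmed by simulating the rat directly, a minimal sketch:

```python
import random

random.seed(3)

# Simulate one run of the rat: right (p = 1/2) -> back after 3 min;
# left (p = 1/2) -> back after 5 min w.p. 2/3, or out after 2 min w.p. 1/3.
def trial():
    t = 0.0
    while True:
        if random.random() < 0.5:        # goes right
            t += 3
        else:                            # goes left
            if random.random() < 2 / 3:
                t += 5
            else:
                return t + 2             # exits the maze

n = 100_000
avg = sum(trial() for _ in range(n)) / n
print(avg)    # near E(T) = 21
```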

Professor and Student

Introduction to Probability, 2nd ed., by Dimitri P. Bertsekas and John N. Tsitsiklis, Problem 4.24 (pp. 251-252).

Solution

Let \(T\) be the amount of time the professor will spend with the student, and \(F\) be the event that the student finds the professor. Then

\[E[T] = P(F)E[T\vert F] + P(F^c)E[T\vert F^c]. \tag{1}\]

We then need to find out \(P(F)\). Let

\[\begin{align*} W &= \text{ length of time between 9 a.m. and arrival of the Ph.D. student (uniformly distributed);} \\ X &= \text{ amount of time the professor devotes to his task (exponentially distributed);} \\ Y &= \text{ length of time between 9 a.m. and arrival of the professor (uniformly distributed).} \end{align*}\]

We have

\[P(F) = P(Y\le W\le X+Y).\]

We know that \(W\) can only be between \(0\) and \(8\), but \(X+Y\) can be arbitrarily large. That is to say, a direct computation may overestimate \(P(W\le X+Y)\). Hence we should write

\[P(Y\le W\le X+Y) = 1 - \Big(P(W<Y) + P(W > X+Y) \Big).\]

We have

\[\begin{align*} P(W < Y \mid Y=y) &= F_W(y) = \int_0^{y}f_W(w)\,dw. \\ P(W < Y) &= \int^4_0 f_Y(y) F_W(y)\, dy. \tag{2} \end{align*}\]

Here the first line uses the independence of \(W\) and \(Y\).

\(Y\) is uniformly distributed between \(0\) and \(4\).
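Formula \((2)\) can be checked numerically. Assuming, per my reading of the problem data above, that \(W \sim U[0,8]\) and \(Y \sim U[0,4]\), we have \(f_Y(y)=1/4\) and \(F_W(y)=y/8\), so \((2)\) gives \(P(W<Y)=1/4\):

```python
import random

random.seed(4)

# Assumed from the problem data (my reading): the student's arrival
# W ~ Uniform[0, 8] and the professor's arrival Y ~ Uniform[0, 4].
# Check formula (2): P(W < Y) = integral over [0, 4] of f_Y(y) F_W(y) dy.

# Numerical evaluation of (2) by a midpoint Riemann sum:
m = 100_000
h = 4 / m
analytic = sum((1 / 4) * ((i + 0.5) * h / 8) * h for i in range(m))  # f_Y = 1/4, F_W(y) = y/8

# Monte Carlo check of the same probability:
n = 200_000
mc = sum(random.uniform(0, 8) < random.uniform(0, 4) for _ in range(n)) / n

print(analytic, mc)   # both near 1/4
```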

Remark

Two key takeaways from this problem:

  1. Make good use of the total expectation theorem, not only when facing large summations.
  2. How to compute a direct comparison between random variables. In general:
\[\begin{align*} P(X<Y) &= \int_{-\infty}^{\infty} F_X(y)f_Y(y)\,dy \\ &= \int_{-\infty}^{\infty} \Big(\int_{-\infty}^y f_X(x)\,dx\Big)f_Y(y)\,dy \\ &= \int_{-\infty}^{\infty}\int_{-\infty}^y f_X(x)f_Y(y)\,dx\,dy. \end{align*}\]
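The general formula can be verified against a case with a known closed form (my own example): for independent \(X \sim \mathrm{Exp}(a)\) and \(Y \sim \mathrm{Exp}(b)\), the double integral reduces to \(P(X<Y) = a/(a+b)\).

```python
import random

random.seed(5)

# Check P(X < Y) with exponentials (my own example):
# X ~ Exp(a), Y ~ Exp(b)  =>  P(X < Y) = a / (a + b)  (standard closed form),
# which the double integral above evaluates to.
a, b = 1.0, 2.0
closed_form = a / (a + b)                 # = 1/3

n = 300_000
mc = sum(random.expovariate(a) < random.expovariate(b) for _ in range(n)) / n
print(mc, closed_form)
```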
