
Factorization theorem (Sufficient statistics)

Published on 2024/01/23

t = T(X=x) is a sufficient statistic for \theta \iff p(X=x | \theta) \equiv p_\theta(X = x) = h(x) \, g_\theta\left( T(x)\right)

where x = \set{x_1, \dots, x_N} is a set of realizations, X = \set{X_1, \dots, X_N} is the corresponding set of random variables, p(X=x) \equiv p(X_1=x_1, \dots, X_N=x_N), and h(x) does not depend on \theta.
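
As a concrete illustration of the factorization, here is a minimal sketch that assumes i.i.d. Poisson observations with T(x) = x_1 + ... + x_N. The Poisson model, the sample `x`, and the helper names `joint_pmf`, `h`, `g` and `factorization_holds` are assumptions introduced for this example only. Under that assumption one possible choice is h(x) = 1 / (x_1! ... x_N!) and g_\theta(t) = e^{-N\theta} \theta^t, and the code checks numerically that their product matches the joint pmf.

```python
from math import exp, factorial, isclose, prod


def joint_pmf(x, lam):
    """p_lam(X = x) for i.i.d. Poisson(lam) observations x_1, ..., x_N."""
    return prod(exp(-lam) * lam**xi / factorial(xi) for xi in x)


def h(x):
    """Factor that depends on x only (no parameter)."""
    return 1 / prod(factorial(xi) for xi in x)


def g(t, lam, n):
    """Factor that depends on x only through t = T(x) = sum(x)."""
    return exp(-n * lam) * lam**t


def factorization_holds(x, lam):
    """Check p_lam(X = x) == h(x) * g_lam(T(x)) up to floating-point error."""
    return isclose(joint_pmf(x, lam), h(x) * g(sum(x), lam, len(x)))


x = (2, 0, 3, 1)  # hypothetical sample, chosen only for illustration
print([factorization_holds(x, lam) for lam in (0.5, 1.0, 2.5)])
```

The check should hold for every parameter value tried, which is what the right-hand side of the theorem requires.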


The proof below treats the discrete case. Fix a realization x and let t = T(x). Since X=x implies T(X)=t, we have p_\theta(X=x) = p_\theta(X=x, T=t).

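(\Rightarrow) Assume T is sufficient, i.e. p_\theta(X=x | T=t) does not depend on \theta, so the subscript \theta can be dropped from it.

\begin{align*} p_\theta(X=x) &= p_\theta(X=x, T=t) \\ &= p_\theta(X=x | T=t) p_\theta(T=t) \\ &= \underbrace{p(X=x | T=t)}_{h(x)} \, \underbrace{p_\theta(T=t)}_{g_\theta(T(x))} \end{align*}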

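(\Leftarrow) Assume the factorization holds, and take t with p_\theta(T=t) > 0. The sums below run over all realizations x' with T(x') = t.

\begin{align*} p_\theta(X=x|T=t) &= \frac{p_\theta(X=x, T=t)}{p_\theta(T=t)} \\ &= \frac{p_\theta(X=x)}{ \sum_{\set{x'|T(x')=t}} p_\theta(X=x', T=t)} \\ &= \frac{p_\theta(X=x)}{ \sum_{\set{x'|T(x')=t}} p_\theta(X=x')} \\ &= \frac{h(x) \cancel{g_\theta(t)}}{\sum_{\set{x'|T(x')=t}} h(x') \cancel{g_\theta(t)}} \\ &= \frac{h(x)}{\sum_{\set{x'|T(x')=t}} h(x')} \end{align*}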

Therefore,

p_\theta(X=x|T=t) does not depend on \theta, so T is a sufficient statistic for \theta.
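
The conclusion can also be checked numerically. The following is a minimal sketch under an assumed model, i.i.d. Bernoulli(\theta) variables with T(x) = x_1 + ... + x_N; the model, the sample `x`, and the helper names `joint_pmf` and `conditional_given_t` are hypothetical and not part of the proof. It computes p_\theta(X=x | T=t) by brute force, summing the joint pmf over every x' with T(x') = t as in the denominator of the (\Leftarrow) direction, for several values of \theta.

```python
from itertools import product
from math import isclose


def joint_pmf(x, theta):
    """p_theta(X = x) for i.i.d. Bernoulli(theta) variables."""
    t = sum(x)
    return theta**t * (1 - theta) ** (len(x) - t)


def conditional_given_t(x, theta):
    """p_theta(X = x | T = T(x)) with T(x) = sum(x), computed by brute force."""
    n, t = len(x), sum(x)
    # Denominator: p_theta(T = t), summed over every x' with T(x') = t.
    p_t = sum(
        joint_pmf(xp, theta)
        for xp in product((0, 1), repeat=n)
        if sum(xp) == t
    )
    return joint_pmf(x, theta) / p_t


x = (1, 0, 1, 0)  # hypothetical sample: N = 4, T(x) = 2
values = [conditional_given_t(x, theta) for theta in (0.2, 0.5, 0.9)]
print(values)
print(all(isclose(v, values[0]) for v in values))
```

For this sample there are 6 sequences of length 4 with two ones, so each conditional probability should be approximately 1/6 regardless of \theta, matching h(x) / \sum h(x') with h(x) = 1.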
