🐥

Sufficient statistics

Published on 2024/01/21

A statistic t=T(X) is sufficient for the parameter \theta if the conditional distribution p(X=x | T(X) = t) does not depend on \theta.

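This means that t carries all the information about the parameter \theta that the full sample X contains: once t is known, the remaining details of x tell us nothing more about \theta.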


Let's consider an example.

Let X_1, \dots, X_N \overset{i.i.d.}{\sim} \mathrm{Bern}(\theta), and write X = \{X_1, \dots, X_N\} for the random variables and x = \{x_1, \dots, x_N\} for their realizations.

\begin{align*} p(X=x; \theta) &= p(X_1=x_1, \dots, X_N=x_N; \theta) \\ &= \prod_{n=1}^N p(X_n = x_n; \theta) \\ &= \prod_{n=1}^N \theta^{x_n} (1 - \theta)^{1 - x_n} \\ &= \theta^{\sum_{n=1}^N x_n} (1 - \theta)^{N - \sum_{n=1}^N x_n} \end{align*}

In this case, let T(X) = \sum_{n=1}^N X_n. Then $T(X) \sim \mathrm{Bin}(N, \theta)$, because T(X) counts the successes in N independent Bernoulli trials.

\begin{align*} p\left( T(X)=t; \theta \right) = \binom{N}{t} \theta^t (1 - \theta)^{N - t} \end{align*}
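
For any x with \sum_{n=1}^N x_n = t, the event X = x implies T(X) = t, so the joint probability in the numerator equals p(X=x; \theta).

\begin{align*} p\left( X=x | T(X) = t \right) &= \frac{p\left( X=x, T(X) = t; \theta \right)}{p\left(T(X) = t; \theta\right)} \\ &= \frac{p\left( X=x; \theta \right)}{p\left(T(X) = t; \theta\right)} \\ &= \frac{\theta^t (1 - \theta)^{N - t}}{\binom{N}{t} \theta^t (1 - \theta)^{N - t}} \\ &= \frac{1}{\binom{N}{t}} \end{align*}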

p\left( X=x | T(X) = t \right) does not depend on \theta, so T(X) = \sum_{n=1}^N X_n is a sufficient statistic for \theta.
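
The result can also be checked numerically. The following is a minimal sketch, not part of the derivation itself: the function name `conditional_prob`, the sample `x`, and the values of `theta` are all chosen here for illustration. It computes the conditional probability by brute-force enumeration over all binary sequences of length N, so it is only practical for small N.

```python
from itertools import product
from math import comb, isclose


def conditional_prob(x, theta):
    """Return p(X = x | T(X) = sum(x)) for i.i.d. Bernoulli(theta) variables.

    Hypothetical helper for illustration; it enumerates all 2^N sequences.
    """
    n, t = len(x), sum(x)

    def joint(y):
        # Joint pmf of an i.i.d. Bernoulli(theta) sequence y.
        s = sum(y)
        return theta**s * (1 - theta) ** (n - s)

    # p(T(X) = t): add up the joint pmf over every sequence whose sum is t.
    p_t = sum(joint(y) for y in product((0, 1), repeat=n) if sum(y) == t)
    return joint(x) / p_t


x = (1, 0, 1, 1, 0)  # illustrative sample: N = 5, t = 3
for theta in (0.2, 0.5, 0.9):
    p = conditional_prob(x, theta)
    # The value should match 1 / binom(N, t) for every theta.
    print(theta, p, isclose(p, 1 / comb(len(x), sum(x))))
```

For each value of `theta`, the conditional probability should agree with 1 / \binom{5}{3} up to floating-point error, which is the numerical counterpart of the statement that it does not depend on \theta.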
