
One-sample t-test in Python (statistical hypothesis testing)

Published 2023/01/21

math

statistics

tech

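Problem

The English exam results of the past 3,000 examinees of University A's entrance exam are well approximated by a normal distribution with mean 450 and standard deviation 80. This year's 250 entrants scored a mean of 470 with a standard deviation of 82. Is there a difference in English ability between this year's entrants and the past examinees?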

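t-test

Null and alternative hypotheses

The null hypothesis $H_0$ is the hypothesis we hope to reject, so it states that there is no difference in English ability, i.e. $\mu = \mu_0$. Here $\mu$ is the population mean for this year's entrants and $\mu_0 = 450$ is the historical mean.
The alternative hypothesis $H_1$ states that there is a difference in English ability, i.e. $\mu \neq \mu_0$.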

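Test statistic: the t value

The test statistic $t$ is computed as follows, where $\bar{X}$ is the sample mean, $\mu$ the population mean, $SE$ the standard error, $U$ the unbiased standard deviation of the sample, and $n$ the sample size.

$$
t = \frac{\bar X - \mu}{SE} = \frac{\bar X - \mu}{U/\sqrt{n}}
$$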

If the null hypothesis is true, then $\mu = \mu_0$, so substituting $\mu_0$ gives

$$
t = \frac{\bar X - \mu_0}{SE} = \frac{\bar X - \mu_0}{U/\sqrt{n}}
$$
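As a quick worked example, the formula can be evaluated directly from the summary statistics in the problem statement (sample mean 470, standard deviation 82, n = 250, $\mu_0 = 450$). This is only a sketch: the helper name `t_from_summary` is made up for illustration, and it assumes that the reported 82 is the unbiased standard deviation $U$.

```python
import math


def t_from_summary(x_bar, mu_0, u, n):
    """One-sample t statistic computed from summary statistics.

    x_bar: sample mean, mu_0: mean under H0,
    u: unbiased sample standard deviation (assumed), n: sample size.
    """
    se = u / math.sqrt(n)  # standard error of the mean
    return (x_bar - mu_0) / se


# Values taken from the problem statement
t_summary = t_from_summary(x_bar=470, mu_0=450, u=82, n=250)
print(round(t_summary, 3))  # 20 / (82 / sqrt(250)), roughly 3.856
```

A t value computed from simulated data will differ somewhat from this figure, because the simulated sample mean and standard deviation are not exactly 470 and 82.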

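Rejection region and p-value

Let $t_s$ be the t value computed from the sample.
If the null hypothesis is true, the t value follows a t distribution with $n-1$ degrees of freedom.
Let $t_\alpha$ denote the upper $100\alpha\%$ point of the t distribution with $n-1$ degrees of freedom. For a two-sided test at significance level $\alpha$, the rejection region is

$$
|t_s| \geq t_{\alpha/2}
$$

For a random variable $X$ that follows this t distribution, $P(X \geq |t_s|)$ is the probability that $X$ is at least $|t_s|$, so the two-sided p-value is

$$
P(X \geq |t_s|) \times 2
$$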
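The two-sided rule above can be sketched with `scipy.stats.t`. The function name `two_sided_test` and the input `t_s=3.856` are illustrative assumptions (the latter is roughly 20 / (82 / sqrt(250)), the value implied by the summary statistics in the problem), not results from the simulated data.

```python
from scipy import stats


def two_sided_test(t_s, df, alpha=0.05):
    """Return (critical value, two-sided p-value, reject H0?)."""
    # Upper alpha/2 point of the t distribution with df degrees of freedom
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    # sf is the upper-tail probability P(X >= x); double it for two sides
    p_value = 2 * stats.t.sf(abs(t_s), df)
    return t_crit, p_value, bool(abs(t_s) >= t_crit)


t_crit, p_value, reject = two_sided_test(t_s=3.856, df=249)
print(round(t_crit, 3), p_value, reject)
```

Comparing $|t_s|$ with the critical value and comparing the p-value with $\alpha$ are equivalent decisions, so both always agree.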

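Implementation in Python

1. Generate this year's entrant data.

import numpy as np
from scipy import stats

np.random.seed(123)
# `scale` is the standard deviation, so pass 82 directly
new_student = np.random.normal(loc=470, scale=82, size=250)

2. Compute the t value.

# Sample mean
x_bar = np.mean(new_student)
# Sample size and degrees of freedom
n = len(new_student)
df = n - 1
# Unbiased standard deviation (ddof=1) and standard error
u = np.std(new_student, ddof=1)
se = u / np.sqrt(n)

# t value under H0 (mu_0 = 450); round only for display
t_s = (x_bar - 450) / se
print(round(t_s, 3))
# For reference, the summary statistics in the problem give 20 / (82 / sqrt(250)) ≈ 3.86

3. Compute the rejection region.

# Two-sided test at alpha = 0.05: upper 2.5% point of the t distribution
alpha = 0.05
t_crit = stats.t.ppf(q=1 - alpha / 2, df=df)
print(round(t_crit, 3))  # about 1.970 for df = 249
# H0 is rejected when |t_s| >= t_crit.

4. Compute the p-value.

# Two-sided p-value: twice the lower-tail probability at -|t_s|
p = stats.t.cdf(-np.abs(t_s), df=df) * 2
print(p)
# H0 is rejected when the p-value is at or below the significance level 0.05.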
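As a sanity check, the manual calculation can be compared with SciPy's built-in `stats.ttest_1samp`, which runs a two-sided one-sample t-test by default. This sketch regenerates its own data under the assumption that the sample is drawn with standard deviation 82, as in the problem statement; the variable names are illustrative.

```python
import numpy as np
from scipy import stats

np.random.seed(123)
# Assumed setup: mean 470, standard deviation 82, 250 entrants
sample = np.random.normal(loc=470, scale=82, size=250)

# Manual calculation, following the formulas in this article
n = len(sample)
se = np.std(sample, ddof=1) / np.sqrt(n)
t_manual = (np.mean(sample) - 450) / se
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Built-in one-sample t-test against mu_0 = 450
res = stats.ttest_1samp(sample, popmean=450)

print(t_manual, res.statistic)
print(p_manual, res.pvalue)
```

The two pairs of numbers should agree up to floating-point error, which confirms that the hand-written steps implement the same test.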

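Conclusion

The t-test rejects the null hypothesis at the 5% significance level, so we conclude that there is a difference in English ability between this year's entrants and the past examinees.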

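Supplement

For data drawn with mean 450 and standard deviation 80, confirm that the null hypothesis is not rejected.

np.random.seed(123)
# `scale` is the standard deviation, so pass 80 directly
new_student = np.random.normal(loc=450, scale=80, size=250)

x_bar = np.mean(new_student)

n = len(new_student)
df = n - 1

u = np.std(new_student, ddof=1)
se = u / np.sqrt(n)

t_s = round((x_bar - 450) / se, 3)

# Two-sided critical value at alpha = 0.05
round(stats.t.ppf(q=0.975, df=df), 3)  # about 1.970

p = stats.t.cdf(-np.abs(t_s), df=df) * 2
# >>> about 0.749, which exceeds the significance level 0.05, so H0 is not rejected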
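A single draw only shows one outcome. The following sketch repeats the same experiment many times under the null hypothesis to illustrate that the test rejects in roughly 5% of cases when the true mean really is 450. The number of trials, the seed, and the variable names are arbitrary choices made for this illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_trials, n, alpha = 2000, 250, 0.05

# Each row is one simulated cohort drawn under H0 (mean 450, SD 80)
samples = rng.normal(loc=450, scale=80, size=(n_trials, n))

# One two-sided t-test per row
result = stats.ttest_1samp(samples, popmean=450, axis=1)

# Fraction of cohorts in which H0 is (wrongly) rejected
rejection_rate = np.mean(result.pvalue < alpha)
print(rejection_rate)
```

The printed rate should be close to the significance level, which is what "significance level 0.05" means: a 5% chance of rejecting a true null hypothesis.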
