
Checking the 95% confidence interval for the population mean when the population variance is known

Published 2024/05/05

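A 95% confidence interval means that if we repeat the estimation 100 times, each time with a fresh sample, about 95 of the resulting intervals will contain the population mean.
Here I draw samples of size 5, repeat the estimation 10,000 times, and check what percentage of the intervals contain the population mean.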
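The interval being checked is shown below. It applies to a sample of size n when the population standard deviation σ is known. With n = 5 it is the condition tested in the source code. The probability is exactly 0.95 only when X̄ is normally distributed, which holds for a normal population. For other populations it is an approximation that improves as n grows (central limit theorem).

```latex
\left[\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}},\ \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right],
\qquad
P\!\left(\bar{X} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\,\frac{\sigma}{\sqrt{n}}\right) \approx 0.95
```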

The population is as follows. The population size is 34, the population mean is 3811.8, and the population standard deviation is 2.46.
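The stated population values can be checked directly from the 34 numbers. Below is a minimal sketch that uses only the `Statistics` standard library; the variable names are my own.

Because the 34 values are the whole population, the population standard deviation divides by N (`corrected=false`). Using `corrected=true` divides by N - 1 and gives a slightly larger value. Either way, the standard deviation is in the thousands, not 2.46.

```julia
using Statistics

# The 34 values listed above, treated as the entire population
population = [5652, 2187, 592, 265, 13, 435, 3842, 31323, 4, 500,
              3842, 31323, 4, 500, 352, 7, 229, 284, 4, 613,
              883, 1556, 90, 16440, 774, 2164, 776, 155, 330, 10867,
              4913, 2178, 16, 6488]

N = length(population)                         # population size: 34
μ = mean(population)                           # population mean: 129601 / 34 ≈ 3811.79
σ_pop = std(population; corrected=false)       # divides by N: ≈ 7678.86
s_corrected = std(population; corrected=true)  # divides by N - 1: ≈ 7794.34

println("N = ", N)
println("μ = ", μ)
println("σ (divide by N) = ", σ_pop)
println("σ (divide by N - 1) = ", s_corrected)
```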

5652 2187 592 265 13 435 3842 31323 4 500 3842 31323 4 500 352 7 229 284 4 613 883 1556 90 16440 774 2164 776 155 330 10867 4913 2178 16 6488

Source code

using Plots
using StatsBase

x = [
    5652 2187 592 265 13 435 3842 31323 4 500 3842 31323 4 500 352 7 229 284 4 613 883 1556 90 16440 774 2164 776 155 330 10867 4913 2178 16 6488
]

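# Draw a sample of size 5 (with replacement) 10,000 times and record each
# sample mean, rounded to the nearest integer
sample_means = Int64[]

for i ∈ 1:10000
    push!(sample_means, rand(x, 5) |> mean |> x -> round(Int, x, RoundNearestTiesUp))
end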


histogram(
    sample_means,
    title="sample mean", label="", xlabel="sample mean", ylabel="frequency",
    linecolor="#36472F", linewidth=2, fillcolor="#DC4BF4", fillalpha=0.2,
    xlims=(0, maximum(sample_means))
    )

savefig("sample_mean.png")

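# Count how many intervals sample_mean ± 1.96σ/√5 (σ = 2.46, as stated above)
# contain the population mean 3811.8
hit = 0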

for sample_mean ∈ sample_means
    if sample_mean - 1.96*2.46/√5 ≤ 3811.8 ≤ sample_mean + 1.96*2.46/√5
        global hit += 1
    end
end

@show hit / 10000
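For reference, here is a self-contained sketch of the same experiment. It uses only the standard libraries `Random` and `Statistics` and does no plotting. A few assumptions are mine:

- The function name `z_coverage` is hypothetical.
- The seeded `MersenneTwister` is only there to make runs reproducible.
- Unlike the code above, it does not round the sample means to integers.

It takes σ as a parameter, so the stated value 2.46 and the value computed from the data can be compared side by side.

```julia
using Random, Statistics

# Same 34 values as above, treated as the population
population = [5652, 2187, 592, 265, 13, 435, 3842, 31323, 4, 500,
              3842, 31323, 4, 500, 352, 7, 229, 284, 4, 613,
              883, 1556, 90, 16440, 774, 2164, 776, 155, 330, 10867,
              4913, 2178, 16, 6488]

# Hypothetical helper: fraction of intervals xbar ± 1.96σ/√n that contain
# the population mean, over `trials` samples of size n drawn with replacement
function z_coverage(population, σ; n=5, trials=10_000, seed=1)
    rng = MersenneTwister(seed)
    μ = mean(population)
    halfwidth = 1.96 * σ / sqrt(n)
    hits = 0
    for _ in 1:trials
        xbar = mean(rand(rng, population, n))  # with replacement, like rand(x, 5)
        hits += (xbar - halfwidth ≤ μ ≤ xbar + halfwidth)
    end
    return hits / trials
end

println(z_coverage(population, 2.46))                             # σ as stated in the article
println(z_coverage(population, std(population; corrected=false)))  # σ computed from the data
```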

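Figure: Histogram of the sample means

Figure: Proportion of intervals that contain the population mean

This time the result was 2%. I seem to have made a mistake somewhere in the calculation, so I will look it over later...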

Discussion

清水団

I read the article. I had just run a similar experiment myself, so I'll share it.
First, the article states that the population standard deviation is 2.46, but I think that value is wrong. It should be about 7794.3.

N = length(X) = 34
μ = mean(X) = 3811.794117647059
σ = std(X, corrected = true) = 7794.341808584947
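The effect of the wrong σ is easy to see from the interval's half-width, 1.96σ/√n with n = 5. The helper name `halfwidth` below is hypothetical.

With σ = 2.46 the interval is only about ±2.16 wide. That is tiny compared with values ranging from 4 to 31323. With σ ≈ 7794.34 the half-width is about ±6832.

```julia
# Half-width of the 95% z-interval, 1.96σ/√n (n = 5 by default)
halfwidth(σ; n=5, z=1.96) = z * σ / sqrt(n)

println(halfwidth(2.46))               # ≈ 2.16
println(halfwidth(7794.341808584947))  # ≈ 6832
```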

Running the code with this value:

using Plots
using StatsBase

x = [
    5652 2187 592 265 13 435 3842 31323 4 500 3842 31323 4 500 352 7 229 284 4 613 883 1556 90 16440 774 2164 776 155 330 10867 4913 2178 16 6488
]

sample_means = Int64[]

for i ∈ 1:10000
    push!(sample_means, rand(x, 5) |> mean |> x -> round(Int, x, RoundNearestTiesUp))
end


histogram(
    sample_means,
    title="sample mean", label="", xlabel="sample mean", ylabel="frequency",
    linecolor="#36472F", linewidth=2, fillcolor="#DC4BF4", fillalpha=0.2,
    xlims=(0, maximum(sample_means))
    )

savefig("sample_mean.png")

hit = 0

for sample_mean ∈ sample_means
    if sample_mean - 1.96*7794.341808584947/√5 ≤ 3811.8 ≤ sample_mean + 1.96*7794.341808584947/√5
        global hit += 1
    end
end

@show hit / 10000

hit / 10000 = 0.956

which comes out to about 95%.
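How close a simulated coverage is to 95% also depends on the simulation's own noise. A coverage estimated as a proportion p over m independent trials has standard error √(p(1 - p)/m). The helper `mc_se` is a hypothetical name for that formula.

At m = 10,000 and p ≈ 0.95, the standard error is about 0.0022, or 0.22 percentage points. At p ≈ 0.6 with 10^6 trials, it is about 0.0005. A coverage that stays at 60% with 10^6 trials therefore reflects the interval method, not simulation noise.

```julia
# Standard error of a coverage proportion p estimated from m independent trials
mc_se(p, m) = sqrt(p * (1 - p) / m)

println(mc_se(0.95, 10_000))     # ≈ 0.0022
println(mc_se(0.60, 10_000))     # ≈ 0.0049
println(mc_se(0.60, 1_000_000))  # ≈ 0.00049
```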

清水団

I'm planning to post this on X later, but here is what I found.

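With X̄ = sample mean and σ = population standard deviation, computing the confidence interval from the normal distribution N(X̄, σ²/n) gives about 95%, which looks right.

With X̄ = sample mean and σ̄ = sample standard deviation, computing it from N(X̄, σ̄²/n) gives only about 60%.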
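The first two lines below are the two intervals just described, both using the multiplier 1.96. The third line is the textbook interval for when σ is replaced by the sample standard deviation. It uses the Student's t quantile with n - 1 degrees of freedom, which is larger than 1.96 for small n.

That interval is exact only for a normal population. This population is strongly right-skewed, so even the t interval may fall short of 95% at n = 5.

```latex
\begin{aligned}
\text{known } \sigma &: \quad \bar{X} \pm 1.96\,\frac{\sigma}{\sqrt{n}} \\
\text{sample SD } \bar{\sigma} &: \quad \bar{X} \pm 1.96\,\frac{\bar{\sigma}}{\sqrt{n}} \\
\text{Student's } t &: \quad \bar{X} \pm t_{0.975,\,n-1}\,\frac{\bar{\sigma}}{\sqrt{n}}, \qquad t_{0.975,\,4} \approx 2.776
\end{aligned}
```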

using Distributions

x = [
    5652 2187 592 265 13 435 3842 31323 4 500 3842 31323 4 500 352 7 229 284 4 613 883 1556 90 16440 774 2164 776 155 330 10867 4913 2178 16 6488
]

hit = 0

# Each trial: draw a sample of size 5 (with replacement) and use its own
# standard deviation in place of the population standard deviation
for _ = 1:10000
    X = rand(x, 5)
    sample_mean = mean(X)
    sample_std = std(X, corrected=true)
    if sample_mean - 1.96*sample_std/√5 ≤ 3811.8 ≤ sample_mean + 1.96*sample_std/√5
        global hit += 1
    end
end

@show hit / 10000

hit / 10000 = 0.6006

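Increasing the number of trials (to 10^6) did not change the 60%.
Increasing the sample size from 5 to 100 brought it to about 92% (admittedly odd, since the data only has 34 values...).
I don't fully understand it yet, but my takeaways so far are:
- If the population standard deviation is known, use it rather than the sample standard deviation.
- If you use the sample standard deviation and the sample size n is small, the coverage does not approach 95% no matter how many trials you run.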
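One way to probe the second point is to keep the sample standard deviation but widen the multiplier from 1.96 to the Student's t quantile. This is the standard small-sample adjustment when σ is estimated.

The sketch below is my own illustration, not the commenter's code:

- `estimated_sd_coverage` is a hypothetical name.
- t(0.975, 4) ≈ 2.776 is hard-coded, which assumes n = 5.
- Both runs use the same seed, so they see identical samples. The t coverage can therefore never be lower than the 1.96 coverage.

The t interval assumes a normal population. Because this population is strongly right-skewed, the t coverage may still fall short of 95%.

```julia
using Random, Statistics

# Same 34 values as above, treated as the population
population = [5652, 2187, 592, 265, 13, 435, 3842, 31323, 4, 500,
              3842, 31323, 4, 500, 352, 7, 229, 284, 4, 613,
              883, 1556, 90, 16440, 774, 2164, 776, 155, 330, 10867,
              4913, 2178, 16, 6488]

# Hypothetical helper: coverage of xbar ± c * s / √n, where s is the sample SD
# (n - 1 denominator) of each sample of size n drawn with replacement
function estimated_sd_coverage(population, c; n=5, trials=10_000, seed=1)
    rng = MersenneTwister(seed)
    μ = mean(population)
    hits = 0
    for _ in 1:trials
        draws = rand(rng, population, n)
        xbar = mean(draws)
        halfwidth = c * std(draws) / sqrt(n)  # std uses corrected=true by default
        hits += (xbar - halfwidth ≤ μ ≤ xbar + halfwidth)
    end
    return hits / trials
end

z = 1.96
t4 = 2.776  # t quantile at 0.975 with 4 degrees of freedom (assumes n = 5)

println(estimated_sd_coverage(population, z))
println(estimated_sd_coverage(population, t4))
```

To try other sample sizes, change `n` together with the multiplier, since the t quantile depends on n - 1.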

That's roughly where I am. Statistics is hard.