【Statistics Method】t-test explained

Published on 2024/09/14
 1. What is a t-test?
A t-test is a type of hypothesis test in statistics, mainly used to determine whether there is a significant difference in mean values between two groups. It is particularly useful when the sample size is small or the standard deviation of the population is unknown.

 2. Basic concepts of t-tests
A t-test is used to test hypotheses such as the following:
・Null hypothesis (H₀): There is no difference in the mean values of the two groups.
・Alternative hypothesis (H₁): There is a difference in the mean values of the two groups.
In a t-test, the t-value calculated from the sample is used to decide whether to reject the null hypothesis. For a two-sided test, if the absolute value of the t-value exceeds the critical value determined by the significance level and the degrees of freedom, the null hypothesis is rejected and it is concluded that there is a significant difference between the groups.
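To make the comparison between the t-value and the critical value concrete, here is a minimal sketch for the one-sample case. The sample values, the reference mean `mu0`, and the significance level are hypothetical and chosen only for illustration. The t-value is computed by hand and compared against the two-sided critical value from `scipy.stats.t`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and reference mean, chosen only for illustration
sample = np.array([5.1, 4.9, 5.3, 5.2, 4.8, 5.4, 5.0, 5.2])
mu0 = 5.0
alpha = 0.05

n = sample.size
# t = (sample mean - mu0) / (sample standard deviation / sqrt(n))
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))

# Two-sided critical value from the t distribution with n - 1 degrees of freedom
t_crit = stats.t.ppf(1 - alpha / 2, df=n - 1)

# Two-sided p-value from the same distribution
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# Both decision rules lead to the same conclusion
reject_by_t = abs(t_manual) > t_crit
reject_by_p = p_manual < alpha

print(f"t = {t_manual:.4f}, critical value = {t_crit:.4f}, p = {p_manual:.4f}")
print(f"Reject H0: {reject_by_t}")
```

Comparing |t| with the critical value and comparing the p-value with the significance level are two views of the same decision, which is why libraries usually report only the p-value.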

 3. Types of t-tests
There are three main types of t-tests:

 3.1 One-sample t-test
Objective: Test whether the mean of one sample is different from a known population mean.
Example: Test whether the average weight of a product is different from a standard value.
・Example Code
# One-sample t-test
import numpy as np
from scipy import stats

# Sample data: weights of 30 products
sample_weights = np.array([
    49.8, 50.2, 50.0, 49.5, 50.1, 50.3, 49.9, 50.4, 50.2, 49.7,
    50.0, 50.1, 49.6, 50.3, 50.2, 49.8, 50.0, 50.1, 49.9, 50.2,
    50.0, 49.7, 50.3, 50.1, 49.8, 50.0, 50.2, 49.9, 50.1, 50.0
])

# Population mean
mu = 50.0

# Perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_weights, mu)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
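One way to read the result of a one-sample t-test is through the confidence interval of the mean: for a two-sided test at the 5% level, the null hypothesis is rejected exactly when the reference value lies outside the 95% confidence interval. The following is a small sketch of that relationship, using a shorter hypothetical sample rather than the full data set above.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements, used only for illustration
weights = np.array([49.8, 50.2, 50.0, 49.5, 50.1, 50.3, 49.9, 50.4, 50.2, 49.7])
mu = 50.0
alpha = 0.05

t_stat, p_value = stats.ttest_1samp(weights, mu)

# 95% confidence interval for the mean, based on the t distribution
n = weights.size
mean = weights.mean()
sem = weights.std(ddof=1) / np.sqrt(n)  # standard error of the mean
ci_low, ci_high = stats.t.interval(1 - alpha, n - 1, loc=mean, scale=sem)

# The reference value is inside the interval exactly when H0 is not rejected
mu_inside_ci = ci_low <= mu <= ci_high

print(f"p-value: {p_value:.4f}")
print(f"95% CI for the mean: [{ci_low:.3f}, {ci_high:.3f}]")
print(f"mu inside CI: {mu_inside_ci}")
```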






 3.2 Independent two-sample t-test
Objective: Test whether the means of two independent groups are different.
Example: Test whether there is a difference in the average height of men and women.
Assumptions:

・The data for each group are independent.

・The data follow a normal distribution.

・Equal variances between the groups may or may not be assumed. Check this with a test for equal variances (e.g., Levene's test), and use Welch's t-test when equal variances cannot be assumed.
・Example Code
# Independent two-sample t-test
import numpy as np
from scipy import stats

# Sample data: heights of 30 males and 30 females
male_heights = np.array([
    175, 180, 178, 182, 176, 179, 181, 177, 183, 175,
    180, 178, 182, 176, 179, 181, 177, 183, 175, 180,
    178, 182, 176, 179, 181, 177, 183, 175, 180, 178
])

female_heights = np.array([
    165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
    165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
    165, 160, 162, 158, 161, 159, 163, 160, 162, 161
])

# Perform Levene's test for equal variances
levene_stat, levene_p = stats.levene(male_heights, female_heights)
print(f"Levene's test p-value: {levene_p:.4f}")

# Decide whether to assume equal variances
if levene_p > 0.05:
    equal_var = True
    print("Equal variances assumed.")
else:
    equal_var = False
    print("Equal variances not assumed. Using Welch's t-test.")

# Perform independent two-sample t-test
t_stat, p_value = stats.ttest_ind(male_heights, female_heights, equal_var=equal_var)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
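The code first checks for equality of variances with Levene's test. If its p-value is greater than 0.05, equal variances are assumed and the standard independent t-test is used. Otherwise, Welch's t-test, which does not assume equal variances, is used.
The results are below. Levene's test rejects equal variances (p = 0.0397), so Welch's t-test is used, and it shows a significant difference between the two groups (the p-value rounds to 0.0000).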
Levene's test p-value: 0.0397
Equal variances not assumed. Using Welch's t-test.
t-statistic: 29.8520
p-value: 0.0000
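For reference, Welch's t-test can also be computed by hand. The sketch below uses two small hypothetical groups with clearly different spreads, computes the t-statistic and the Welch-Satterthwaite degrees of freedom, and compares the result with `stats.ttest_ind(..., equal_var=False)`.

```python
import numpy as np
from scipy import stats

# Hypothetical groups with visibly different spreads, for illustration only
group_a = np.array([12.1, 11.8, 12.4, 12.0, 12.3, 11.9, 12.2, 12.1])
group_b = np.array([10.5, 13.2, 9.8, 12.9, 11.1, 13.5, 10.2, 12.6, 11.7, 9.9])

n_a, n_b = group_a.size, group_b.size
var_a, var_b = group_a.var(ddof=1), group_b.var(ddof=1)

# Squared standard error of each group mean
se2_a, se2_b = var_a / n_a, var_b / n_b

# Welch's t-statistic: the variances are not pooled
t_welch = (group_a.mean() - group_b.mean()) / np.sqrt(se2_a + se2_b)

# Welch-Satterthwaite approximation of the degrees of freedom
df_welch = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))

# Two-sided p-value
p_welch = 2 * stats.t.sf(abs(t_welch), df=df_welch)

# Same test via SciPy
t_ref, p_ref = stats.ttest_ind(group_a, group_b, equal_var=False)

print(f"manual: t = {t_welch:.4f}, df = {df_welch:.2f}, p = {p_welch:.4f}")
print(f"scipy:  t = {t_ref:.4f}, p = {p_ref:.4f}")
```

Because the degrees of freedom are estimated from the data, they are usually not an integer and are never larger than n_a + n_b - 2.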

 3.3 Paired t-test
Objective: Test whether the means of two related measurements for the same subject are different.
Example: Test the change in weight before and after a diet.
Assumptions:

・The differences between the paired measurements follow a normal distribution.
・Example Code
# Paired t-test
import numpy as np
from scipy import stats

# Sample data: weights before and after a diet program for 30 individuals
before_weights = np.array([
    80, 82, 78, 85, 77, 79, 81, 80, 83, 78,
    80, 82, 78, 85, 77, 79, 81, 80, 83, 78,
    80, 82, 78, 85, 77, 79, 81, 80, 83, 78
])

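# After-diet weights; the reduction varies from person to person.
# (If every difference were identical, the variance of the differences
# would be zero and the t-statistic would be undefined.)
after_weights = np.array([
    78, 81, 76, 82, 75, 78, 79, 77, 81, 77,
    78, 81, 76, 82, 75, 78, 79, 77, 81, 77,
    78, 81, 76, 82, 75, 78, 79, 77, 81, 77
])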

# Perform paired t-test
t_stat, p_value = stats.ttest_rel(before_weights, after_weights)

print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
A p-value ≤ 0.05 suggests a statistically significant change in mean weight between the measurements taken before and after the diet program.
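A paired t-test is equivalent to a one-sample t-test on the pairwise differences, tested against a mean of zero. The short sketch below illustrates this with hypothetical before/after values.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for 8 subjects
before = np.array([80.0, 82.0, 78.0, 85.0, 77.0, 79.0, 81.0, 80.0])
after = np.array([78.5, 80.0, 77.0, 82.0, 76.5, 77.0, 80.0, 77.5])

# Paired t-test on the two related samples
t_paired, p_paired = stats.ttest_rel(before, after)

# One-sample t-test on the differences against a mean of 0
diff = before - after
t_diff, p_diff = stats.ttest_1samp(diff, 0.0)

print(f"paired:     t = {t_paired:.4f}, p = {p_paired:.4f}")
print(f"one-sample: t = {t_diff:.4f}, p = {p_diff:.4f}")
```

This is why the normality assumption of the paired t-test concerns the differences rather than the raw before and after values.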

 4. Procedure for t-test
・Setting hypotheses: Clarify the null hypothesis and the alternative hypothesis.
・Setting the significance level: Generally, 5% (0.05) is used.
・Collecting data and checking assumptions: Perform a normality test (e.g., Shapiro-Wilk test) and, for two independent samples, a homogeneity of variance test (e.g., Levene's test or F test).
・Calculating the t-value: Use the formula for the chosen test (one-sample, independent two-sample, or paired t-test).
・Calculating and comparing the p-value: Calculate the p-value from the t-value and compare it with the significance level.
・Drawing conclusions: If the p-value ≤ significance level, reject the null hypothesis in favor of the alternative hypothesis. If the p-value > significance level, do not reject the null hypothesis. This does not prove the null hypothesis; it only means the data do not provide sufficient evidence against it.
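The procedure above can be sketched as a single function for the independent two-sample case. The function name `run_two_sample_t_test`, the returned dictionary, and the randomly generated data are all hypothetical; this is one possible arrangement of the steps, not a fixed recipe.

```python
import numpy as np
from scipy import stats

def run_two_sample_t_test(group_a, group_b, alpha=0.05):
    """Hypothetical helper. H0: the two group means are equal."""
    # Checking assumptions: normality of each group (Shapiro-Wilk)
    _, shapiro_p_a = stats.shapiro(group_a)
    _, shapiro_p_b = stats.shapiro(group_b)

    # Checking assumptions: equality of variances (Levene's test)
    _, levene_p = stats.levene(group_a, group_b)
    equal_var = bool(levene_p > alpha)

    # Calculating the t-value and p-value (Welch's t-test if equal_var is False)
    t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=equal_var)

    # Drawing conclusions: reject H0 only if p-value <= significance level
    return {
        "shapiro_p": (float(shapiro_p_a), float(shapiro_p_b)),
        "levene_p": float(levene_p),
        "equal_var": equal_var,
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "reject_null": bool(p_value <= alpha),
    }

# Hypothetical data drawn from two normal distributions
rng = np.random.default_rng(42)
group_a = rng.normal(loc=50.0, scale=5.0, size=30)
group_b = rng.normal(loc=53.0, scale=5.0, size=30)

result = run_two_sample_t_test(group_a, group_b)
for key, value in result.items():
    print(f"{key}: {value}")
```

The Shapiro-Wilk p-values are only reported here, not used to switch tests automatically; how to react to a failed normality check is left to the analyst.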

 5. Notes
・Checking normality: t-tests assume that the data follow a normal distribution. If normality is not met, consider nonparametric tests (e.g., the Mann-Whitney U test for two independent samples).
・Assumption of homogeneity of variance: In independent two-sample t-tests, homogeneity of variance may or may not be assumed. If homogeneity of variance does not hold, Welch's t-test is generally used.
・Sample size: Small sample sizes reduce the power of t-tests, that is, the probability of detecting a difference that really exists. It is important to ensure a sufficient sample size.
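The effect of sample size on power can be illustrated with a small simulation. The sketch below assumes two normal populations whose means differ by half a standard deviation (a hypothetical effect size) and estimates how often the two-sample t-test rejects the null hypothesis at different sample sizes. The helper `simulated_power` is illustrative and not part of SciPy.

```python
import numpy as np
from scipy import stats

def simulated_power(n_per_group, effect=0.5, alpha=0.05, n_sim=2000, seed=0):
    """Estimate the power of the two-sample t-test by simulation (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment with n_per_group observations per group
    group_a = rng.normal(loc=0.0, scale=1.0, size=(n_sim, n_per_group))
    group_b = rng.normal(loc=effect, scale=1.0, size=(n_sim, n_per_group))
    # Run one t-test per row
    _, p_values = stats.ttest_ind(group_a, group_b, axis=1)
    # Power = fraction of experiments in which H0 is rejected
    return float(np.mean(p_values < alpha))

power_small = simulated_power(10)
power_large = simulated_power(50)

print(f"Estimated power with n = 10 per group: {power_small:.2f}")
print(f"Estimated power with n = 50 per group: {power_large:.2f}")
```

With the same true difference, the larger samples detect it far more often, which is the practical reason for planning the sample size before collecting data.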

 6. Summary
t-tests are powerful statistical tools for comparing means and making inferences about populations based on sample data. By selecting the appropriate type of t-test and ensuring that the underlying assumptions are met, you can draw reliable conclusions about your data.