【Statistics Method】t-test explained
1. What is a t-test
A t-test is a type of hypothesis test in statistics, and is mainly used to determine whether there is a significant difference in the mean values between two groups. A t-test is particularly useful when the sample size is small or the standard deviation of the population is unknown.
2. Basic concepts of t-tests
A t-test is used to verify hypotheses such as the following:
- Null hypothesis (H₀): There is no difference in the mean values of the two groups.
- Alternative hypothesis (H₁): There is a difference in the mean values of the two groups.
In a t-test, the t-value calculated from the sample is used to decide whether to reject the null hypothesis. For a two-sided test, if the absolute value of the t-value exceeds the critical value determined by the significance level and the degrees of freedom, the null hypothesis is rejected and it is concluded that there is a significant difference between the groups.
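The decision rule can be sketched in Python as follows. This is a minimal illustration using SciPy; the sample values, the reference mean `mu0`, and the significance level `alpha` are made up for the example. Comparing |t| with the critical value leads to the same decision as comparing the p-value with `alpha`.

```python
import numpy as np
from scipy import stats

# Hypothetical sample and reference mean (illustrative values only)
sample = np.array([5.1, 4.9, 5.6, 5.8, 5.3, 5.7, 5.4, 5.9])
mu0 = 5.0
alpha = 0.05

# t-value and two-sided p-value from a one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample, mu0)

# Two-sided critical value from the t-distribution with n - 1 degrees of freedom
df = len(sample) - 1
t_crit = stats.t.ppf(1 - alpha / 2, df)

# Both rules give the same decision
reject_by_t = abs(t_stat) > t_crit
reject_by_p = p_value < alpha

print(f"t = {t_stat:.4f}, critical value = {t_crit:.4f}, p = {p_value:.4f}")
print(f"Reject H0 (critical value rule): {reject_by_t}")
print(f"Reject H0 (p-value rule):        {reject_by_p}")
```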
3. Types of t-tests
There are three main types of t-tests:
3.1 One-sample t-test
- Objective: Test whether the mean of one sample is different from a known population mean.
- Example: Test whether the average weight of a product is different from a standard value.
・Example Code
# One-sample t-test
import numpy as np
from scipy import stats
# Sample data: weights of 30 products
sample_weights = np.array([
49.8, 50.2, 50.0, 49.5, 50.1, 50.3, 49.9, 50.4, 50.2, 49.7,
50.0, 50.1, 49.6, 50.3, 50.2, 49.8, 50.0, 50.1, 49.9, 50.2,
50.0, 49.7, 50.3, 50.1, 49.8, 50.0, 50.2, 49.9, 50.1, 50.0
])
# Population mean
mu = 50.0
# Perform one-sample t-test
t_stat, p_value = stats.ttest_1samp(sample_weights, mu)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
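To show what `stats.ttest_1samp` computes, the sketch below recalculates the statistic by hand from the same sample as t = (mean − mu) / (s / √n) with n − 1 degrees of freedom, and compares it with SciPy's result. The variable names `t_manual` and `p_manual` are introduced here only for illustration.

```python
import numpy as np
from scipy import stats

# Same sample data and population mean as in the example above
sample_weights = np.array([
    49.8, 50.2, 50.0, 49.5, 50.1, 50.3, 49.9, 50.4, 50.2, 49.7,
    50.0, 50.1, 49.6, 50.3, 50.2, 49.8, 50.0, 50.1, 49.9, 50.2,
    50.0, 49.7, 50.3, 50.1, 49.8, 50.0, 50.2, 49.9, 50.1, 50.0
])
mu = 50.0

n = sample_weights.size
sample_mean = sample_weights.mean()
sample_sd = sample_weights.std(ddof=1)   # sample standard deviation (n - 1 in the denominator)
standard_error = sample_sd / np.sqrt(n)

# t-value and two-sided p-value computed by hand
t_manual = (sample_mean - mu) / standard_error
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy's result for comparison
t_scipy, p_scipy = stats.ttest_1samp(sample_weights, mu)

print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"scipy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```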
3.2 Independent two-sample t-test
- Objective: Test whether the means of two independent groups are different.
- Example: Test whether there is a difference in the average height of men and women.
- Assumptions:
・The data for each group are independent.
・The data follow a normal distribution.
・The variances of the two groups may or may not be assumed equal. Which version of the test to use depends on this, so the equality of variances is checked first (e.g., with Levene's test).
・Example Code
# Independent two-sample t-test
import numpy as np
from scipy import stats
# Sample data: heights of 30 males and 30 females
male_heights = np.array([
175, 180, 178, 182, 176, 179, 181, 177, 183, 175,
180, 178, 182, 176, 179, 181, 177, 183, 175, 180,
178, 182, 176, 179, 181, 177, 183, 175, 180, 178
])
female_heights = np.array([
165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
165, 160, 162, 158, 161, 159, 163, 160, 162, 161
])
# Perform Levene's test for equal variances
levene_stat, levene_p = stats.levene(male_heights, female_heights)
print(f"Levene's test p-value: {levene_p:.4f}")
# Decide whether to assume equal variances
if levene_p > 0.05:
    equal_var = True
    print("Equal variances assumed.")
else:
    equal_var = False
    print("Equal variances not assumed. Using Welch's t-test.")
# Perform independent two-sample t-test
t_stat, p_value = stats.ttest_ind(male_heights, female_heights, equal_var=equal_var)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
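Levene's test checks whether the variances of the two groups are equal. If its p-value is greater than 0.05, equal variances are assumed and the standard independent t-test is used. Otherwise, Welch's t-test, which does not assume equal variances, is used.
The results are shown below. Levene's test gives p < 0.05, so Welch's t-test is used, and its p-value (printed as 0.0000 after rounding) shows a significant difference in mean height between the two groups.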
Levene's test p-value: 0.0397
Equal variances not assumed. Using Welch's t-test.
t-statistic: 29.8520
p-value: 0.0000
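For reference, Welch's t-test, which the code above falls back to when Levene's test rejects equal variances, is commonly defined as follows. The notation is standard but not taken from this article: x̄ᵢ, sᵢ², and nᵢ denote the sample mean, sample variance, and size of group i, and ν is the approximate degrees of freedom (Welch-Satterthwaite).

```latex
t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}},
\qquad
\nu \approx \frac{\left(\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}\right)^2}
{\dfrac{\left(s_1^2 / n_1\right)^2}{n_1 - 1} + \dfrac{\left(s_2^2 / n_2\right)^2}{n_2 - 1}}
```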
3.3 Paired t-test
- Objective: Test whether the means of two related measurements for the same subject are different.
- Example: Test the change in weight before and after a diet.
- Assumptions:
・The differences between the paired measurements follow a normal distribution.
・Example Code
# Paired t-test
import numpy as np
from scipy import stats
# Sample data: weights before and after a diet program for 30 individuals
before_weights = np.array([
80, 82, 78, 85, 77, 79, 81, 80, 83, 78,
80, 82, 78, 85, 77, 79, 81, 80, 83, 78,
80, 82, 78, 85, 77, 79, 81, 80, 83, 78
])
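# The weight change differs by individual; if every difference were identical,
# the variance of the differences would be zero and the t-statistic would be undefined
after_weights = np.array([
78, 81, 76, 82, 75, 78, 79, 77, 81, 77,
78, 81, 76, 82, 75, 78, 79, 77, 81, 77,
78, 81, 76, 82, 75, 78, 79, 77, 81, 77
])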
# Perform paired t-test
t_stat, p_value = stats.ttest_rel(before_weights, after_weights)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
A p-value ≤ 0.05 indicates a statistically significant change in mean weight between the measurements before and after the diet program.
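A paired t-test is equivalent to a one-sample t-test on the per-subject differences, tested against a mean of zero. The sketch below checks this with SciPy; the `before` and `after` values are hypothetical and chosen only so that the differences vary between subjects.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after measurements for 8 subjects (illustrative values only)
before = np.array([80.0, 82.0, 78.0, 85.0, 77.0, 79.0, 81.0, 83.0])
after = np.array([78.5, 80.0, 77.0, 82.0, 76.5, 77.0, 80.0, 80.5])

# Per-subject differences
diff = before - after

# Paired t-test on the two measurements
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test on the differences against a mean of 0
t_one, p_one = stats.ttest_1samp(diff, 0.0)

print(f"paired:     t = {t_rel:.4f}, p = {p_rel:.4f}")
print(f"one-sample: t = {t_one:.4f}, p = {p_one:.4f}")
```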
4. Procedure for t-test
- Setting hypotheses: Clarify the null hypothesis and the alternative hypothesis.
- Setting the significance level: Generally, 5% (0.05) is used.
- Collecting data and checking assumptions: Perform a normality test (e.g., Shapiro-Wilk test) and, for two independent samples, a homogeneity of variance test (e.g., F test or Levene's test).
- Calculating the t-value: Calculate it with the formula for the chosen test (one-sample, independent two-sample, or paired t-test).
- Calculating and comparing the p-value: Calculate the p-value from the t-value and compare it with the significance level.
- Drawing conclusions: If the p-value ≤ significance level, reject the null hypothesis in favor of the alternative hypothesis. If the p-value > significance level, the null hypothesis cannot be rejected (this does not prove that it is true).
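The steps above can be sketched as a single function for two independent groups. This is only one possible arrangement: the helper name `two_sample_procedure`, the returned dictionary keys, and the randomly generated data are assumptions made for the example, and Levene's test is used for the variance check.

```python
import numpy as np
from scipy import stats

def two_sample_procedure(a, b, alpha=0.05):
    """Hypothetical helper that follows the procedure for two independent groups."""
    # Check assumptions: normality of each group, then equality of variances
    _, p_normal_a = stats.shapiro(a)
    _, p_normal_b = stats.shapiro(b)
    _, p_levene = stats.levene(a, b)
    equal_var = p_levene > alpha

    # Calculate the t-value and p-value (Welch's t-test if variances differ)
    t_stat, p_value = stats.ttest_ind(a, b, equal_var=equal_var)

    # Compare the p-value with the significance level
    return {
        "normality_ok": bool(p_normal_a > alpha and p_normal_b > alpha),
        "equal_var": bool(equal_var),
        "t_stat": float(t_stat),
        "p_value": float(p_value),
        "reject_h0": bool(p_value <= alpha),
    }

# Simulated data: two normal populations whose means differ by 10
rng = np.random.default_rng(0)
group_a = rng.normal(loc=50.0, scale=5.0, size=30)
group_b = rng.normal(loc=60.0, scale=5.0, size=30)

result = two_sample_procedure(group_a, group_b)
for key, value in result.items():
    print(f"{key}: {value}")
```

If `normality_ok` is False, the t-test result should be read with caution and a nonparametric test considered instead.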
5. Notes
Checking normality:
t-tests assume that the data follow a normal distribution. Nonparametric tests (e.g., Mann-Whitney U test) should be considered if normality is not met.
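A minimal sketch of this fallback is shown below. The helper `compare_two_groups`, the use of the Shapiro-Wilk test as the normality check, and the simulated skewed data are assumptions made for illustration. Note that the Mann-Whitney U test compares the two distributions through ranks rather than comparing means directly.

```python
import numpy as np
from scipy import stats

def compare_two_groups(a, b, alpha=0.05):
    """Hypothetical helper: choose the test based on a Shapiro-Wilk normality check."""
    _, p_normal_a = stats.shapiro(a)
    _, p_normal_b = stats.shapiro(b)

    if p_normal_a > alpha and p_normal_b > alpha:
        # Normality not rejected in either group: use a t-test (Welch's version here)
        test_name = "Welch's t-test"
        _, p_value = stats.ttest_ind(a, b, equal_var=False)
    else:
        # Normality rejected in at least one group: use the nonparametric test
        test_name = "Mann-Whitney U test"
        _, p_value = stats.mannwhitneyu(a, b, alternative="two-sided")
    return test_name, float(p_value)

# Simulated skewed (exponential) data, which is typically flagged as non-normal
rng = np.random.default_rng(1)
skewed_a = rng.exponential(scale=1.0, size=40)
skewed_b = rng.exponential(scale=2.0, size=40)

test_name, p_value = compare_two_groups(skewed_a, skewed_b)
print(f"Test used: {test_name}")
print(f"p-value: {p_value:.4f}")
```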
Assumption of homogeneity of variance:
In independent two-sample t-tests, homogeneity of variance may or may not be assumed. If homogeneity of variance does not hold, Welch's t-test is generally used.
Sample size:
Small sample sizes may reduce the power of t-tests. It is important to ensure a sufficient sample size.
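The effect of sample size on power can be illustrated with a small simulation. This is a rough sketch under stated assumptions: both populations are normal with equal variance, the true difference is 0.5 standard deviations, and the function name `estimate_power` and its parameters are invented for the example. Power is estimated as the fraction of simulated experiments in which the t-test rejects the null hypothesis.

```python
import numpy as np
from scipy import stats

def estimate_power(n, effect_size=0.5, alpha=0.05, n_sim=2000, seed=0):
    """Estimate by simulation how often a two-sample t-test rejects H0.

    effect_size is the true mean difference in units of the common standard deviation.
    """
    rng = np.random.default_rng(seed)
    # Each row is one simulated experiment with n observations per group
    group_a = rng.normal(0.0, 1.0, size=(n_sim, n))
    group_b = rng.normal(effect_size, 1.0, size=(n_sim, n))
    _, p_values = stats.ttest_ind(group_a, group_b, axis=1)
    return float(np.mean(p_values < alpha))

power_small = estimate_power(n=10)
power_large = estimate_power(n=64)

print(f"Estimated power with n = 10 per group: {power_small:.2f}")
print(f"Estimated power with n = 64 per group: {power_large:.2f}")
```

With the same true difference, the larger sample detects it far more often, which is why the sample size is worth planning before collecting data.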
6. Summary
t-tests are powerful statistical tools for comparing means and making inferences about populations based on sample data. By selecting the appropriate type of t-test and checking that the underlying assumptions are met, you can draw reliable conclusions about your data.