【Statistics Method】t-test explained
1. What is a t-test?
A t-test is a type of statistical hypothesis test that is mainly used to determine whether the means of two groups differ significantly. Because it estimates the population standard deviation from the sample instead of requiring it to be known, it is particularly useful when the population standard deviation is unknown and the sample size is small.
2. Basic concepts of t-tests
A t-test evaluates a pair of hypotheses such as the following:

Null hypothesis (H₀): There is no difference in the mean values of the two groups.

Alternative hypothesis (H₁): There is a difference in the mean values of the two groups.
In a t-test, the t-value calculated from the sample is used to decide whether to reject the null hypothesis. In a two-sided test, if the absolute value of the t-value exceeds the critical value for the chosen significance level and degrees of freedom, the null hypothesis is rejected and the difference between the groups is considered statistically significant.
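The decision rule can be expressed in two equivalent ways: compare |t| with the two-sided critical value, or compare the p-value with the significance level. The minimal sketch below assumes a two-sided test and uses SciPy's `stats.t.ppf` (inverse CDF) and `stats.t.sf` (survival function); the helper names `reject_null` and `two_sided_p` and the example values t = 2.5 and df = 29 are hypothetical.

```python
from scipy import stats


def reject_null(t_value: float, df: float, alpha: float = 0.05) -> bool:
    """Two-sided rule: reject H0 when |t| exceeds the critical value."""
    # Critical value leaving alpha/2 probability in each tail of the t-distribution
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return bool(abs(t_value) > t_crit)


def two_sided_p(t_value: float, df: float) -> float:
    """Probability of observing |T| >= |t_value| when H0 is true."""
    return float(2 * stats.t.sf(abs(t_value), df))


# Hypothetical example: t = 2.5 with 29 degrees of freedom (e.g., one sample of n = 30)
t_value, df = 2.5, 29
print(f"critical value: {stats.t.ppf(0.975, df):.4f}")
print(f"reject H0 (critical value rule): {reject_null(t_value, df)}")
print(f"p-value: {two_sided_p(t_value, df):.4f}")
```

Both rules lead to the same decision: |t| exceeds the critical value exactly when the two-sided p-value falls below alpha.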
3. Types of t-tests
There are three main types of t-tests:
3.1 One-sample t-test

Objective: Test whether the mean of one sample is different from a known population mean.

Example: Test whether the average weight of a product is different from a standard value.
・Example Code
# One-sample t-test
import numpy as np
from scipy import stats
# Sample data: weights of 30 products
sample_weights = np.array([
49.8, 50.2, 50.0, 49.5, 50.1, 50.3, 49.9, 50.4, 50.2, 49.7,
50.0, 50.1, 49.6, 50.3, 50.2, 49.8, 50.0, 50.1, 49.9, 50.2,
50.0, 49.7, 50.3, 50.1, 49.8, 50.0, 50.2, 49.9, 50.1, 50.0
])
# Population mean
mu = 50.0
# Perform one-sample t-test (H0: the population mean equals mu)
t_stat, p_value = stats.ttest_1samp(sample_weights, mu)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
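`ttest_1samp` computes t = (mean - mu) / (s / sqrt(n)), where s is the sample standard deviation (ddof=1) and n is the sample size, with n - 1 degrees of freedom. The sketch below reproduces that calculation by hand for the same data and compares it with SciPy; the helper `one_sample_t` is a hypothetical name used only for illustration.

```python
import numpy as np
from scipy import stats


def one_sample_t(x, mu):
    """Return (t, two-sided p-value) for H0: population mean == mu."""
    x = np.asarray(x, dtype=float)
    n = x.size
    se = x.std(ddof=1) / np.sqrt(n)  # standard error of the mean (ddof=1 -> n - 1 divisor)
    t = (x.mean() - mu) / se
    p = 2 * stats.t.sf(abs(t), df=n - 1)  # two-sided p-value with n - 1 degrees of freedom
    return t, p


sample_weights = np.array([
    49.8, 50.2, 50.0, 49.5, 50.1, 50.3, 49.9, 50.4, 50.2, 49.7,
    50.0, 50.1, 49.6, 50.3, 50.2, 49.8, 50.0, 50.1, 49.9, 50.2,
    50.0, 49.7, 50.3, 50.1, 49.8, 50.0, 50.2, 49.9, 50.1, 50.0
])
mu = 50.0

t_manual, p_manual = one_sample_t(sample_weights, mu)
t_scipy, p_scipy = stats.ttest_1samp(sample_weights, mu)
print(f"manual: t = {t_manual:.4f}, p = {p_manual:.4f}")
print(f"SciPy:  t = {t_scipy:.4f}, p = {p_scipy:.4f}")
```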
3.2 Independent two-sample t-test

Objective: Test whether the means of two independent groups are different.

Example: Test whether there is a difference in the average height of men and women.

Assumptions:
・The data for each group are independent.
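・The data in each group follow an (approximately) normal distribution.
・Equal variances between the groups may or may not be assumed; a test for equal variances (e.g., Levene's test) can guide the choice, and Welch's t-test does not require this assumption.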
・Example Code
# Independent two-sample t-test
import numpy as np
from scipy import stats
# Sample data: heights of 30 males and 30 females
male_heights = np.array([
175, 180, 178, 182, 176, 179, 181, 177, 183, 175,
180, 178, 182, 176, 179, 181, 177, 183, 175, 180,
178, 182, 176, 179, 181, 177, 183, 175, 180, 178
])
female_heights = np.array([
165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
165, 160, 162, 158, 161, 159, 163, 160, 162, 161
])
# Perform Levene's test for equal variances
levene_stat, levene_p = stats.levene(male_heights, female_heights)
print(f"Levene's test p-value: {levene_p:.4f}")
# Decide whether to assume equal variances
if levene_p > 0.05:
    equal_var = True
    print("Equal variances assumed.")
else:
    equal_var = False
    print("Equal variances not assumed. Using Welch's t-test.")
# Perform independent two-sample t-test (Welch's t-test when equal_var=False)
t_stat, p_value = stats.ttest_ind(male_heights, female_heights, equal_var=equal_var)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")
Levene's test checks whether the two groups have equal variances. If its p-value > 0.05, equal variances are assumed and the standard (Student's) independent t-test is used; otherwise, Welch's t-test, which does not assume equal variances, is used.
The results are below. Levene's test suggests unequal variances (p = 0.0397), so Welch's t-test is used, and it shows a significant difference in mean height between the groups (the p-value is printed as 0.0000 because it is smaller than 0.00005).
Levene's test p-value: 0.0397
Equal variances not assumed. Using Welch's t-test.
t-statistic: 29.8520
p-value: 0.0000
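Welch's t-test divides the difference in means by sqrt(s1²/n1 + s2²/n2) and approximates the degrees of freedom with the Welch-Satterthwaite formula. As a sketch, the hypothetical helper `welch_t` below computes both by hand for the height data; it should agree with `stats.ttest_ind(..., equal_var=False)` and reproduce the t-statistic of about 29.85 reported above.

```python
import numpy as np
from scipy import stats


def welch_t(a, b):
    """Welch's t-test by hand: returns (t, df, two-sided p-value)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    va = a.var(ddof=1) / a.size  # squared standard error of the mean of a
    vb = b.var(ddof=1) / b.size  # squared standard error of the mean of b
    t = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * stats.t.sf(abs(t), df)
    return t, df, p


male_heights = np.array([
    175, 180, 178, 182, 176, 179, 181, 177, 183, 175,
    180, 178, 182, 176, 179, 181, 177, 183, 175, 180,
    178, 182, 176, 179, 181, 177, 183, 175, 180, 178
])
female_heights = np.array([
    165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
    165, 160, 162, 158, 161, 159, 163, 160, 162, 161,
    165, 160, 162, 158, 161, 159, 163, 160, 162, 161
])

t, df, p = welch_t(male_heights, female_heights)
print(f"manual Welch: t = {t:.4f}, df = {df:.2f}, p = {p:.3g}")
print(stats.ttest_ind(male_heights, female_heights, equal_var=False))
```

The fractional degrees of freedom are expected: the Welch-Satterthwaite formula generally does not give an integer.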
3.3 Paired t-test

Objective: Test whether the means of two related measurements for the same subject are different.

Example: Test the change in weight before and after a diet.

Assumptions:
・The differences between the paired measurements follow an (approximately) normal distribution.
・Example Code
# Paired t-test
import numpy as np
from scipy import stats
# Sample data: weights before and after a diet program for 30 individuals
before_weights = np.array([
80, 82, 78, 85, 77, 79, 81, 80, 83, 78,
80, 82, 78, 85, 77, 79, 81, 80, 83, 78,
80, 82, 78, 85, 77, 79, 81, 80, 83, 78
])
after_weights = np.array([
78, 80, 76, 83, 75, 77, 79, 78, 81, 76,
78, 80, 76, 83, 75, 77, 79, 78, 81, 76,
78, 80, 76, 83, 75, 77, 79, 78, 81, 76
])
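# Note: in this sample every individual loses exactly 2 kg, so the differences have
# zero standard deviation, the denominator of the t-statistic is zero, and no finite
# t-value can be computed. Real data, where the differences vary, gives a meaningful result.
# Perform paired t-test
t_stat, p_value = stats.ttest_rel(before_weights, after_weights)
print(f"t-statistic: {t_stat:.4f}")
print(f"p-value: {p_value:.4f}")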
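A p-value ≤ 0.05 indicates a significant change in weight between before and after the program. Without a control group, however, the test alone cannot show that the diet program caused the change.
4. Procedure for a t-test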
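Because a paired t-test works on the within-subject differences, it is equivalent to a one-sample t-test of those differences against 0. The sketch below checks this with a small hypothetical data set (made-up values for 8 people) whose differences vary, so the test is well defined.

```python
import numpy as np
from scipy import stats

# Hypothetical before/after weights in kg; the differences are not all equal
before = np.array([80.0, 82.5, 78.0, 85.0, 77.5, 79.0, 81.0, 83.0])
after = np.array([78.5, 80.0, 77.0, 82.5, 76.0, 78.5, 79.0, 81.5])

# Paired t-test on the two related samples
t_rel, p_rel = stats.ttest_rel(before, after)

# One-sample t-test of the differences against a mean of 0
diff = before - after
t_1s, p_1s = stats.ttest_1samp(diff, 0.0)

print(f"paired t-test:           t = {t_rel:.4f}, p = {p_rel:.4g}")
print(f"one-sample on diffs:     t = {t_1s:.4f}, p = {p_1s:.4g}")
print(f"mean difference = {diff.mean():.3f} kg, SD of differences = {diff.std(ddof=1):.3f} kg")
```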

Setting hypotheses:
Clarify the null hypothesis and alternative hypothesis. 
Setting the significance level:
Generally, 5% (0.05) is used. 
Collecting data and checking assumptions:
Perform a normality test (e.g., Shapiro-Wilk test) and, for two independent samples, a homogeneity of variance test (e.g., F test or Levene's test).
Calculating t-value:
Calculate the t-value using the formula for the chosen test: one-sample, independent two-sample, or paired t-test.
Calculating and comparing p-value:
Calculate the p-value from the t-value and its degrees of freedom, and compare it with the significance level.
Drawing conclusions:
If the p-value ≤ significance level, reject the null hypothesis in favor of the alternative hypothesis.
If the p-value > significance level, do not reject the null hypothesis (this does not prove that the means are equal).
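The procedure above can be sketched for two independent samples as follows. This is a minimal illustration rather than a prescribed workflow: the function name `run_independent_t_test`, the simulated data, and the reuse of the 0.05 threshold for the assumption checks are assumptions. It reports the Shapiro-Wilk result alongside the t-test instead of switching tests automatically.

```python
import numpy as np
from scipy import stats


def run_independent_t_test(a, b, alpha=0.05):
    """Hypothetical helper following the steps: check assumptions, compute t and p, decide."""
    # Check assumptions: Shapiro-Wilk normality test per group, Levene's test for variances
    normal = stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha
    equal_var = stats.levene(a, b).pvalue > alpha
    # Compute t and p (Student's t-test if variances look equal, otherwise Welch's t-test)
    result = stats.ttest_ind(a, b, equal_var=equal_var)
    # Draw a conclusion at the chosen significance level
    decision = "reject H0" if result.pvalue <= alpha else "fail to reject H0"
    return {
        "normal": bool(normal),
        "equal_var": bool(equal_var),
        "t": float(result.statistic),
        "p": float(result.pvalue),
        "decision": decision,
    }


rng = np.random.default_rng(0)
group_a = rng.normal(loc=10.0, scale=2.0, size=25)  # simulated data
group_b = rng.normal(loc=11.5, scale=2.0, size=25)
print(run_independent_t_test(group_a, group_b))
```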
5. Notes
Checking normality:
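t-tests assume that the data (for a paired test, the differences) follow a normal distribution. If normality does not hold, consider a nonparametric test such as the Mann-Whitney U test for independent samples or the Wilcoxon signed-rank test for paired samples.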
Assumption of homogeneity of variance:
In independent two-sample t-tests, homogeneity of variance may or may not be assumed. If homogeneity of variance does not hold, Welch's t-test is generally used.
Sample size:
Small samples reduce the power of a t-test, that is, the probability of detecting a difference that actually exists. It is important to ensure a sufficient sample size.
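When normality is doubtful, SciPy offers nonparametric alternatives: `stats.mannwhitneyu` (Mann-Whitney U test) for independent samples and `stats.wilcoxon` (Wilcoxon signed-rank test) for paired samples. The sketch below runs both on simulated, right-skewed (exponential) data; the data, seed, and scales are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Independent samples: skewed data where a t-test's normality assumption is doubtful
group_a = rng.exponential(scale=1.0, size=20)
group_b = rng.exponential(scale=2.0, size=20)
u_res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
print(f"Mann-Whitney U = {u_res.statistic:.1f}, p = {u_res.pvalue:.4f}")

# Paired samples: every "after" value is 0-40% smaller than its "before" value
before = rng.exponential(scale=2.0, size=20)
after = before * rng.uniform(0.6, 1.0, size=20)
w_res = stats.wilcoxon(before, after)
print(f"Wilcoxon W = {w_res.statistic:.1f}, p = {w_res.pvalue:.4g}")
```

These tests compare ranks rather than means, so their null hypotheses are not identical to those of the corresponding t-tests.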
6. Summary
t-tests are powerful statistical tools for comparing means and making inferences about populations from sample data. By selecting the appropriate type of t-test and checking that its underlying assumptions are met, you can draw reliable conclusions about your data.