Introduction¶
The t-test, also known as Student's t-test, is a statistical test used to analyze whether two samples come from populations with the same mean. To do this, it quantifies the difference between the means of the two samples and, taking into account their variance, estimates how likely it is to obtain a difference equal to or greater than the one observed if the null hypothesis that the population means are equal were true. The probability estimated by the test is known as the p-value.
A p-value greater than a certain threshold, for example 5% or 1%, indicates that the observed difference may be due to chance, so the null hypothesis is not rejected. Conversely, when the p-value is lower than the selected threshold, it is considered that there is sufficient evidence to reject that the samples come from populations with the same mean.
When two samples are available, the fact that their average values are not exactly equal does not imply evidence of a real difference. Since each sample has its own variability due to random sampling, the sample means need not be equal even if the samples come from the same population. This is where the t-test adds value: it quantifies whether the observed difference between the two sample means is larger than what sampling variability alone would explain.
There are multiple adaptations of the t-test depending on whether the data are independent or paired, whether the variance is the same in both samples, or what type of differences are to be detected. This document shows how to use the implementations available in the Pingouin library to perform t-tests in Python.
T-test for independent samples¶
Two samples are considered independent if the observations were obtained randomly and are not related to each other.
Hypotheses tested¶
The hypotheses tested by the independent samples t-test are:
$H_0$: there is no difference between the means: $\mu_x = \mu_y$
$H_a$: there is a difference between the means: $\mu_x \neq \mu_y$
T statistic¶
The statistic used by the independent samples t-test is calculated as:
$$t = \frac{\overline{x} - \overline{y}} {\sqrt{\frac{s^{2}_{x}}{n_{x}} + \frac{s^{2}_{y}}{n_{y}}}}$$

where $\overline{x}$ and $\overline{y}$ are the sample means, $n_x$ and $n_y$ are the number of observations in each sample, and $s^{2}_{x}$ and $s^{2}_{y}$ are the sample variances.
The t statistic follows a distribution known as Student's t-distribution. This distribution closely resembles the normal distribution but has heavier tails, whose weight is controlled by the degrees of freedom. As the sample size decreases, more probability accumulates in the tails, making the test less strict than it would be under a normal distribution. A Student's t-distribution with 30 or more degrees of freedom is practically indistinguishable from a normal distribution.
There are several ways to calculate the degrees of freedom ($v$) of the Student's t-distribution. Two of the most commonly used are:
$$v = n_x + n_y - 2$$

when the two population variances are assumed equal (pooled test), and

$$v = \frac{(\frac{s^{2}_{x}}{n_{x}} + \frac{s^{2}_{y}}{n_{y}})^{2}} {\frac{(\frac{s^{2}_{x}}{n_{x}})^{2}}{n_{x}-1} + \frac{(\frac{s^{2}_{y}}{n_{y}})^{2}}{n_{y}-1}}$$

when equal variances cannot be assumed. The latter is known as the Welch–Satterthwaite approximation.
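As a quick numerical illustration, the sketch below computes both versions of the degrees of freedom. The data are synthetic (not the article's dataset) and exist only to make the formulas concrete.

# Degrees of freedom: pooled vs Welch–Satterthwaite (illustrative sketch)
# ==============================================================================
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=5.0, scale=1.0, size=20)   # synthetic sample 1
y = rng.normal(loc=5.5, scale=2.0, size=35)   # synthetic sample 2

nx, ny = len(x), len(y)
sx2, sy2 = x.var(ddof=1), y.var(ddof=1)

# Pooled degrees of freedom (equal population variances assumed)
v_pooled = nx + ny - 2

# Welch–Satterthwaite approximation (no equal-variance assumption)
v_welch = (sx2/nx + sy2/ny)**2 / ((sx2/nx)**2/(nx - 1) + (sy2/ny)**2/(ny - 1))

print(v_pooled, round(v_welch, 2))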
Conditions for an independent samples t-test¶
The conditions for an independent samples t-test, both for calculating confidence intervals and for hypothesis testing, are:
Independence
The observations must be independent of each other. For this, sampling must be random and the sample size must be less than 10% of the population.
Normality
The populations being compared must be normally distributed. Although the normality condition falls on the populations, information about them is usually not available, so the samples (since they reflect the population) must be approximately normally distributed. In case of some asymmetry, t-tests are considerably robust when the sample size is equal to or greater than 30.
Equality of variance (homoscedasticity)
The variance of both compared populations must be equal. As with the normality condition, if population information is not available, this condition must be assumed from the samples. If this condition is not met, Welch's correction can be used. This correction is incorporated through the degrees of freedom and allows compensating for the difference in variances. The number of degrees of freedom of a Welch Two Sample t-test is given by the following function:
$$f=\frac{(\frac{\widehat{S}^{2}_{x}}{n_x} + \frac{\widehat{S}^{2}_{y}}{n_y})^2} {\frac{1}{n_x+1}(\frac{\widehat{S}^{2}_{x}}{n_x})^2 + \frac{1}{n_y+1}(\frac{\widehat{S}^{2}_{y}}{n_y})^2} - 2$$

If the above conditions are met, the estimated parameter, in this case the difference of sample means ($\overline{x} - \overline{y}$), follows a Student's t-distribution with the corresponding degrees of freedom, centered on the estimated parameter and with standard deviation equal to the standard error (SE).
The standard error (SE) for comparing two means is the square root of the sum of each sample's variance divided by its sample size:

$$SE=\sqrt{\frac{\widehat{S}^{2}_{x}}{n_x} + \frac{\widehat{S}^{2}_{y}}{n_y}}$$

The process for calculating confidence intervals or hypothesis tests is the same as in the Normal model. The only difference is that T-scores (quantiles of the Student's t-distribution) are used instead of Z-scores (quantiles of the normal distribution).
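To make the "T-scores instead of Z-scores" point concrete, the following small sketch compares the normal and t quantiles used for a 95% interval; with few degrees of freedom the t quantile is noticeably larger.

# T-scores vs Z-scores for a 95% interval (illustrative sketch)
# ==============================================================================
from scipy import stats

print(stats.norm.ppf(0.975))           # Z-score: ~1.96
for df in (5, 15, 30, 100):
    # t quantile approaches the normal quantile as df grows
    print(df, stats.t.ppf(0.975, df))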
Hypothesis testing¶
The steps to follow to perform a t-test with the aim of determining whether the difference between means is significant are:
1. Establish the hypotheses.
2. Calculate the statistic (estimated parameter) to be used.
3. Determine the type of test: one or two tails.
4. Determine the significance level $\alpha$.
5. Calculate the p-value and compare it with the established significance level.
6. Calculate the effect size (optional but recommended).
7. Draw conclusions.
Establish the hypotheses
The null hypothesis ($H_0$) is the skeptical hypothesis, the one that considers that there is no difference or change. In the case of comparing two independent means, the null hypothesis considers that $\mu_1 =\mu_2$.
The alternative hypothesis ($H_a$) considers that the null hypothesis is not met. In the case of comparing two independent means, the alternative hypothesis considers that $\mu_1 \neq \mu_2$.
Calculate the statistic
The statistic is the value calculated from the sample that is to be extrapolated to the source population. In this case, the difference of means.
Determine the type of test, one or two tails
Hypothesis tests can be one-tailed or two-tailed. If the alternative hypothesis uses ">" or "<" it is a one-tailed test, in which only deviations in one direction are analyzed. If the alternative hypothesis is of the "different from" type, it is a two-tailed test, in which possible deviations in both directions are analyzed. It is recommended to use one-tailed tests only when it is known with certainty that the deviations of interest are in one direction and only if it has been determined before observing the sample, not afterwards.
Determine the significance level $\alpha$
The significance level $\alpha$ determines the probability of error that one wants to assume when rejecting the null hypothesis. It is used as a reference point to determine whether the p-value obtained in the hypothesis test is low enough to consider the observed differences significant and, therefore, reject $H_0$. The lower the value of $\alpha$, the lower the probability of rejecting the null hypothesis. For example, if $\alpha = 0.05$ is considered, the null hypothesis will be rejected in favor of the alternative hypothesis if the p-value obtained is less than 0.05, and there will be a 5% probability of having rejected $H_0$ when it is actually true. The significance level should be established based on which error is more costly:
Type I error: error of rejecting the null hypothesis when it is actually true.
Type II error: error of considering the null hypothesis as true when it is actually false.
Calculate p-value and compare with the significance level
Using a Student's t-distribution, calculate the probability of obtaining a value of the t statistic as extreme as or more extreme than the one observed (in absolute value for a two-tailed test).
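For example, given a t statistic and degrees of freedom already obtained from a test, the two-sided p-value can be computed with scipy as follows. The values of t_obs and df here are hypothetical, not from the article's data.

# Two-sided p-value from a t statistic (illustrative values)
# ==============================================================================
from scipy import stats

t_obs, df = 2.1, 28                           # hypothetical test results
p_value = 2 * stats.t.sf(abs(t_obs), df=df)   # P(|T| >= |t_obs|)
print(p_value)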
Effect size
Effect size is the net difference observed between the groups in a study. It is not a statistical inference as it does not attempt to identify whether the populations are significantly different, but simply indicates the observed difference between samples, regardless of the variance they have. It is a parameter that should always accompany p-values, since a p-value only indicates whether there is significant evidence to reject the null hypothesis but says nothing about whether the difference is important or practical. The latter is determined by the effect size.
In the case of the independent means t-test, there are two possible measures of effect size: Cohen's d and Pearson's r. Both are equivalent and can be transformed from one to the other. Each of these measures has recommended magnitudes to consider the effect size as small, medium, or large.
Cohen's d
$$d= \frac{|\text{difference of means}|}{sd}$$

There are two commonly used ways to calculate the pooled sd of both samples:
$$sd=\sqrt{\frac{n_x sd_x^2 + n_y sd_y^2}{n_x+n_y-2}}$$

$$sd=\sqrt{\frac{sd_x^2+sd_y^2}{2}}$$

The most commonly used limits to classify effect size with Cohen's d are:
d $\approx$ 0.2 small
d $\approx$ 0.5 medium
d $\approx$ 0.8 large
Pearson's r
$$r= \sqrt{\frac{t^2}{t^2 + df}}$$

t = t statistic obtained in the test
df = degrees of freedom of the test
The most commonly used limits to classify effect size with r are:
r $\approx$ 0.1 small
r $\approx$ 0.3 medium
r $\approx$ 0.5 large
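A minimal sketch of both effect size measures follows, using the classical $(n-1)$-weighted pooled sd for Cohen's d. The function names cohen_d and r_from_t are illustrative helpers, not library functions.

# Effect sizes: Cohen's d and Pearson's r (illustrative sketch)
# ==============================================================================
import numpy as np

def cohen_d(x, y):
    """Cohen's d with the (n-1)-weighted pooled standard deviation."""
    nx, ny = len(x), len(y)
    sd_pooled = np.sqrt(((nx - 1) * np.var(x, ddof=1) + (ny - 1) * np.var(y, ddof=1))
                        / (nx + ny - 2))
    return abs(np.mean(x) - np.mean(y)) / sd_pooled

def r_from_t(t, df):
    """Pearson's r computed from the t statistic and its degrees of freedom."""
    return np.sqrt(t**2 / (t**2 + df))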
Interpretation of results
If the p-value is less than the selected $\alpha$ value, there is sufficient evidence to reject $H_0$ in favor of $H_a$.
Confidence interval¶
The confidence interval for the difference of independent means for a confidence level of 1−$\alpha$ has the following structure:
$$\left[(\overline{x} - \overline{y}) \pm t_{df,\ 1-\alpha/2} \cdot \sqrt{\frac{\widehat{S}^{2}_{x}}{n_x} + \frac{\widehat{S}^{2}_{y}}{n_y}}\right]$$

The value $t$ depends on the confidence level of the interval. It is defined as the quantile of a Student's t-distribution, with the corresponding degrees of freedom, such that the probability contained between $-t$ and $+t$ equals the confidence level.
The value $t$ can be found in statistical tables or obtained with software. In Python, the $t$ value for a given confidence level and degrees of freedom can be obtained with the scipy.stats.t.ppf function. For example, t.ppf(q=1 - 0.05/2, df=15, loc=0, scale=1) for a 95% interval.
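Putting the pieces together, the following is a sketch of the confidence interval for the difference of means, here using the Welch–Satterthwaite degrees of freedom; ci_diff_means is an illustrative helper, not a library function.

# Confidence interval for the difference of means (illustrative sketch)
# ==============================================================================
import numpy as np
from scipy import stats

def ci_diff_means(x, y, conf=0.95):
    nx, ny = len(x), len(y)
    vx, vy = np.var(x, ddof=1), np.var(y, ddof=1)
    se = np.sqrt(vx/nx + vy/ny)
    # Welch–Satterthwaite degrees of freedom
    df = (vx/nx + vy/ny)**2 / ((vx/nx)**2/(nx - 1) + (vy/ny)**2/(ny - 1))
    t_crit = stats.t.ppf(1 - (1 - conf)/2, df=df)
    diff = np.mean(x) - np.mean(y)
    return diff - t_crit*se, diff + t_crit*se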
Example¶
The births dataset from the R package openintro contains information about 150 births as well as information about the mothers. We want to determine whether there is significant evidence that the weight of newborns whose mothers smoke (smoker) differs from those whose mothers do not smoke (nonsmoker).
Libraries¶
The libraries used in this example are:
# Libraries
# ==============================================================================
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import pingouin as pg
Data¶
# Download data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/' +
'Estadistica-machine-learning-python/master/data/births.csv')
data = pd.read_csv(url, sep=',')
data.head(4)
| | f_age | m_age | weeks | premature | visits | gained | weight | sex_baby | smoke |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 31.0 | 30 | 39 | full term | 13.0 | 1.0 | 6.88 | male | smoker |
| 1 | 34.0 | 36 | 39 | full term | 5.0 | 35.0 | 7.69 | male | nonsmoker |
| 2 | 36.0 | 35 | 40 | full term | 12.0 | 29.0 | 8.88 | male | nonsmoker |
| 3 | 41.0 | 40 | 40 | full term | 13.0 | 30.0 | 9.00 | female | nonsmoker |
Hypotheses¶
$H_0$: there is no difference between the population means: $\mu(smoker) - \mu(nonsmoker) = 0$
$H_a$: there is a difference between the population means: $\mu(smoker) - \mu(nonsmoker) \neq 0$
Conditions¶
Independence
This is a random sampling where the sample sizes do not exceed 10% of all births in North Carolina. It can be stated that the events are independent.
Normality
# Number of observations per group
# ==============================================================================
data.groupby('smoke').size()
smoke
nonsmoker    100
smoker        50
dtype: int64
# Distribution plots
# ==============================================================================
fig, axs = plt.subplots(2, 2, figsize=(8, 5))
weight_smokers = data.loc[data.smoke == 'smoker', 'weight']
weight_nonsmokers = data.loc[data.smoke == 'nonsmoker', 'weight']
# Smokers
sns.histplot(
weight_smokers,
kde = True,
stat = 'density',
bins = 20,
color = "#3182bd",
alpha = 0.5,
ax = axs[0, 0]
)
mu, sigma = stats.norm.fit(weight_smokers)
x_hat = np.linspace(weight_smokers.min(), weight_smokers.max(), 100)
axs[0, 0].plot(x_hat, stats.norm.pdf(x_hat, mu, sigma), 'r-', linewidth=2, label='normal')
axs[0, 0].set_title('Weight distribution (smokers)')
axs[0, 0].set_ylabel('Probability density')
axs[0, 0].legend()
# Nonsmokers
sns.histplot(
weight_nonsmokers,
kde = True,
stat = 'density',
bins = 20,
color = "#3182bd",
alpha = 0.5,
ax = axs[0, 1]
)
mu, sigma = stats.norm.fit(weight_nonsmokers)
x_hat = np.linspace(weight_nonsmokers.min(), weight_nonsmokers.max(), 100)
axs[0, 1].plot(x_hat, stats.norm.pdf(x_hat, mu, sigma), 'r-', linewidth=2, label='normal')
axs[0, 1].set_title('Weight distribution (nonsmokers)')
axs[0, 1].set_ylabel('Probability density')
axs[0, 1].legend()
# QQ-plots
pg.qqplot(weight_smokers, dist='norm', ax=axs[1, 0])
pg.qqplot(weight_nonsmokers, dist='norm', ax=axs[1, 1])
plt.tight_layout()
# Shapiro-Wilk normality test
# ==============================================================================
pg.normality(data=data, dv='weight', group='smoke')
| smoke | W | pval | normal |
|---|---|---|---|
| smoker | 0.894906 | 0.000328 | False |
| nonsmoker | 0.923736 | 0.000022 | False |
The quantile-quantile plots show left skewness, and the Shapiro-Wilk test finds significant evidence that the data do not come from normally distributed populations. However, since each group has more than 30 observations, the t-test can still be considered sufficiently robust, although this should be mentioned in the conclusions. Alternatives would be a non-parametric test based on the median (Mann-Whitney-Wilcoxon, sketched below) or a bootstrapping approach. Another option is to study whether the anomalous observations are exceptions that can be excluded from the analysis.
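As a hedged illustration of the non-parametric alternative just mentioned, Pingouin's Mann-Whitney U test can be run on the same two series already defined above:

# Mann-Whitney U test as a non-parametric alternative (sketch)
# ==============================================================================
pg.mwu(x=weight_smokers, y=weight_nonsmokers, alternative='two-sided')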
Equality of variance (homoscedasticity)
There are several tests that allow comparing variances. Since the normality criterion is not met, one of the recommended tests is Levene's test.
fig, ax = plt.subplots(1, 1, figsize=(6, 4))
sns.boxplot(y="smoke", x="weight", data=data, ax=ax)
sns.swarmplot(y="smoke", x="weight", data=data, color='black', alpha=0.5, ax=ax)
ax.set_title('Weight comparison by mother smoking habit', fontsize=11, pad=12)
ax.set_xlabel('Weight (pounds)')
ax.set_ylabel('Smoking habit');
# Homoscedasticity test
# ==============================================================================
pg.homoscedasticity(data=data, dv='weight', group='smoke')
| | W | pval | equal_var |
|---|---|---|---|
| levene | 0.444176 | 0.506151 | True |
No significant evidence (for α = 0.05) is found that the variances are different between both populations. If they were, the t-test would have to be performed with Welch's correction.
T-Test¶
The ttest function from the Pingouin package calculates the p-value, confidence intervals, and effect size.
# Test for independent data (p-value, confidence intervals)
# ==============================================================================
weight_smokers = data.loc[data.smoke == 'smoker', 'weight']
weight_nonsmokers = data.loc[data.smoke == 'nonsmoker', 'weight']
pg.ttest(x=weight_smokers, y=weight_nonsmokers, alternative='two-sided', correction=False)
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | -1.551676 | 148 | two-sided | 0.122876 | [-0.91, 0.11] | 0.268758 | 0.553 | 0.338075 |
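As a sanity check, the same test can be reproduced with scipy; passing equal_var=True matches correction=False in Pingouin, so the statistic and p-value should coincide with the table above.

# Cross-check with scipy (equal variances assumed)
# ==============================================================================
res = stats.ttest_ind(weight_smokers, weight_nonsmokers, equal_var=True)
print(res.statistic, res.pvalue)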
Conclusion¶
Since the p-value (0.1229) is greater than the significance level α (0.05), there is not sufficient evidence to consider that there is a real difference between the average weight of children born to smoking mothers and that of non-smoking mothers. The effect size measured by Cohen's d is small (0.27).
T-test for dependent (paired) samples¶
Two means are dependent or paired when they come from dependent samples, that is, when there is a relationship between the observations of the samples. This scenario appears when the results are generated from the same individuals under two different conditions. Two examples:
Testing the results of two types of exams (reading and writing) on a group of school students, where each student takes both exams.
Medical studies comparing a pre-treatment and post-treatment characteristic on the same individuals.
To determine whether the individuals (observations) have undergone a significant change between conditions $A$ and $B$, the difference in the studied magnitude, $d_i = A_i - B_i$, is calculated for each of them. Even if there is no real difference between the two conditions, the difference for a given individual will rarely be exactly zero; under the null hypothesis, however, the average of all the differences will tend to zero (deviations compensate). It is this mean ($\mu_{difference}$) that is studied through the sample statistic $\overline{d} = \frac{\sum_{i=1}^n (x_{before,i} - x_{after,i})}{n}$, determining whether the observed mean of the differences deviates enough from zero to reject that the mean value of both groups is the same.
Dependent or paired tests have an advantage over independent ones: non-systematic variation (that produced by variables not contemplated in the study) is better controlled, since it is blocked by examining the same individuals twice rather than two different groups of individuals.
Conditions for a dependent samples t-test¶
Normality
The populations being compared, or more precisely the differences between pairs, must be approximately normally distributed. When information about the populations is not available, the only way to assess this is from the samples.
Variance
It is not necessary for the variances of both groups to be equal (homoscedasticity not required).
If the mentioned conditions are met, it can be considered that:
$$\overline{d} \sim N\left(\mu_d, \frac{\widehat{S}^2_{d}}{n}\right)$$

Hypotheses¶
The hypotheses tested by the paired samples t-test are:
$H_0$: there is no difference between the means; the mean of the differences is 0 ($\mu_d = 0$) or equals a specified value ($\mu_d = \Delta$).
$H_a$: there is a difference between the means ($\mu_d \neq 0$), or the difference is different from the value established in the null hypothesis ($\mu_d \neq \Delta$).
The rest of the process is the same as that followed in a t-test for independent means:
$$T_{calculated}= \frac{\overline{d} - H_0 \text{ value}}{SE}$$

where the SE of the mean of the differences is the sample quasi-standard deviation divided by the square root of the number of data pairs:

$$SE = \frac{\widehat{S}_{d}}{\sqrt{n}}$$

Example¶
An athletics team has hired a new coach. To evaluate their impact, 10 team members are randomly selected and their times in the 100-meter sprint are recorded at the beginning of the year. At the end of the year, these same 10 runners are timed again. The goal is to determine whether there is a significant difference in the athletes' performance after a year of training, that is, whether the average time has changed (regardless of whether it has improved or worsened).
This is a case study where measurements are made on the same individuals under two different conditions (beginning and end of the year), so these are paired data.
Libraries¶
The libraries used in this example are:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats
import pingouin as pg
Data¶
data = pd.DataFrame({
'runner': range(1, 11),
'before': [12.9, 13.5, 12.8, 15.6, 17.2, 19.2, 12.6, 15.3, 14.4, 11.3],
'after': [12.7, 13.6, 12.0, 15.2, 16.8, 20.0, 12.0, 15.9, 16.0, 11.1]
})
data.head()
| | runner | before | after |
|---|---|---|---|
| 0 | 1 | 12.9 | 12.7 |
| 1 | 2 | 13.5 | 13.6 |
| 2 | 3 | 12.8 | 12.0 |
| 3 | 4 | 15.6 | 15.2 |
| 4 | 5 | 17.2 | 16.8 |
Since this is paired data, it is useful to know the difference for each individual.
data['difference'] = data['before'] - data['after']
data.head()
| | runner | before | after | difference |
|---|---|---|---|---|
| 0 | 1 | 12.9 | 12.7 | 0.2 |
| 1 | 2 | 13.5 | 13.6 | -0.1 |
| 2 | 3 | 12.8 | 12.0 | 0.8 |
| 3 | 4 | 15.6 | 15.2 | 0.4 |
| 4 | 5 | 17.2 | 16.8 | 0.4 |
data['difference'].describe()
count    10.000000
mean     -0.050000
std       0.741245
min      -1.600000
25%      -0.475000
50%       0.200000
75%       0.400000
max       0.800000
Name: difference, dtype: float64
Hypotheses¶
$H_0$: there is no difference between the average time of runners at the beginning and end of the year. The average of the differences is zero ($\mu_d=0$).
$H_a$: there is a difference between the average time of runners at the beginning and end of the year. The average of the differences is not zero ($\mu_d \neq 0$).
Observed statistic¶
The statistic is the value calculated from the sample that is to be extrapolated to the source population. In this case, it is the mean of the differences between each pair of observations, $\overline{d}=-0.05$.
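A minimal sketch of the computation behind the paired test, built directly from the column of differences; the resulting t statistic and p-value should match the pg.ttest output shown later in this example.

# Manual computation of the paired t statistic (sketch)
# ==============================================================================
d = data['difference']
d_mean = d.mean()                        # -0.05
se = d.std(ddof=1) / np.sqrt(len(d))     # SE of the mean of the differences
t_obs = d_mean / se
p_value = 2 * stats.t.sf(abs(t_obs), df=len(d) - 1)
print(t_obs, p_value)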
Determine the type of test, one or two tails¶
Hypothesis tests can be one-tailed or two-tailed. If the alternative hypothesis uses ">" or "<" it is a one-tailed test, in which only deviations in one direction are analyzed. If the alternative hypothesis is of the "different from" type, it is a two-tailed test, in which possible deviations in both directions are analyzed. One-tailed tests are only used when it is known with certainty that the deviations of interest are in one direction and only if it has been determined before observing the sample, not afterwards.
In this case, a two-tailed test will be used since we are interested in detecting any significant change in the athletes' performance, whether it has improved or worsened.
Significance level¶
$$\alpha = 0.05$$

The significance level $\alpha$ determines the probability of error that one wants to assume when rejecting the null hypothesis. It is used as a reference point to determine whether the p-value obtained in the hypothesis test is low enough to consider the observed differences significant and, therefore, reject $H_0$. The lower the value of $\alpha$, the lower the probability of rejecting the null hypothesis. For example, if $\alpha = 0.05$ is considered, the null hypothesis will be rejected in favor of the alternative hypothesis if the p-value obtained is less than 0.05, and there will be a 5% probability of having rejected $H_0$ when it is actually true. The significance level should be established based on which error is more costly:
Type I error: error of rejecting $H_0$ when it is actually true.
Type II error: error of considering $H_0$ as true when it is actually false.
Conditions¶
data
| | runner | before | after | difference |
|---|---|---|---|---|
| 0 | 1 | 12.9 | 12.7 | 0.2 |
| 1 | 2 | 13.5 | 13.6 | -0.1 |
| 2 | 3 | 12.8 | 12.0 | 0.8 |
| 3 | 4 | 15.6 | 15.2 | 0.4 |
| 4 | 5 | 17.2 | 16.8 | 0.4 |
| 5 | 6 | 19.2 | 20.0 | -0.8 |
| 6 | 7 | 12.6 | 12.0 | 0.6 |
| 7 | 8 | 15.3 | 15.9 | -0.6 |
| 8 | 9 | 14.4 | 16.0 | -1.6 |
| 9 | 10 | 11.3 | 11.1 | 0.2 |
Normality
# Distribution plots
# ==============================================================================
fig, axs = plt.subplots(2, 2, figsize=(8, 5))
# Beginning of the year
sns.histplot(
data['before'],
kde = True,
stat = 'density',
bins = 20,
color = "#3182bd",
alpha = 0.5,
ax = axs[0, 0]
)
mu, sigma = stats.norm.fit(data['before'])
x_hat = np.linspace(data['before'].min(), data['before'].max(), 100)
axs[0, 0].plot(x_hat, stats.norm.pdf(x_hat, mu, sigma), 'r-', linewidth=2, label='normal')
axs[0, 0].set_title('Distribution at the beginning of the year')
axs[0, 0].set_xlabel('Time')
axs[0, 0].set_ylabel('Probability density')
axs[0, 0].legend()
# QQ-plot beginning of the year
pg.qqplot(data['before'], dist='norm', ax=axs[1, 0])
# End of the year
sns.histplot(
data['after'],
kde = True,
stat = 'density',
bins = 20,
color = "#3182bd",
alpha = 0.5,
ax = axs[0, 1]
)
mu, sigma = stats.norm.fit(data['after'])
x_hat = np.linspace(data['after'].min(), data['after'].max(), 100)
axs[0, 1].plot(x_hat, stats.norm.pdf(x_hat, mu, sigma), 'r-', linewidth=2, label='normal')
axs[0, 1].set_title('Distribution at the end of the year')
axs[0, 1].set_xlabel('Time')
axs[0, 1].set_ylabel('Probability density')
axs[0, 1].legend()
# QQ-plot end of the year
pg.qqplot(data['after'], dist='norm', ax=axs[1, 1])
plt.tight_layout()
# Shapiro-Wilk normality test
# ==============================================================================
pg.normality(data=data['before'])
| | W | pval | normal |
|---|---|---|---|
| before | 0.944436 | 0.603336 | True |
pg.normality(data=data['after'])
| | W | pval | normal |
|---|---|---|---|
| after | 0.936383 | 0.513513 | True |
The Q-Q plots indicate that the samples resemble what is expected in a normal population and the Shapiro-Wilk tests do not find evidence (α = 0.05) to reject that the samples come from normal populations.
T-test¶
# Test for dependent data (p-value, confidence intervals)
# ==============================================================================
pg.ttest(
x = data['before'],
y = data['after'],
alternative = 'two-sided',
paired = True,
correction = False
)
| | T | dof | alternative | p-val | CI95% | cohen-d | BF10 | power |
|---|---|---|---|---|---|---|---|---|
| T-test | -0.213308 | 9 | two-sided | 0.83584 | [-0.58, 0.48] | 0.019375 | 0.315 | 0.050347 |
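As with the independent test, scipy offers a paired counterpart that should reproduce the statistic and p-value above.

# Cross-check with scipy's paired t-test
# ==============================================================================
res = stats.ttest_rel(data['before'], data['after'])
print(res.statistic, res.pvalue)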
Conclusion¶
The p-value obtained by the paired t-test is higher than the established significance level ($\alpha = 0.05$), so there is no evidence to reject the null hypothesis in favor of the alternative hypothesis. It cannot be considered that the athletes' performance has changed.
Session information¶
import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.3.4
pandas              2.3.3
pingouin            0.5.5
scipy               1.15.3
seaborn             0.13.2
session_info        v1.0.1
-----
IPython             9.7.0
jupyter_client      8.6.3
jupyter_core        5.9.1
-----
Python 3.13.9 | packaged by conda-forge | (main, Oct 22 2025, 23:12:41) [MSC v.1944 64 bit (AMD64)]
Windows-11-10.0.26100-SP0
-----
Session information updated at 2025-12-28 22:13
Bibliography¶
OpenIntro Statistics, Fourth Edition, by David Diez, Mine Çetinkaya-Rundel, Christopher Barr
Statistics Using R with Biological Examples, by Kim Seefeld, Ernst Linder
Handbook of Biological Statistics, by John H. McDonald
Métodos estadísticos en ingeniería (Statistical Methods in Engineering), by Rafael Romero Villafranca, Luisa Rosa Zúnica Ramajo
Citation instructions¶
How to cite this document?
If you use this document or any part of it, we appreciate you citing it. Thank you very much!
T-test with Python by Joaquín Amat Rodrigo, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://www.cienciadedatos.net/documentos/pystats10-t-test-python-en.html
This document created by Joaquín Amat Rodrigo is licensed under Attribution-NonCommercial-ShareAlike 4.0 International.
You are free to:

- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.

Under the following terms:

- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
