Introduction¶
The assumption of homogeneity of variances states that the variability of the variable of interest is constant across different levels of a factor, that is, across different groups. This assumption is closely related to homoscedasticity, which in the context of linear regression models refers to the fact that the variance of the errors (residuals) of the model is constant across all predictions. When this condition is not met, we speak of heteroscedasticity.
There are various statistical tests that allow us to assess whether observations come from populations with equal variance. In all of them, the null hypothesis states that the variances are equal across groups, while the alternative hypothesis states that at least one of them differs. The tests differ mainly in the statistic and the measure of centrality used to calculate deviations:
Mean-based tests (such as Bartlett's test or the original version of Levene's test) have greater statistical power when the data follow normal distributions.
Median-based or non-parametric tests (such as the Brown-Forsythe version of Levene's test or the Fligner-Killeen test) are more robust against deviations from normality and asymmetric distributions.
In general, when it cannot be assumed with sufficient certainty that the populations follow a normal distribution, it is recommended to use robust tests that do not depend on this assumption.
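To illustrate why this recommendation matters, the following simulation (a hypothetical sketch, not part of the analysis below) draws two skewed samples from the same exponential distribution, so the true variances are equal and an ideal test should not reject. Bartlett's test assumes normality, while Fligner-Killeen does not, so their p-values can behave very differently on skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two groups drawn from the SAME skewed (exponential) distribution:
# the population variances are equal, so an ideal test should not reject.
group_a = rng.exponential(scale=2.0, size=200)
group_b = rng.exponential(scale=2.0, size=200)

# Bartlett assumes normality; Fligner-Killeen is non-parametric.
bartlett = stats.bartlett(group_a, group_b)
fligner = stats.fligner(group_a, group_b, center='median')
print(f"Bartlett p-value: {bartlett.pvalue:.4f}")
print(f"Fligner p-value:  {fligner.pvalue:.4f}")
```

Repeating this simulation many times would show Bartlett's test rejecting far more often than the nominal 5% rate on skewed data, which is exactly the fragility described above.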
Throughout this document, we show how to perform graphical analyses and how to apply the Levene, Bartlett, and Fligner-Killeen tests using Python.
Libraries¶
The libraries used in this document are:
# Data processing
# ==============================================================================
import pandas as pd
import numpy as np
# Graphics
# ==============================================================================
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5
plt.rcParams['font.size'] = 8
# Preprocessing and analysis
# ==============================================================================
from scipy import stats
# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
Data¶
The data used in this example have been obtained from the book Statistical Rethinking by Richard McElreath. The dataset contains information collected by Nancy Howell in the late 1960s about the !Kung San people, who live in the Kalahari Desert between Botswana, Namibia, and Angola. The objective is to identify whether there is a difference between the variance of the weight of men and women.
# Data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/' +
'Estadistica-machine-learning-python/master/data/Howell1.csv')
data = pd.read_csv(url)
print(data.info())
data.head(4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 544 entries, 0 to 543
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   height  544 non-null    float64
 1   weight  544 non-null    float64
 2   age     544 non-null    float64
 3   male    544 non-null    int64
dtypes: float64(3), int64(1)
memory usage: 17.1 KB
None
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041914 | 41.0 | 1 |
From all available data, only individuals over 15 years of age are selected.
# Data separation by group
# ==============================================================================
data['male'] = data['male'].astype(str)
data = data[(data.age > 15)]
weight_males = data.loc[data.male == '1', 'weight']
weight_females = data.loc[data.male == '0', 'weight']
Graphical methods¶
Two of the most commonly used graphical methods for assessing homoscedasticity are the boxplot and the violin plot. Both aim to make the dispersion of the groups directly comparable.
# Violin plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3.5))
sns.violinplot(
x = 'weight',
y = 'male',
data = data,
hue = 'male',
inner = 'stick',
ax = ax
)
ax.set_title('Weight distribution by sex')
ax.set_xlabel('weight')
ax.set_ylabel('sex (1=male 0=female)');
# Boxplot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3.5))
sns.boxplot(
x = 'weight',
y = 'male',
data = data,
hue = 'male',
ax = ax
)
ax.set_title('Weight distribution by sex')
ax.set_xlabel('weight')
ax.set_ylabel('sex (1=male 0=female)');
Statistical tests¶
Levene's test, Bartlett's test, and the Fligner-Killeen test are three of the most commonly used hypothesis tests for comparing variance across groups. In all of them, the null hypothesis states that the data come from distributions with the same variance (homoscedasticity). Therefore, if the p-value is less than a certain threshold (typically 0.05), it is considered that there is sufficient evidence to reject homoscedasticity in favor of heteroscedasticity.
Levene's test and the Fligner-Killeen test (the latter being non-parametric) are more robust than Bartlett's test against deviations from normality, so their use is usually recommended when normality cannot be assumed. In the case of Levene's test, it is possible to choose the centrality statistic used to calculate deviations within each group (mean, median, or trimmed mean), which influences its robustness against outliers and asymmetric distributions.
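The effect of the centrality statistic can be seen directly by running Levene's test with each `center` option on the same pair of samples. The sketch below (an illustration with synthetic log-normal samples, not the Howell data) compares the three choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Skewed (log-normal) samples, where the choice of center matters most
a = rng.lognormal(mean=0.0, sigma=0.5, size=150)
b = rng.lognormal(mean=0.0, sigma=0.5, size=150)

# 'proportiontocut' is only used when center='trimmed'
for center in ('mean', 'median', 'trimmed'):
    res = stats.levene(a, b, center=center, proportiontocut=0.05)
    print(f"center={center:8s} statistic={res.statistic:.4f} pvalue={res.pvalue:.4f}")
```

With `center='mean'` the test is Levene's original version; with `center='median'` it is the Brown-Forsythe variant, which is the usual default for skewed data.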
Bartlett's test, on the other hand, is based directly on sample variances and assumes normality, making it very sensitive to deviations from this assumption.
If there is high confidence that the samples come from normally distributed populations, it is advisable to use Bartlett's test, as it is more powerful under normality. In the absence of this guarantee, it is recommended to use Levene's test with the median or the non-parametric Fligner-Killeen test, both of which are highly robust to non-normality and to the presence of extreme values.
In applied practice, Levene's test with the median is often considered a balanced option between robustness and power.
All three tests are available in the scipy.stats library (scipy.stats.levene, scipy.stats.bartlett, scipy.stats.fligner) and in the pingouin library (pingouin.homoscedasticity).
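As a compact alternative to running each test separately, the three scipy tests can be applied in one pass and collected into a small summary table. The sketch below is self-contained, so it uses synthetic stand-in samples (in the analysis that follows, the inputs would be `weight_males` and `weight_females`):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Stand-in samples; in the document these would be weight_males / weight_females
rng = np.random.default_rng(34895)
sample_a = rng.normal(loc=48, scale=6, size=180)
sample_b = rng.normal(loc=41, scale=5.5, size=170)

tests = {
    'levene (median)':  stats.levene(sample_a, sample_b, center='median'),
    'bartlett':         stats.bartlett(sample_a, sample_b),
    'fligner (median)': stats.fligner(sample_a, sample_b, center='median'),
}
summary = pd.DataFrame(
    {name: {'statistic': res.statistic, 'pvalue': res.pvalue}
     for name, res in tests.items()}
).T
print(summary)
```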
# Levene test
# ==============================================================================
levene_test = stats.levene(weight_males, weight_females, center='median')
levene_test
LeveneResult(statistic=np.float64(0.18630521976263306), pvalue=np.float64(0.6662611053126026))
# Bartlett test
# ==============================================================================
bartlett_test = stats.bartlett(weight_males, weight_females)
bartlett_test
BartlettResult(statistic=np.float64(0.8473322751459793), pvalue=np.float64(0.3573081212488608))
# Fligner test
# ==============================================================================
fligner_test = stats.fligner(weight_males, weight_females, center='median')
fligner_test
FlignerResult(statistic=np.float64(0.1376531343594324), pvalue=np.float64(0.7106253515287645))
None of the tests provide evidence to reject the hypothesis that the two groups have the same variance (homoscedasticity); all p-values are well above 0.05.
Bootstrapping and permutation test¶
When samples are small or the assumptions of classical tests are not met, resampling methods such as bootstrapping or permutation tests can be used to assess homoscedasticity. These methods do not depend on specific assumptions about the distribution of the data and can provide more robust estimates of variability across groups.
# Bootstrapping variance difference
# ==============================================================================
def diff_var(x, y):
return np.var(x, ddof=1) - np.var(y, ddof=1)
res_boot = stats.bootstrap(
data = (weight_males, weight_females),
statistic = diff_var,
confidence_level = 0.95,
n_resamples = 10_000,
method = "bca",
random_state = 34895
)
print("95% confidence interval for variance difference:")
print(res_boot.confidence_interval)
95% confidence interval for variance difference:
ConfidenceInterval(low=np.float64(-5.394767888622391), high=np.float64(16.212163831783855))
Since the 95% confidence interval for the variance difference includes the value 0, there is insufficient evidence to reject the null hypothesis of homoscedasticity between the two groups.
# Permutation test for variance difference
# ==============================================================================
def var_ratio(x, y):
return np.var(x, ddof=1) / np.var(y, ddof=1)
perm_test = stats.permutation_test(
data = (weight_males, weight_females),
statistic = var_ratio,
permutation_type = 'independent',
alternative = 'two-sided',
n_resamples = 10_000,
random_state = 34895
)
print(f"p-value for variance difference: {perm_test.pvalue}")
p-value for variance difference: 0.32636736326367366
The p-value obtained through the permutation test is consistent with the results of the classical tests, indicating that there is insufficient evidence to reject the null hypothesis of homoscedasticity between the groups.
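To make the mechanics of the permutation test explicit, the following hand-rolled version (an illustrative sketch with synthetic stand-in samples, not scipy's exact algorithm) shuffles the pooled observations, recomputes the variance ratio on each shuffle, and counts how often a permuted statistic is at least as extreme as the observed one:

```python
import numpy as np

rng = np.random.default_rng(34895)
# Stand-in samples; the document uses weight_males / weight_females
x = rng.normal(48, 6, size=180)
y = rng.normal(41, 6, size=170)

def var_ratio(a, b):
    return np.var(a, ddof=1) / np.var(b, ddof=1)

observed = var_ratio(x, y)
pooled = np.concatenate([x, y])
n_x = len(x)

# Two-sided test on a ratio: use |log ratio| so that ratios r and 1/r
# count as equally extreme.
obs_stat = abs(np.log(observed))
n_resamples = 5_000
count = 0
for _ in range(n_resamples):
    perm = rng.permutation(pooled)
    stat = abs(np.log(var_ratio(perm[:n_x], perm[n_x:])))
    if stat >= obs_stat:
        count += 1

# Add-one correction avoids reporting an exact p-value of zero
p_value = (count + 1) / (n_resamples + 1)
print(f"Manual permutation p-value: {p_value:.4f}")
```

Under the null hypothesis the group labels are exchangeable, so shuffling them generates the reference distribution of the statistic without any distributional assumption.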
Session information¶
import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.2.6
pandas              2.3.3
scipy               1.15.3
seaborn             0.13.2
session_info        v1.0.1
-----
IPython             9.8.0
jupyter_client      8.7.0
jupyter_core        5.9.1
-----
Python 3.13.11 | packaged by Anaconda, Inc. | (main, Dec 10 2025, 21:28:48) [GCC 14.3.0]
Linux-6.14.0-37-generic-x86_64-with-glibc2.39
-----
Session information updated at 2026-01-20 22:31
Bibliography¶
OpenIntro Statistics: Fourth Edition by David Diez, Mine Çetinkaya-Rundel, Christopher Barr
Handbook of Biological Statistics by John H. McDonald
Levene, H. (1960). In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, I. Olkin et al. eds., Stanford University Press, pp. 278-292.
Conover, W. J., Johnson, M. E. and Johnson M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23(4), 351-361.
Fligner, M.A. and Killeen, T.J. (1976). Distribution-free two-sample tests for scale. Journal of the American Statistical Association, 71(353), 210-213.
https://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm
Citation instructions¶
How to cite this document?
If you use this document or any part of it, we appreciate your citation. Thank you very much!
Tests equality of variances with Python by Joaquín Amat Rodrigo, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://www.cienciadedatos.net/documentos/pystats07-test-equality-of-variance-python.html
This document created by Joaquín Amat Rodrigo is licensed under Attribution-NonCommercial-ShareAlike 4.0 International.
You are free to:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.

Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
