Introduction¶
The assumption of homogeneity of variances states that the variability of the variable of interest is constant across different levels of a factor, that is, across different groups. This assumption is closely related to homoscedasticity, which in the context of linear regression models refers to the fact that the variance of the errors (residuals) of the model is constant across all predictions. When this condition is not met, we speak of heteroscedasticity.
There are various statistical tests that allow us to assess whether observations come from populations with equal variance. In all of them, the null hypothesis states that the variances are equal across groups, while the alternative hypothesis states that at least one of them differs. The tests differ mainly in the statistic and the measure of centrality used to calculate deviations:
Mean-based tests (such as Bartlett's test or the original version of Levene's test) have greater statistical power when the data follow normal distributions.
Median-based or non-parametric tests (such as the Brown-Forsythe version of Levene's test or the Fligner-Killeen test) are more robust against deviations from normality and asymmetric distributions.
In general, when it cannot be assumed with sufficient certainty that the populations follow a normal distribution, it is recommended to use robust tests that do not depend on this assumption.
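To illustrate why this recommendation matters, the following simulation (a hypothetical sketch, not part of the analysis below) draws two skewed samples from the same exponential distribution, so the true variances are equal and an ideal test should not reject. Bartlett's test assumes normality, while Fligner-Killeen does not, so their p-values can behave very differently on skewed data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Two groups drawn from the SAME skewed (exponential) distribution:
# the population variances are equal, so an ideal test should not reject.
group_a = rng.exponential(scale=2.0, size=200)
group_b = rng.exponential(scale=2.0, size=200)

# Bartlett assumes normality; Fligner-Killeen is non-parametric.
bartlett = stats.bartlett(group_a, group_b)
fligner = stats.fligner(group_a, group_b, center='median')
print(f"Bartlett p-value: {bartlett.pvalue:.4f}")
print(f"Fligner p-value:  {fligner.pvalue:.4f}")
```

Repeating this simulation many times would show Bartlett's test rejecting far more often than the nominal 5% rate on skewed data, which is exactly the fragility described above.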
Throughout this document, we show how to perform graphical analyses and how to apply the Levene, Bartlett, and Fligner-Killeen tests using Python.
Libraries¶
The libraries used in this document are:
# Data processing
# ==============================================================================
import pandas as pd
import numpy as np
# Graphics
# ==============================================================================
import matplotlib.pyplot as plt
import seaborn as sns
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth'] = 1.5
plt.rcParams['font.size'] = 8
# Preprocessing and analysis
# ==============================================================================
from scipy import stats
# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
Data¶
The data used in this example have been obtained from the book Statistical Rethinking by Richard McElreath. The dataset contains information collected by Nancy Howell in the late 1960s about the !Kung San people, who live in the Kalahari Desert between Botswana, Namibia, and Angola. The objective is to identify whether there is a difference between the variance of the weight of men and women.
# Data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/' +
'Estadistica-machine-learning-python/master/data/Howell1.csv')
data = pd.read_csv(url)
print(data.info())
data.head(4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 544 entries, 0 to 543
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   height  544 non-null    float64
 1   weight  544 non-null    float64
 2   age     544 non-null    float64
 3   male    544 non-null    int64
dtypes: float64(3), int64(1)
memory usage: 17.1 KB
None
|   | height | weight | age | male |
|---|---|---|---|---|
| 0 | 151.765 | 47.825606 | 63.0 | 1 |
| 1 | 139.700 | 36.485807 | 63.0 | 0 |
| 2 | 136.525 | 31.864838 | 65.0 | 0 |
| 3 | 156.845 | 53.041914 | 41.0 | 1 |
From all available data, only individuals over 15 years of age are selected.
# Data separation by group
# ==============================================================================
data['male'] = data['male'].astype(str)
data = data[(data.age > 15)]
weight_males = data.loc[data.male == '1', 'weight']
weight_females = data.loc[data.male == '0', 'weight']
Graphical methods¶
Two of the most commonly used graphical methods for assessing homoscedasticity are the boxplot and the violin plot. Both aim to make the dispersion of the groups directly comparable.
# Violin plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3.5))
sns.violinplot(
x = 'weight',
y = 'male',
data = data,
hue = 'male',
inner = 'stick',
ax = ax
)
ax.set_title('Weight distribution by sex')
ax.set_xlabel('weight')
ax.set_ylabel('sex (1=male 0=female)');
# Boxplot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3.5))
sns.boxplot(
x = 'weight',
y = 'male',
data = data,
hue = 'male',
ax = ax
)
ax.set_title('Weight distribution by sex')
ax.set_xlabel('weight')
ax.set_ylabel('sex (1=male 0=female)');
Statistical tests¶
Levene's test, Bartlett's test, and the Fligner-Killeen test are three of the most commonly used hypothesis tests for comparing variance across groups. In all of them, the null hypothesis states that the data come from distributions with the same variance (homoscedasticity). Therefore, if the p-value is less than a certain threshold (typically 0.05), it is considered that there is sufficient evidence to reject homoscedasticity in favor of heteroscedasticity.
Levene's test and the Fligner-Killeen test (the latter being non-parametric) are more robust than Bartlett's test against deviations from normality, so their use is usually recommended when normality cannot be assumed. In the case of Levene's test, it is possible to choose the centrality statistic used to calculate deviations within each group (mean, median, or trimmed mean), which influences its robustness against outliers and asymmetric distributions.
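The effect of the centrality statistic can be seen directly by running Levene's test with each `center` option on the same pair of samples. The sketch below (an illustration with synthetic log-normal samples, not the Howell data) compares the three choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Skewed (log-normal) samples, where the choice of center matters most
a = rng.lognormal(mean=0.0, sigma=0.5, size=150)
b = rng.lognormal(mean=0.0, sigma=0.5, size=150)

# 'proportiontocut' is only used when center='trimmed'
for center in ('mean', 'median', 'trimmed'):
    res = stats.levene(a, b, center=center, proportiontocut=0.05)
    print(f"center={center:8s} statistic={res.statistic:.4f} pvalue={res.pvalue:.4f}")
```

With `center='mean'` the test is Levene's original version; with `center='median'` it is the Brown-Forsythe variant, which is the usual default for skewed data.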
Bartlett's test, on the other hand, is based directly on sample variances and assumes normality, making it very sensitive to deviations from this assumption.
If there is high confidence that the samples come from normally distributed populations, it is advisable to use Bartlett's test, as it is more powerful under normality. In the absence of this guarantee, it is recommended to use Levene's test with the median or the non-parametric Fligner-Killeen test, both of which are highly robust to non-normality and to the presence of extreme values.
In applied practice, Levene's test with the median is often considered a balanced option between robustness and power.
All three tests are available in the scipy.stats library (scipy.stats.levene, scipy.stats.bartlett, scipy.stats.fligner) and in the pingouin library (pingouin.homoscedasticity).
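As a compact alternative to running each test separately, the three scipy tests can be applied in one pass and collected into a small summary table. The sketch below is self-contained, so it uses synthetic stand-in samples (in the analysis that follows, the inputs would be `weight_males` and `weight_females`):

```python
import numpy as np
import pandas as pd
from scipy import stats

# Stand-in samples; in the document these would be weight_males / weight_females
rng = np.random.default_rng(34895)
sample_a = rng.normal(loc=48, scale=6, size=180)
sample_b = rng.normal(loc=41, scale=5.5, size=170)

tests = {
    'levene (median)':  stats.levene(sample_a, sample_b, center='median'),
    'bartlett':         stats.bartlett(sample_a, sample_b),
    'fligner (median)': stats.fligner(sample_a, sample_b, center='median'),
}
summary = pd.DataFrame(
    {name: {'statistic': res.statistic, 'pvalue': res.pvalue}
     for name, res in tests.items()}
).T
print(summary)
```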
# Levene test
# ==============================================================================
levene_test = stats.levene(weight_males, weight_females, center='median')
levene_test
LeveneResult(statistic=np.float64(0.18630521976263306), pvalue=np.float64(0.6662611053126026))
# Bartlett test
# ==============================================================================
bartlett_test = stats.bartlett(weight_males, weight_females)
bartlett_test
BartlettResult(statistic=np.float64(0.8473322751459793), pvalue=np.float64(0.3573081212488608))
# Fligner test
# ==============================================================================
fligner_test = stats.fligner(weight_males, weight_females, center='median')
fligner_test
FlignerResult(statistic=np.float64(0.1376531343594324), pvalue=np.float64(0.7106253515287645))
None of the tests provide evidence to reject the hypothesis that the two groups have the same variance (homoscedasticity); all p-values are well above 0.05.
Bootstrapping and permutation test¶
When samples are small or the assumptions of classical tests are not met, resampling methods such as bootstrapping or permutation tests can be used to assess homoscedasticity. These methods do not depend on specific assumptions about the distribution of the data and can provide more robust estimates of variability across groups.
# Bootstrapping variance difference
# ==============================================================================
def diff_var(x, y):
return np.var(x, ddof=1) - np.var(y, ddof=1)
res_boot = stats.bootstrap(
data = (weight_males, weight_females),
statistic = diff_var,
confidence_level = 0.95,
n_resamples = 10_000,
method = "bca",
random_state = 34895
)
print("95% confidence interval for variance difference:")
print(res_boot.confidence_interval)
95% confidence interval for variance difference:
ConfidenceInterval(low=np.float64(-5.394767888622391), high=np.float64(16.212163831783855))
Since the 95% confidence interval for the variance difference includes the value 0, there is insufficient evidence to reject the null hypothesis of homoscedasticity between the two groups.
# Permutation test for variance difference
# ==============================================================================
def var_ratio(x, y):
return np.var(x, ddof=1) / np.var(y, ddof=1)
perm_test = stats.permutation_test(
data = (weight_males, weight_females),
statistic = var_ratio,
permutation_type = 'independent',
alternative = 'two-sided',
n_resamples = 10_000,
random_state = 34895
)
print(f"p-value for variance difference: {perm_test.pvalue}")
p-value for variance difference: 0.32636736326367366
The p-value obtained through the permutation test is consistent with the results of the classical tests, indicating that there is insufficient evidence to reject the null hypothesis of homoscedasticity between the groups.
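To make the mechanics of the permutation test explicit, the following hand-rolled version (an illustrative sketch with synthetic stand-in samples, not scipy's exact algorithm) shuffles the pooled observations, recomputes the variance ratio on each shuffle, and counts how often a permuted statistic is at least as extreme as the observed one:

```python
import numpy as np

rng = np.random.default_rng(34895)
# Stand-in samples; the document uses weight_males / weight_females
x = rng.normal(48, 6, size=180)
y = rng.normal(41, 6, size=170)

def var_ratio(a, b):
    return np.var(a, ddof=1) / np.var(b, ddof=1)

observed = var_ratio(x, y)
pooled = np.concatenate([x, y])
n_x = len(x)

# Two-sided test on a ratio: use |log ratio| so that ratios r and 1/r
# count as equally extreme.
obs_stat = abs(np.log(observed))
n_resamples = 5_000
count = 0
for _ in range(n_resamples):
    perm = rng.permutation(pooled)
    stat = abs(np.log(var_ratio(perm[:n_x], perm[n_x:])))
    if stat >= obs_stat:
        count += 1

# Add-one correction avoids reporting an exact p-value of zero
p_value = (count + 1) / (n_resamples + 1)
print(f"Manual permutation p-value: {p_value:.4f}")
```

Under the null hypothesis the group labels are exchangeable, so shuffling them generates the reference distribution of the statistic without any distributional assumption.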
Session information¶
import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.2.6
pandas              2.3.3
scipy               1.15.3
seaborn             0.13.2
session_info        v1.0.1
-----
IPython             9.8.0
jupyter_client      8.7.0
jupyter_core        5.9.1
-----
Python 3.13.11 | packaged by Anaconda, Inc. | (main, Dec 10 2025, 21:28:48) [GCC 14.3.0]
Linux-6.14.0-37-generic-x86_64-with-glibc2.39
-----
Session information updated at 2026-01-20 22:31
Bibliography¶
OpenIntro Statistics: Fourth Edition by David Diez, Mine Çetinkaya-Rundel, Christopher Barr
Handbook of Biological Statistics by John H. McDonald
Levene, H. (1960). In Contributions to Probability and Statistics: Essays in Honor of Harold Hotelling, I. Olkin et al. eds., Stanford University Press, pp. 278-292.
Conover, W. J., Johnson, M. E. and Johnson M. M. (1981). A comparative study of tests for homogeneity of variances, with applications to the outer continental shelf bidding data. Technometrics, 23(4), 351-361.
Fligner, M.A. and Killeen, T.J. (1976). Distribution-free two-sample tests for scale. Journal of the American Statistical Association, 71(353), 210-213.
https://www.itl.nist.gov/div898/handbook/eda/section3/eda357.htm
Citation instructions¶
How to cite this document?
If you use this document or any part of it, we appreciate your citation. Thank you very much!
Tests equality of variances with Python by Joaquín Amat Rodrigo, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://www.cienciadedatos.net/documentos/pystats07-test-equality-of-variance-python.html
This document created by Joaquín Amat Rodrigo is licensed under Attribution-NonCommercial-ShareAlike 4.0 International.
You are free to:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.

Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
