Normality analysis with Python

Introduction

Normality tests aim to determine whether the available data could come from a population with a normal distribution. There are three main strategies to approach this analysis:

  • Graphical representations

  • Analytical methods

  • Hypothesis tests

One of the most commonly used examples when discussing random variables that follow a normal distribution is human height. This choice is not arbitrary: processes whose result is the sum of many small, independent contributions tend to converge to a normal distribution. A person's height is the result of thousands of small factors whose effects add up, jointly determining growth.
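This convergence can be illustrated with a quick simulation: summing many individually non-normal contributions produces an approximately normal result. The number of factors and simulated people below are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=42)
n_people = 5_000
n_factors = 200

# Each simulated "height" is the sum of 200 small uniform contributions,
# even though a uniform distribution is itself far from normal.
factors = rng.uniform(low=0.0, high=1.0, size=(n_people, n_factors))
heights = factors.sum(axis=1)

# The sums have skewness and excess kurtosis close to 0, as expected
# for an approximately normal distribution.
print(f"Skewness: {stats.skew(heights):.3f}")
print(f"Kurtosis: {stats.kurtosis(heights):.3f}")
```

Increasing `n_factors` brings the sums even closer to normality; this is the central limit theorem at work.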

Throughout this document, we show how to use these strategies to determine whether the weight of a group of people follows a normal distribution.

Libraries

The libraries used in this document are:

# Data processing
# ==============================================================================
import pandas as pd
import numpy as np

# Graphics
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams.update({'font.size': 10})

# Preprocessing and analysis
# ==============================================================================
import statsmodels.api as sm
from scipy import stats

# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

Data

The data used in this example have been obtained from the book Statistical Rethinking by Richard McElreath. The dataset contains information collected by Nancy Howell in the late 1960s about the !Kung San people, who live in the Kalahari Desert between Botswana, Namibia, and Angola.

# Data
# ==============================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/' +
       'Estadistica-machine-learning-python/master/data/Howell1.csv')
data = pd.read_csv(url)
print(data.info())
data.head(4)
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 544 entries, 0 to 543
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype  
---  ------  --------------  -----  
 0   height  544 non-null    float64
 1   weight  544 non-null    float64
 2   age     544 non-null    float64
 3   male    544 non-null    int64  
dtypes: float64(3), int64(1)
memory usage: 17.1 KB
None
    height     weight   age  male
0  151.765  47.825606  63.0     1
1  139.700  36.485807  63.0     0
2  136.525  31.864838  65.0     0
3  156.845  53.041914  41.0     1

From all available data, only women older than 15 years are selected.

data = data[(data.age > 15) & (data.male == 0)]
weight = data['weight']

Graphical methods

One of the most commonly used graphical methods for normality analysis consists of representing the data using a histogram and overlaying the curve of a normal distribution with the same mean and standard deviation as the available data.

# Histogram + theoretical normal curve
# ==============================================================================
# Mean (mu) and standard deviation (sigma) values of the data
mu, sigma = stats.norm.fit(weight)

# Theoretical values of the normal in the observed range
x_hat = np.linspace(min(weight), max(weight), num=100)
y_hat = stats.norm.pdf(x_hat, mu, sigma)

# Plot
fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(x_hat, y_hat, linewidth=2, color='firebrick', label='normal')
ax.hist(x=weight, density=True, bins=30, color="#3182bd", alpha=0.5)
ax.plot(weight, np.full_like(weight, -0.01), '|k', markeredgewidth=1)
ax.set_title('Weight distribution of women older than 15 years')
ax.set_xlabel('weight')
ax.set_ylabel('Probability density')
ax.legend();

Another frequently used representation is the quantile-quantile plot (Q-Q plot). These plots compare the quantiles of the observed distribution with the theoretical quantiles of a normal distribution with the same mean and standard deviation as the data. The closer the data are to a normal distribution, the more aligned the points are around the line.

# Q-Q plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
sm.qqplot(
    data  = weight,
    fit   = True,
    line  = 'q',
    alpha = 0.4,
    ax    = ax
)
ref_line = ax.lines[1]
ref_line.set_color('black')
ref_line.set_linewidth(1.5) 
ax.set_title('Q-Q plot of weight of women older than 15 years');
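For reference, the points of a Q-Q plot can be reproduced by hand: sort the observations (the empirical quantiles) and pair each with the quantile of a fitted normal distribution at the corresponding plotting position. The Hazen positions (i - 0.5)/n used below are one common convention; implementations differ in this choice. Simulated data stand in for the weight series so the sketch is self-contained.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=0)
sample = rng.normal(loc=45, scale=5, size=300)  # stand-in for the weight data

# Empirical quantiles: the sorted observations.
empirical_q = np.sort(sample)

# Plotting positions (Hazen convention; other formulas exist).
n = len(sample)
positions = (np.arange(1, n + 1) - 0.5) / n

# Theoretical quantiles of a normal fitted to the sample.
mu, sigma = stats.norm.fit(sample)
theoretical_q = stats.norm.ppf(positions, loc=mu, scale=sigma)

# For normal data the paired quantiles hug the identity line, which shows
# up as a correlation very close to 1.
corr = np.corrcoef(theoretical_q, empirical_q)[0, 1]
print(f"Correlation between quantiles: {corr:.4f}")
```

Plotting `empirical_q` against `theoretical_q` reproduces, up to the plotting-position convention, what `sm.qqplot` draws.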

Analytical methods: skewness and kurtosis

The statistics of skewness and kurtosis can be used to detect deviations from normality. The following are some rules of thumb:

Range of skewness/kurtosis    Interpretation      Action
-0.5 to +0.5                  Symmetric           Distribution is approximately normal; safe to use parametric tests
-1 to -0.5 or +0.5 to +1      Moderately skewed   Slight deviation; usually acceptable for most analyses
-2 to -1 or +1 to +2          Highly skewed       Evident deviation; acceptable for some robust tests, but proceed with caution
< -2 or > +2                  Extreme             Substantial non-normality; consider transforming data or using non-parametric tests

⚠️ Warning

These rules of thumb apply specifically to Excess Kurtosis, where a perfectly normal distribution has a value of 0. This is the default metric used in most major statistical software, including SPSS, Excel, and Python (SciPy). However, be aware that some software (such as Stata) reports "Raw Kurtosis" where a normal distribution has a baseline value of 3. If your output reports Raw Kurtosis, you must subtract 3 from the reported value. Skewness generally defaults to 0 for a normal distribution across almost all major software packages, so no conversion is usually necessary.
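In SciPy, the `fisher` parameter of `stats.kurtosis` selects between the two conventions: `fisher=True` (the default) returns excess kurtosis, while `fisher=False` returns raw kurtosis. A quick check on simulated normal data makes the difference explicit:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=1)
sample = rng.normal(size=100_000)

excess = stats.kurtosis(sample, fisher=True)   # default: normal baseline is 0
raw = stats.kurtosis(sample, fisher=False)     # normal baseline is 3

# The two conventions differ by exactly 3.
print(f"Excess kurtosis: {excess:.3f}")
print(f"Raw kurtosis:    {raw:.3f}")
```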

print('Kurtosis:', stats.kurtosis(weight))
print('Skewness:', stats.skew(weight))
Kurtosis: 0.05524614843093856
Skewness: 0.032122514283202334
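The rules of thumb from the table above can be encoded as a small helper function to interpret these values; `interpret_statistic` is an illustrative name, not a library API.

```python
def interpret_statistic(value: float) -> str:
    """Map a skewness or excess-kurtosis value to the rules of thumb above."""
    magnitude = abs(value)
    if magnitude <= 0.5:
        return "approximately symmetric/normal"
    if magnitude <= 1.0:
        return "moderately skewed"
    if magnitude <= 2.0:
        return "highly skewed"
    return "extreme deviation"

# Both statistics computed above fall well inside the first range.
print(interpret_statistic(0.055))  # -> approximately symmetric/normal
print(interpret_statistic(0.032))  # -> approximately symmetric/normal
```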

Hypothesis tests

The Shapiro-Wilk test and D'Agostino's K-squared test are two of the most commonly used hypothesis tests to analyze normality. In both, the null hypothesis is that the data come from a normal distribution.

The p-value of these tests indicates the probability of obtaining data at least as incompatible with normality as those observed, assuming the data truly come from a normally distributed population. Therefore, if the p-value is below a chosen significance level (typically 0.05), there is sufficient evidence to reject normality.

The original formulation of the Shapiro-Wilk test was limited to samples of up to 50 observations; modern implementations extend it to larger samples, but with large samples the test becomes so sensitive that it flags small, practically irrelevant deviations from normality (SciPy additionally warns that its p-values may be inaccurate for samples larger than 5000).

# Shapiro-Wilk test
# ==============================================================================
shapiro_test = stats.shapiro(weight)
shapiro_test
ShapiroResult(statistic=np.float64(0.9963739538348422), pvalue=np.float64(0.924083667304126))
# D'Agostino's K-squared test
# ==============================================================================
k2, p_value = stats.normaltest(weight)
print(f"Statistic = {k2}, p-value = {p_value}")
Statistic = 0.19896549779904893, p-value = 0.9053055672511008

Neither test shows evidence to reject the hypothesis that the data are normally distributed (p-values well above the usual 0.05 threshold).

When these tests are used to verify the conditions of parametric methods, for example a t-test or an ANOVA, it is important to keep in mind that, as with any hypothesis test, the larger the sample size, the greater the statistical power and the easier it becomes to find evidence against the null hypothesis of normality, even for trivially small deviations. At the same time, the larger the sample size, the less sensitive parametric methods are to departures from normality. For this reason, conclusions should not rest solely on the p-value of the test; the graphical representation and the sample size should also be considered.

Consequences of lack of normality

The inability to assume normality primarily affects parametric hypothesis tests (t-test, ANOVA,...) and regression models. The main consequences of lack of normality are:

  • Least squares estimators are no longer efficient (they do not have minimum variance).

  • Confidence intervals of model parameters and significance tests are only approximate and not exact.

Strictly speaking, the statistical tests presented require that the population from which the sample comes is normally distributed, not the sample itself. If the sample appears normally distributed, it is reasonable to accept that the population of origin is as well. Conversely, if the sample is not normally distributed but there is certainty that the population of origin is, it may still be justified to accept the results of parametric tests as valid.

Session information

import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.2.6
pandas              2.3.3
scipy               1.15.3
session_info        v1.0.1
statsmodels         0.14.6
-----
IPython             9.8.0
jupyter_client      8.7.0
jupyter_core        5.9.1
-----
Python 3.13.11 | packaged by Anaconda, Inc. | (main, Dec 10 2025, 21:28:48) [GCC 14.3.0]
Linux-6.14.0-37-generic-x86_64-with-glibc2.39
-----
Session information updated at 2026-01-14 13:07

Bibliography

OpenIntro Statistics: Fourth Edition by David Diez, Mine Çetinkaya-Rundel, Christopher Barr

Shapiro, S. S., & Wilk, M. B. (1965). An analysis of variance test for normality (complete samples). Biometrika, 52(3/4), 591-611

D'Agostino, R. B. (1971), "An omnibus test of normality for moderate and large sample size", Biometrika, 58, 341-348

D'Agostino, R. and Pearson, E. S. (1973), "Tests for departure from normality", Biometrika, 60, 613-622

https://www.itl.nist.gov/div898/handbook/prc/section2/prc213.htm

Citation instructions

How to cite this document?

If you use this document or any part of it, we appreciate you citing it. Thank you very much!

Normality analysis with Python by Joaquín Amat Rodrigo, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://cienciadedatos.net/documentos/pystats06-normality-tests-python.html


Creative Commons Licence

This document created by Joaquín Amat Rodrigo is licensed under Attribution-NonCommercial-ShareAlike 4.0 International.

Allowed:

  • Share: copy and redistribute the material in any medium or format.

  • Adapt: remix, transform, and build upon the material.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • NonCommercial: You may not use the material for commercial purposes.

  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.