If you like Skforecast , help us giving a star on GitHub! ⭐️
More about forecasting
When trying to anticipate future values, most forecasting models aim to predict the most likely future value, a technique known as point forecasting. While useful, it does not provide information about the model's confidence or the uncertainty associated with the prediction.
Probabilistic forecasting, as opposed to point-forecasting, is a family of techniques that allow for predicting the expected distribution of the outcome rather than a single value. This approach offers richer insights by enabling the creation of prediction intervals —ranges within which the true value is likely to fall. More formally, a prediction interval defines the interval within which the true value of the response variable is expected to be found with a given probability.
Estimating prediction intervals in forecasting is challenging, since many well-established methods for regression and one-step-ahead forecasts are not directly applicable when predicting multiple steps ahead. Additionally, there is a trade-off between two key metrics: coverage and interval width. Ideally, we want to achieve a certain level of coverage (e.g. 80%) while keeping the prediction intervals as narrow as possible and the model's prediction error as low as possible.
This document is inspired by Carl McBride Ellis (2024). Probabilistic Forecasting I: Temperature. Kaggle competition. In the competition, predictions are made one-step-ahead, meaning each prediction is generated for the next time step based on known previous values. In this adaptation, however, a multi-step-ahead approach is used, where multiple future time steps are predicted simultaneously. This method is more challenging and requires a more sophisticated model but is often necessary for real-world applications.
The task is to estimate the prediction intervals for the target variable, labeled OT, over the next 28 days. Each day, the next 24 hours are predicted, meaning 24 values are forecasted each day. This process is repeated daily for 28 days, resulting in a total of 672 predictions (28 × 24) by the end of the period. Symmetric prediction intervals with coverage levels of 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 95% are estimated, and their coverage is evaluated.
💡 Tip
This is the second in a series of documents on probabilistic forecasting.The libraries used in this notebook are:
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.plot import set_dark_theme
from skforecast.plot import plot_prediction_intervals
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.stattools import pacf
from skforecast.plot import plot_residuals
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
pio.renderers.default = 'notebook'
poff.init_notebook_mode(connected=True)
# Modelling and Forecasting
# ==============================================================================
import skforecast
import sklearn
from sklearn.linear_model import Ridge
from sklearn.feature_selection import RFECV
from sklearn.pipeline import make_pipeline
from feature_engine.datetime import DatetimeFeatures
from feature_engine.creation import CyclicalFeatures
from feature_engine.timeseries.forecasting import LagFeatures
from feature_engine.timeseries.forecasting import WindowFeatures
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from skforecast.feature_selection import select_features
from skforecast.preprocessing import RollingFeatures
# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
print('Versión skforecast:', skforecast.__version__)
print('Versión sklearn:', sklearn.__version__)
Data from an electricity transformer station was collected between July 2016 and July 2018 (2 years × 365 days × 24 hours × 4 intervals per hour = 70,080 data points). Each data point consists of 8 features, including the date of the point, the predictive value "Oil Temperature (OT)", and 6 different types of external power load features: High UseFul Load (HUFL), High UseLess Load (HULL), Middle UseFul Load (MUFL), Middle UseLess Load (MULL), Low UseFul Load (LUFL), Low UseLess Load(LULL).
Zhou, Haoyi & Zhang, Shanghang & Peng, Jieqi & Zhang, Shuai & Li, Jianxin & Xiong, Hui & Zhang, Wancai. (2020). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. 10.48550/arXiv.2012.07436. https://github.com/zhouhaoyi/ETDataset
# Load data
# ==============================================================================
data = pd.read_csv("https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/ETTm2.csv")
data['date'] = pd.to_datetime(data['date'])
data = data.set_index('date')
data = data.asfreq('15min')
data = data.resample(rule="1h", closed="left", label="right").mean()
data
# Plot target time series
# ==============================================================================
set_dark_theme()
plt.rcParams['lines.linewidth'] = 0.5
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(data['OT'], label='OT');
# Plot Exogenous features
# ==============================================================================
cols_to_plot = [
'HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL'
]
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
colors = colors[1:] + ["violet", "magenta"]
fig, axs = plt.subplots(6, 1, figsize=(8, 8), sharex=True)
for i, col in enumerate(cols_to_plot):
axs[i].plot(data[col], label=col, color=colors[i])
axs[i].legend(loc='lower left', fontsize=8)
axs[i].tick_params(axis='both', labelsize=8)
# Partial autocorrelation values and plots
# ==============================================================================
n_lags = 24 * 7
cols_to_plot = [
'OT', 'HUFL', 'HULL',
'MUFL', 'MULL',
'LUFL', 'LULL'
]
pacf_df = []
fig, axs = plt.subplots(4, 2, figsize=(8, 6))
axs = axs.ravel()
for i, col in enumerate(cols_to_plot):
pacf_values = pacf(data[col], nlags=n_lags)
pacf_values = pd.DataFrame({
'lag': range(1, len(pacf_values)),
'value': pacf_values[1:],
'variable': col
})
pacf_df.append(pacf_values)
plot_pacf(data[col], lags=n_lags, ax=axs[i])
axs[i].set_title(col, fontsize=10)
axs[i].set_ylim(-0.5, 1.1)
plt.tight_layout()
# Top n lags with highest partial autocorrelation per variable
# ==============================================================================
n = 10
top_lags = set()
for pacf_values in pacf_df:
variable = pacf_values['variable'].iloc[0]
lags = pacf_values.nlargest(n, 'value')['lag'].tolist()
top_lags.update(lags)
print(f"{variable}: {lags}")
top_lags = list(top_lags)
print(f"\nAll lags: {top_lags}")
The target series and the external series exhibit similar dynamics with several lagged correlations that can be used as predictors.
Additional exogenous features are created based on:
Lagged values of the exogenous series.
Rolling statistics of the exogenous series.
Calendar features.
⚠ Warning
Throughout this document, it is assumed that the aditional series are known into the future. This means its values are known at the time of prediction allowing to use them exogenous variables in the forecasting model.
# Calendar features
# ==============================================================================
features_to_extract = [
'year',
'month',
'week',
'day_of_week',
'hour'
]
calendar_transformer = DatetimeFeatures(
variables = 'index',
features_to_extract = features_to_extract,
drop_original = False,
)
# Lags of exogenous variables
# ==============================================================================
lag_transformer = LagFeatures(
variables = ["HUFL", "MUFL", "MULL", "HULL", "LUFL", "LULL"],
periods = top_lags,
)
# Rolling features for exogenous variables
# ==============================================================================
wf_transformer = WindowFeatures(
variables = ["HUFL", "MUFL", "MULL", "HULL", "LUFL", "LULL"],
window = ["1D", "7D"],
functions = ["mean", "max", "min"],
freq = "1h",
)
# Cliclical encoding of calendar features
# ==============================================================================
features_to_encode = [
"month",
"week",
"day_of_week",
"hour",
]
max_values = {
"month": 12,
"week": 52,
"day_of_week": 7,
"hour": 24,
}
cyclical_encoder = CyclicalFeatures(
variables = features_to_encode,
max_values = max_values,
drop_original = True
)
exog_transformer = make_pipeline(
calendar_transformer,
lag_transformer,
wf_transformer,
cyclical_encoder
)
display(exog_transformer)
data = exog_transformer.fit_transform(data)
# Remove rows with NaNs created by lag features
data = data.dropna()
exog_features = data.columns.difference(['OT']).tolist()
display(data.head(3))
# Split train-validation-test
# ==============================================================================
end_train = '2017-10-01 23:59:00'
end_validation = '2018-04-03 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
# Plot partitions
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(data_train['OT'], label='Train')
ax.plot(data_val['OT'], label='Validation')
ax.plot(data_test['OT'], label='Test')
ax.set_title('OT')
ax.legend();
# Plot partitions after differencing
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(data_train['OT'].diff(1), label='Train')
ax.plot(data_val['OT'].diff(1), label='Validation')
ax.plot(data_test['OT'].diff(1), label='Test')
ax.set_title('Differenced OT')
ax.legend();
# Create forecaster
# ==============================================================================
window_features = RollingFeatures(stats=['mean', 'min', 'max'], window_sizes=24)
forecaster = ForecasterRecursive(
regressor = Ridge(random_state=15926),
lags = top_lags,
differentiation = 1,
window_features = window_features,
binner_kwargs = {'n_bins': 10}
)
forecaster
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
regressor = Ridge(random_state=15926)
selector = RFECV(
estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)
selected_lags, selected_window_features, selected_exog = select_features(
forecaster = forecaster,
selector = selector,
y = data.loc[:end_train, 'OT'],
exog = data.loc[:end_train, exog_features],
select_only = None,
force_inclusion = None,
subsample = 0.5,
random_state = 123,
verbose = True,
)
# Set the selected lags
# ==============================================================================
forecaster.set_lags(selected_lags)
# Hyperparameters search
# ==============================================================================
cv = TimeSeriesFold(
initial_train_size = len(data.loc[:end_train, :]),
steps = 24, # all hours of next the day
differentiation = 1,
)
def search_space(trial):
search_space = {
'alpha': trial.suggest_float('alpha', 1e-7, 1e+3, log=True)
}
return search_space
results_search, frozen_trial = bayesian_search_forecaster(
forecaster = forecaster,
y = data.loc[:end_validation, 'OT'],
exog = data.loc[:end_validation, selected_exog],
search_space = search_space,
cv = cv,
metric = 'mean_absolute_error',
n_trials = 20,
random_state = 123,
return_best = True,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
best_params = results_search['params'].iloc[0]
best_lags = results_search['lags'].iloc[0]
results_search.head()
After selecting the optimal model, including its features and hyperparameters, a backtesting process is performed using the validation set. This process generates out-of-sample residuals, which are then used to estimate prediction intervals for the test set.
# Backtesting on validation data to obtain out-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
initial_train_size = len(data.loc[:end_train, :]),
steps = 24, # all hours of next day
differentiation = 1,
)
metric_val, predictions_val = backtesting_forecaster(
forecaster = forecaster,
y = data.loc[:end_validation, 'OT'],
exog = data.loc[:end_validation, selected_exog],
cv = cv,
metric = 'mean_absolute_error',
n_jobs = 'auto',
verbose = False,
show_progress = True
)
display(metric_val)
fig, ax = plt.subplots(figsize=(8, 3))
data.loc[end_train:end_validation, 'OT'].plot(ax=ax, label='Real value')
predictions_val['pred'].plot(ax=ax, label='prediction')
ax.set_title("Backtesting on validation data")
ax.legend();
# Out-sample residuals distribution
# ==============================================================================
residuals = data.loc[predictions_val.index, 'OT'] - predictions_val['pred']
print(pd.Series(np.where(residuals < 0, 'negative', 'positive')).value_counts())
plt.rcParams.update({'font.size': 8})
_ = plot_residuals(residuals=residuals, figsize=(7, 4))
# Store out-sample residuals in the forecaster
# ==============================================================================
forecaster.fit(y=data.loc[:end_train, 'OT'], exog=data.loc[:end_train, selected_exog])
forecaster.set_out_sample_residuals(
y_true = data.loc[predictions_val.index, 'OT'],
y_pred = predictions_val['pred']
)
✎ Note
When storing new residuals in the forecaster, the residuals are binned according to the magnitude of the predictions they correspond to. Later, in the bootstrapping process, the residuals are sampled from the corresponding bin, ensuring that the distribution of the residuals is consistent with the predictions. This process allows for more accurate prediction intervals, as the residuals are more closely aligned with the predictions they correspond to resulting in better coverage with narrower intervals. For more information about how residuals are used in interval estimation visit Probabilistic forecasting: prediction intervals and prediction distribution.# Intervals of the residual bins (conditioned on predicted values) and number of residuals stored
# ==============================================================================
for k, v in forecaster.binner_intervals_.items():
print(f"{k}: {v} n={len(forecaster.out_sample_residuals_by_bin_[k])}")
# Distribution of the residual by bin
# ==============================================================================
out_sample_residuals_by_bin_df = pd.DataFrame(
dict(
[(k, pd.Series(v))
for k, v in forecaster.out_sample_residuals_by_bin_.items()]
)
)
fig, ax = plt.subplots(figsize=(6, 3))
out_sample_residuals_by_bin_df.boxplot(ax=ax)
ax.set_title("Distribution of residuals by bin", fontsize=12)
ax.set_xlabel("Bin", fontsize=10)
ax.set_ylabel("Residuals", fontsize=10)
plt.show();
# Backtesting with prediction intervals in test data using out-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
initial_train_size = len(data.loc[:end_validation, :]),
steps = 24, # all hours of next day
differentiation = 1
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['OT'],
exog = data[selected_exog],
cv = cv,
metric = 'mean_absolute_error',
interval = [10, 90], # 80% prediction interval
n_boot = 150,
use_in_sample_residuals = False, # Use out-sample residuals
use_binned_residuals = True,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
predictions
# Plot intervals
# ==============================================================================
plt.rcParams['lines.linewidth'] = 1
fig, ax = plt.subplots(figsize=(9, 4))
plot_prediction_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "OT",
initial_x_zoom = None,
title = "Real value vs prediction in test data",
xaxis_title = "Date time",
yaxis_title = "OT",
ax = ax
)
fill_between_obj = ax.collections[0]
fill_between_obj.set_facecolor('white')
fill_between_obj.set_alpha(0.3)
# Predicted interval coverage (on test data)
# ==============================================================================
def empirical_coverage(y, lower_bound, upper_bound):
"""
Calculate coverage of a given interval
"""
return np.mean(np.logical_and(y >= lower_bound, y <= upper_bound))
coverage = empirical_coverage(
y = data.loc[end_validation:, 'OT'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100 * coverage, 2)} %")
# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
display(metric)
# Interactive plot
# ==============================================================================
fig = go.Figure([
go.Scatter(name='Prediction', x=predictions.index, y=predictions['pred'], mode='lines'),
go.Scatter(name='Real value', x=data_test.index, y=data_test['OT'], mode='lines'),
go.Scatter(
name='Upper Bound', x=predictions.index, y=predictions['upper_bound'],
mode='lines', marker=dict(color="#444"), line=dict(width=0), showlegend=False
),
go.Scatter(
name='Lower Bound', x=predictions.index, y=predictions['lower_bound'],
marker=dict(color="#444"), line=dict(width=0), mode='lines',
fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty', showlegend=False
)
])
fig.update_layout(
title="Real value vs prediction in test data",
xaxis_title= "Date time",
yaxis_title= "OT",
width=800,
height=400,
margin=dict(l=10, r=10, t=35, b=10),
hovermode="x",
legend=dict(orientation="h", yanchor="top", y=1.1, xanchor="left", x=0.001)
)
fig.show()
Prediction intervals are estimated for various nominal coverage levels—10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, and 95%—and their actual coverage is assessed.
# Prediction intervals for different nominal coverages
# ==============================================================================
nominal_coverages = [[2.5, 97.5], [5, 95], [10, 90], [15, 85], [20, 80], [30, 70], [35, 65], [40, 60], [45, 55]]
observed_coverages = []
observed_areas = []
for coverage in nominal_coverages:
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['OT'],
exog = data[selected_exog],
cv = cv,
metric = 'mean_absolute_error',
interval = coverage, # Nominal coverage
n_boot = 150,
use_in_sample_residuals = False, # Use out-sample residuals
use_binned_residuals = True,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
observed_coverage = empirical_coverage(
y = data.loc[end_validation:, 'OT'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
observed_area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
observed_coverages.append(100 * observed_coverage)
observed_areas.append(observed_area)
results = pd.DataFrame({
'Interval': nominal_coverages,
'Nominal coverage': [coverage[1] - coverage[0] for coverage in nominal_coverages],
'Observed coverage': observed_coverages,
'Area': observed_areas
})
results.round(1)
The empirical coverage of all intervals is very close to the expected coverage (nominal coverage). This indicates that the intervals are well-calibrated and that the model is capable of estimating the uncertainty associated with the predictions.
import session_info
session_info.show(html=False)
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
Probabilistic forecasting: prediction intervals for multi-step time series forecasting by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://cienciadedatos.net/documentos/py60-probabilistic-forecasting-prediction-intervals-multi-step-forecasting.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.14.0). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.14.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.14.0}, month = {11}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }
Did you like the article? Your support is important
Website maintenance has high cost, your contribution will help me to continue generating free educational content. Many thanks! 😊
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Attribution-NonCommercial-ShareAlike 4.0 International.
Allowed:
Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.
Under the following terms:
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.