If you like ** Skforecast **, please give us a star on
__GitHub__! ⭐️

- Time series forecasting with machine learning
- ARIMA and SARIMAX models with python
- Forecasting time series with gradient boosting: XGBoost, LightGBM and CatBoost
- Forecasting energy demand with machine learning
- Global Forecasting Models: Multi-series forecasting
- Global Forecasting Models: Comparative Analysis of Single and Multi-Series Forecasting Modeling
- Modelling time series trend with tree based models
- Intermittent demand forecasting with skforecast
- Stacking ensemble of machine learning models to improve forecasting

When trying to anticipate future values, most forecasting models try to predict what will be the most likely value. This is called point-forecasting. Although knowing in advance the expected value of a time series is useful in almost every business case, this kind of prediction does not provide any information about the confidence of the model nor the prediction uncertainty.

Probabilistic forecasting, as opposed to point-forecasting, is a family of techniques that allow for predicting the expected distribution of the outcome instead of a single future value. This type of forecasting provides much rich information since it allows for creating prediction intervals, the range of likely values where the true value may fall. More formally, a prediction interval defines the interval within which the true value of the response variable is expected to be found with a given probability.

There are multiple ways to estimate prediction intervals, most of which require that the residuals (errors) of the model follow a normal distribution. When this property cannot be assumed, two alternatives commonly used are bootstrapping and quantile regression. In order to illustrate how skforecast allows estimating prediction intervals for multi-step forecasting, the following examples are shown:

Prediction intervals based on bootstrapped residuals and recursive-multi-step forecaster.

Prediction intervals based on quantile regression and direct-multi-step forecaster.

** ⚠ Warning**

- The random error term
- The parameter estimates
- The choice of model for the historical data
- The continuation of the historical data generating process into the future

**✎ Note**

**✎ Note**

The error of one-step-ahead forecast is defined as $e_t = y_t - \hat{y}_{t|t-1}$. Assuming future errors will be like past errors, it is possible to simulate different predictions by sampling from the collection of errors previously seen in the past (i.e., the residuals) and adding them to the predictions.

Doing this repeatedly, a collection of slightly different predictions is created (possible future paths), that represent the expected variance in the forecasting process.

Finally, prediction intervals can be computed by calculating the $α/2$ and $1 − α/2$ percentiles of the simulated data at each forecasting horizon.

The main advantage of this strategy is that it only requires a single model to estimate any interval. The drawback is that, running hundreds or thousands of bootstrapping iterations, is computationally very expensive and not always workable.

In [1]:

```
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
pio.renderers.default = 'notebook'
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')
from skforecast.plot import plot_residuals
from skforecast.plot import plot_prediction_distribution
from pprint import pprint
# Modelling and Forecasting
# ==============================================================================
import skforecast
import sklearn
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregDirect import ForecasterAutoregDirect
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from sklearn.metrics import mean_pinball_loss
from scipy.stats import norm
# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
print(f"Version skforecast: {skforecast.__version__}")
print(f"Version sklearn: {sklearn.__version__}")
print(f"Version lightgbm: {lightgbm.__version__}")
```

In [2]:

```
# Data download
# ==============================================================================
data = fetch_dataset(name='bike_sharing_extended_features')
data.head(2)
```

Out[2]:

In [3]:

```
# One hot encoding of categorical variables
# ==============================================================================
encoder = ColumnTransformer(
[('one_hot_encoder', OneHotEncoder(sparse_output=False), ['weather'])],
remainder='passthrough',
verbose_feature_names_out=False
).set_output(transform="pandas")
data = encoder.fit_transform(data)
```

In [4]:

```
# Select exogenous variables to be included in the model
# ==============================================================================
exog_features = [
'weather_clear', 'weather_mist', 'weather_rain', 'month_sin', 'month_cos',
'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos',
'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos',
'sunset_hour_sin', 'sunset_hour_cos', 'temp', 'holiday'
]
data = data[['users'] + exog_features]
```

To facilitate the training of the models, the search for optimal hyperparameters and the evaluation of their predictive accuracy, the data are divided into three separate sets: training, validation and test.

In [5]:

```
# Split train-validation-test
# ==============================================================================
data = data.loc['2011-05-30 23:59:00':, :]
end_train = '2012-08-30 23:59:00'
end_validation = '2012-11-15 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
```

Graphical exploration of time series can be an effective way of identifying trends, patterns, and seasonal variations. This, in turn, helps to guide the selection of the most appropriate forecasting model.

In [6]:

```
# Interactive plot of time series
# ==============================================================================
fig = go.Figure()
fig.add_trace(go.Scatter(x=data_train.index, y=data_train['users'], mode='lines', name='Train'))
fig.add_trace(go.Scatter(x=data_val.index, y=data_val['users'], mode='lines', name='Validation'))
fig.add_trace(go.Scatter(x=data_test.index, y=data_test['users'], mode='lines', name='Test'))
fig.update_layout(
title = 'Number of users',
xaxis_title="Time",
yaxis_title="Users",
legend_title="Partition:",
width=900,
height=400,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1,
xanchor="left",
x=0.001
)
)
fig.show()
```

Auto-correlation plots are a useful tool for identifying the order of an autoregressive model. The autocorrelation function (ACF) is a measure of the correlation between the time series and a lagged version of itself. The partial autocorrelation function (PACF) is a measure of the correlation between the time series and a lagged version of itself, controlling for the values of the time series at all shorter lags. These plots are useful for identifying the lags to be included in the autoregressive model.

In [7]:

```
# Autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 2))
plot_acf(data.users, ax=ax, lags=24*3)
plt.show()
```

In [8]:

```
# Partial autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 2))
plot_pacf(data.users, ax=ax, lags=24*3)
plt.show()
```

The autocorrelation plot demonstrates a strong correlation between the number of users in one hour and prior hours, as well as between users in one hour and the corresponding hour in preceding days. This observed correlation suggests that autoregressive models may be effective in this scenario.

A recursive-multi-step forecaster is trained and its hyperparameters optimized. Then, prediction intervals based on bootstrapped residuals are estimated.

In [9]:

```
# Create forecaster and hyperparameters search
# ==============================================================================
# Forecaster
forecaster = ForecasterAutoreg(
regressor = LGBMRegressor(random_state=15926, verbose=-1),
lags = 7 # Place holder that will be replaced during the search
)
# Lags used as predictors
lags_grid = [24, 48, (1, 2, 3, 23, 24, 25, 47, 48, 49, 71, 72, 73, 364*24, 365*24)]
# Regressor hyperparameters search space
def search_space(trial):
search_space = {
'lags' : trial.suggest_categorical('lags', lags_grid),
'n_estimators' : trial.suggest_int('n_estimators', 200, 800, step=100),
'max_depth' : trial.suggest_int('max_depth', 3, 8, step=1),
'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 25, 500),
'learning_rate' : trial.suggest_float('learning_rate', 0.01, 0.5),
'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 0.8, step=0.1),
'max_bin' : trial.suggest_int('max_bin', 50, 100, step=25),
'reg_alpha' : trial.suggest_float('reg_alpha', 0, 1, step=0.1),
'reg_lambda' : trial.suggest_float('reg_lambda', 0, 1, step=0.1)
}
return search_space
results_search, frozen_trial = bayesian_search_forecaster(
forecaster = forecaster,
y = data.loc[:end_validation, 'users'],
exog = data.loc[:end_validation, exog_features],
steps = 24,
metric = 'mean_absolute_error',
search_space = search_space,
initial_train_size = len(data[:end_train]),
refit = False,
n_trials = 20, # Increase for more exhaustive search
random_state = 123,
return_best = True,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
best_params = results_search['params'].iloc[0]
```

Once the best hyperparameters have been selected, the `backtesting_forecaster()`

function is used to generate the prediction intervals for the entire test set.

The

`interval`

argument indicates the desired coverage probability of the prediction intervals. In this case,`interval`

is set to`[10, 90]`

, which means that the prediction intervals are calculated for the 10th and 90th percentiles, resulting in a theoretical coverage probability of 80%.The

`n_boot`

argument is used to specify the number of bootstrap samples to be used in estimating the prediction intervals. The larger the number of samples, the more accurate the prediction intervals will be, but the longer the calculation will take.

By default, intervals are calculated using in-sample residuals (residuals from the training set). However, this can result in intervals that are too narrow (overly optimistic).

In [10]:

```
# Backtesting with prediction intervals in test data using in-sample residuals
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'], # Full dataset
exog = data[exog_features],
steps = 24,
metric = 'mean_absolute_error',
initial_train_size = len(data.loc[:end_validation]),
refit = False,
interval = [10, 90], # 80% prediction interval
n_boot = 250,
in_sample_residuals = True, # Use in-sample residuals
binned_residuals = False,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
predictions.head(5)
```

Out[10]:

In [11]:

```
# Function to plot predicted intervals
# ======================================================================================
def plot_predicted_intervals(
predictions: pd.DataFrame,
y_true: pd.DataFrame,
target_variable: str,
initial_x_zoom: list=None,
title: str=None,
xaxis_title: str=None,
yaxis_title: str=None,
):
"""
Plot predicted intervals vs real values
Parameters
----------
predictions : pandas DataFrame
Predicted values and intervals.
y_true : pandas DataFrame
Real values of target variable.
target_variable : str
Name of target variable.
initial_x_zoom : list, default `None`
Initial zoom of x-axis, by default None.
title : str, default `None`
Title of the plot, by default None.
xaxis_title : str, default `None`
Title of x-axis, by default None.
yaxis_title : str, default `None`
Title of y-axis, by default None.
"""
fig = go.Figure([
go.Scatter(name='Prediction', x=predictions.index, y=predictions['pred'], mode='lines'),
go.Scatter(name='Real value', x=y_true.index, y=y_true[target_variable], mode='lines'),
go.Scatter(
name='Upper Bound', x=predictions.index, y=predictions['upper_bound'],
mode='lines', marker=dict(color="#444"), line=dict(width=0), showlegend=False
),
go.Scatter(
name='Lower Bound', x=predictions.index, y=predictions['lower_bound'],
marker=dict(color="#444"), line=dict(width=0), mode='lines',
fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty', showlegend=False
)
])
fig.update_layout(
title=title,
xaxis_title=xaxis_title,
yaxis_title=yaxis_title,
width=900,
height=400,
margin=dict(l=20, r=20, t=35, b=20),
hovermode="x",
xaxis=dict(range=initial_x_zoom),
legend=dict(
orientation="h",
yanchor="top",
y=1.1,
xanchor="left",
x=0.001
)
)
fig.show()
def empirical_coverage(y, lower_bound, upper_bound):
"""
Calculate coverage of a given interval
"""
return np.mean(np.logical_and(y >= lower_bound, y <= upper_bound))
```

In [12]:

```
# Plot intervals
# ==============================================================================
plot_predicted_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "users",
initial_x_zoom = ['2012-12-01', '2012-12-20'],
title = "Real value vs predicted in test data",
xaxis_title = "Date time",
yaxis_title = "users",
)
# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
y = data.loc[end_validation:, 'users'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100*coverage, 2)} %")
# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
```

The prediction intervals exhibit overconfidence as they tend to be excessively narrow, resulting in a true coverage that falls below the nominal coverage. This phenomenon arises from the tendency of in-sample residuals to often overestimate the predictive capacity of the model.

The `set_out_sample_residuals()`

method is used to specify out-sample residuals computed with a validation set through backtesting. Once the new residuals have been added to the forecaster, set `in_sample_residuals`

to `False`

when using the `predict_interval`

method.

In [13]:

```
# Backtesting on validation data to obtain out-sample residuals
# ==============================================================================
_, predictions_val = backtesting_forecaster(
forecaster = forecaster,
y = data.loc[:end_validation, 'users'],
exog = data.loc[:end_validation, exog_features],
steps = 24,
metric = 'mean_absolute_error',
initial_train_size = len(data.loc[:end_train]),
refit = False,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
residuals = data.loc[predictions_val.index, 'users'] - predictions_val['pred']
residuals = residuals.dropna()
```

In [14]:

```
# Out-sample residuals distribution
# ==============================================================================
print(pd.Series(np.where(residuals < 0, 'negative', 'positive')).value_counts())
plt.rcParams.update({'font.size': 8})
_ = plot_residuals(residuals=residuals, figsize=(7, 4))
```

In [15]:

```
# Store out-sample residuals in the forecaster
# ==============================================================================
forecaster.set_out_sample_residuals(residuals=residuals)
```

In [16]:

```
# Backtesting with prediction intervals in test data using out-sample residuals
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
exog = data[exog_features],
steps = 24,
metric = 'mean_absolute_error',
initial_train_size = len(data.loc[:end_validation]),
refit = False,
interval = [10, 90], # 80% prediction interval
n_boot = 250,
in_sample_residuals = False, # Use out-sample residuals
binned_residuals = False,
n_jobs = 'auto',
verbose = False,
show_progress = True
)
predictions.head(5)
```

Out[16]:

In [17]:

```
# Plot intervals
# ==============================================================================
plot_predicted_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "users",
initial_x_zoom = ['2012-12-01', '2012-12-20'],
title = "Real value vs predicted in test data",
xaxis_title = "Date time",
yaxis_title = "users",
)
# Predicted interval coverage (on test data)
# ==============================================================================
coverage = empirical_coverage(
y = data.loc[end_validation:, 'users'],
lower_bound = predictions["lower_bound"],
upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100*coverage, 2)} %")
# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")
```