If you like Skforecast, help us by giving it a star on GitHub! ⭐️

Probabilistic forecasting: prediction intervals for multi-step forecasting

Joaquín Amat Rodrigo, Javier Escobar Ortiz
November 2024

Introduction

Most forecasting models try to predict the most likely value of a time series, a technique known as point forecasting. Although useful, it provides no information about the uncertainty associated with the predictions.

Probabilistic forecasting, unlike point forecasting, is a family of techniques that predict the expected distribution rather than a single value. This approach provides richer information because it makes it possible to estimate prediction intervals, that is, ranges within which the actual value is likely to fall. More formally, a prediction interval defines the range expected to contain the true value of the response variable with a given probability.
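In formal notation, a prediction interval with lower bound $l_t$ and upper bound $u_t$ for the value $y_t$ of the series at time $t$ satisfies

$$P(l_t \leq y_t \leq u_t) = 1 - \alpha,$$

where $\alpha$ is the significance level. For example, an 80 % interval ($\alpha = 0.2$) is expected to contain the true value 80 % of the time; in the bootstrapping approach used later in this document, its bounds are estimated as the 10th and 90th percentiles of the simulated prediction distribution.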

Estimating prediction intervals in forecasting is challenging, since many well-established methods for regression and single-step forecasting (where only the next value of the series is predicted) do not carry over directly to multi-step forecasting (where several future values are predicted). In addition, a balance must be struck between two key metrics: coverage and interval width. In practice, the goal is to reach a given coverage level (for example, 80 %) while keeping the prediction intervals as narrow as possible and the model's prediction error as low as possible. A small sketch of these two metrics follows.
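The following self-contained sketch (toy simulated data, not part of the case study) computes the two quantities used throughout this document: the empirical coverage, i.e., the fraction of true values falling inside the interval, and the interval width.

# Coverage vs. width trade-off (illustrative sketch with simulated data)
# ==============================================================================
import numpy as np

rng = np.random.default_rng(123)
y_true = rng.normal(loc=0, scale=1, size=1000)

# Symmetric interval targeting 80 % coverage: 10th and 90th percentiles
lower, upper = np.percentile(y_true, [10, 90])
coverage = np.mean((y_true >= lower) & (y_true <= upper))
print(f"Empirical coverage: {coverage:.1%}, interval width: {upper - lower:.2f}")

Widening an interval always increases its coverage, so coverage alone is not enough to compare models: two models with the same coverage are not equally useful if one achieves it with much wider intervals.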

This document is inspired by Carl McBride Ellis (2024), Probabilistic Forecasting I: Temperature, a Kaggle competition. In the competition, predictions are made one step ahead (single-step), meaning that each prediction is generated for the next time step based on known previous values. In this adaptation, however, a multi-step approach is used, in which several future time steps are predicted. This scenario is more complex but often necessary in real-world applications.

The task is to estimate prediction intervals for the target variable, named OT, for the next 28 days. Each day, the following 24 hours are predicted, i.e., 24 values are forecast per day. This process is repeated daily for 28 days, resulting in a total of 672 predictions (28 × 24) by the end of the period. Symmetric prediction intervals with nominal coverage levels of 10 %, 20 %, 30 %, 40 %, 60 %, 70 %, 80 %, 90 % and 95 % are estimated, and their empirical coverage is evaluated.
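A symmetric interval with nominal coverage c corresponds to the percentile pair [(100 − c)/2, (100 + c)/2] of the prediction distribution. As a quick reference, this snippet computes the pairs used later in the document:

# Mapping from nominal coverage to symmetric percentile pairs
# ==============================================================================
coverages = [10, 20, 30, 40, 60, 70, 80, 90, 95]
intervals = [[(100 - c) / 2, (100 + c) / 2] for c in coverages]
print(intervals)  # [[45.0, 55.0], [40.0, 60.0], ..., [5.0, 95.0], [2.5, 97.5]]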

Libraries

The libraries used in this notebook are:

In [1]:
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.plot import set_dark_theme
from skforecast.plot import plot_prediction_intervals
from statsmodels.graphics.tsaplots import plot_pacf
from statsmodels.tsa.stattools import pacf
from skforecast.plot import plot_residuals
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
pio.renderers.default = 'notebook' 
poff.init_notebook_mode(connected=True)

# Modeling and forecasting
# ==============================================================================
import skforecast
import sklearn
from sklearn.linear_model import Ridge
from sklearn.feature_selection import RFECV
from sklearn.pipeline import make_pipeline
from feature_engine.datetime import DatetimeFeatures
from feature_engine.creation import CyclicalFeatures
from feature_engine.timeseries.forecasting import LagFeatures
from feature_engine.timeseries.forecasting import WindowFeatures
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from skforecast.feature_selection import select_features
from skforecast.preprocessing import RollingFeatures

# Warnings
# ==============================================================================
import warnings
warnings.filterwarnings('once')

print('skforecast version:', skforecast.__version__)
print('sklearn version:', sklearn.__version__)
skforecast version: 0.14.0
sklearn version: 1.5.1

Data

Data from an electricity transformer station were collected between July 2016 and July 2018 (2 years × 365 days × 24 hours × 4 intervals per hour = 70,080 values). For each date, 7 variables are available: the target variable Oil Temperature (OT) and 6 external power load features: High UseFul Load (HUFL), High UseLess Load (HULL), Middle UseFul Load (MUFL), Middle UseLess Load (MULL), Low UseFul Load (LUFL) and Low UseLess Load (LULL).

Zhou, Haoyi & Zhang, Shanghang & Peng, Jieqi & Zhang, Shuai & Li, Jianxin & Xiong, Hui & Zhang, Wancai. (2020). Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. 10.48550/arXiv.2012.07436. https://github.com/zhouhaoyi/ETDataset

In [2]:
# Data download
# ==============================================================================
data = pd.read_csv("https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/ETTm2.csv")
data['date'] = pd.to_datetime(data['date'])
data = data.set_index('date')
data = data.asfreq('15min')
data = data.resample(rule="1h", closed="left", label="right").mean()
data
Out[2]:
HUFL HULL MUFL MULL LUFL LULL OT
date
2016-07-01 01:00:00 38.784501 10.88975 34.753500 8.55100 4.12575 1.26050 37.838250
2016-07-01 02:00:00 36.041249 9.44475 32.696001 7.13700 3.59025 0.62900 36.849250
2016-07-01 03:00:00 38.240000 11.41350 35.343501 9.10725 3.06000 0.31175 35.915750
2016-07-01 04:00:00 37.800250 11.45525 34.881000 9.28850 3.04400 0.60750 32.839375
2016-07-01 05:00:00 36.501750 10.49200 33.708250 8.65150 2.64400 0.00000 31.466125
... ... ... ... ... ... ... ...
2018-06-26 16:00:00 39.517250 11.45500 50.616000 12.08950 -10.64275 -1.28475 47.744249
2018-06-26 17:00:00 39.140499 11.32975 49.865250 11.84825 -10.33100 -1.35400 48.183498
2018-06-26 18:00:00 42.428501 12.85850 53.591250 13.33575 -10.95475 -1.41800 47.853999
2018-06-26 19:00:00 43.036000 12.87925 54.241250 13.33575 -10.91200 -1.41800 46.535750
2018-06-26 20:00:00 40.543500 11.16175 51.721500 11.76800 -11.22400 -1.41800 45.601625

17420 rows × 7 columns

In [3]:
# Plot target series
# ==============================================================================
set_dark_theme()
plt.rcParams['lines.linewidth'] = 0.5
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(data['OT'], label='OT');
In [4]:
# Plot exogenous series
# ==============================================================================
cols_to_plot = [
    'HUFL', 'HULL', 'MUFL', 'MULL', 'LUFL', 'LULL'
]
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
colors = colors[1:] + ["violet", "magenta"]    
fig, axs = plt.subplots(6, 1, figsize=(8, 8), sharex=True)
for i, col in enumerate(cols_to_plot):
    axs[i].plot(data[col], label=col, color=colors[i])
    axs[i].legend(loc='lower left', fontsize=8)
    axs[i].tick_params(axis='both', labelsize=8)
In [5]:
# Partial autocorrelation
# ==============================================================================
n_lags = 24 * 7
cols_to_plot = [
    'OT', 'HUFL', 'HULL',
    'MUFL', 'MULL',
    'LUFL', 'LULL'
]
pacf_df = []
fig, axs = plt.subplots(4, 2, figsize=(8, 6))
axs = axs.ravel()
for i, col in enumerate(cols_to_plot):
    pacf_values = pacf(data[col], nlags=n_lags)
    pacf_values = pd.DataFrame({
        'lag': range(1, len(pacf_values)),
        'value': pacf_values[1:],
        'variable': col
    })
    pacf_df.append(pacf_values)
    plot_pacf(data[col], lags=n_lags, ax=axs[i])
    axs[i].set_title(col, fontsize=10)
    axs[i].set_ylim(-0.5, 1.1)
plt.tight_layout()
In [6]:
# Top n lags with highest partial autocorrelation per variable
# ==============================================================================
n = 10
top_lags = set()
for pacf_values in pacf_df:
    variable = pacf_values['variable'].iloc[0]
    lags = pacf_values.nlargest(n, 'value')['lag'].tolist()
    top_lags.update(lags)
    print(f"{variable}: {lags}")
top_lags = list(top_lags)
print(f"\nAll lags: {top_lags}")
OT: [1, 15, 16, 14, 3, 6, 17, 13, 4, 5]
HUFL: [1, 11, 15, 19, 14, 17, 18, 12, 23, 9]
HULL: [1, 3, 2, 15, 12, 23, 11, 14, 24, 4]
MUFL: [1, 15, 11, 14, 17, 19, 12, 18, 9, 16]
MULL: [1, 3, 2, 24, 11, 15, 23, 12, 17, 4]
LUFL: [1, 2, 20, 18, 9, 17, 10, 16, 19, 21]
LULL: [1, 3, 14, 4, 18, 11, 9, 42, 19, 12]

All lags: [1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 23, 24, 42]

The target series and the exogenous series exhibit similar dynamics, with several lags that can be used as predictors.

Feature engineering

Additional exogenous features are created based on:

  • Lagged values of the exogenous series.

  • Rolling statistics of the exogenous series.

  • Calendar features.

Warning

This document assumes that the additional series are known into the future; that is, their values are available at prediction time, which allows them to be used as exogenous variables in the forecasting model.

In [7]:
# Calendar features
# ==============================================================================
features_to_extract = [
    'year',
    'month',
    'week',
    'day_of_week',
    'hour'
]
calendar_transformer = DatetimeFeatures(
    variables           = 'index',
    features_to_extract = features_to_extract,
    drop_original       = False,
)

# Lags of the exogenous variables
# ==============================================================================
lag_transformer = LagFeatures(
    variables = ["HUFL", "MUFL", "MULL", "HULL", "LUFL", "LULL"],
    periods   = top_lags,
)

# Rolling statistics of the exogenous variables
# ==============================================================================
wf_transformer = WindowFeatures(
    variables   = ["HUFL", "MUFL", "MULL", "HULL", "LUFL", "LULL"],
    window      = ["1D", "7D"],
    functions   = ["mean", "max", "min"],
    freq        = "1h",
)

# Cyclical encoding of calendar features
# ==============================================================================
features_to_encode = [
    "month",
    "week",
    "day_of_week",
    "hour",
]
max_values = {
    "month": 12,
    "week": 52,
    "day_of_week": 7,
    "hour": 24,
}
cyclical_encoder = CyclicalFeatures(
                        variables     = features_to_encode,
                        max_values    = max_values,
                        drop_original = True
                   )

exog_transformer = make_pipeline(
                        calendar_transformer,
                        lag_transformer,
                        wf_transformer,
                        cyclical_encoder
                   )
display(exog_transformer)

data = exog_transformer.fit_transform(data)
# Remove missing values generated by the lag features
data = data.dropna()
exog_features = data.columns.difference(['OT']).tolist()
display(data.head(3))
Pipeline(steps=[('datetimefeatures',
                 DatetimeFeatures(drop_original=False,
                                  features_to_extract=['year', 'month', 'week',
                                                       'day_of_week', 'hour'],
                                  variables='index')),
                ('lagfeatures',
                 LagFeatures(periods=[1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 13, 14,
                                      15, 16, 17, 18, 19, 20, 21, 23, 24, 42],
                             variables=['HUFL', 'MUFL', 'MULL', 'HULL', 'LUFL',
                                        'LULL'])),
                ('windowfeatures',
                 WindowFeatures(freq='1h', functions=['mean', 'max', 'min'],
                                variables=['HUFL', 'MUFL', 'MULL', 'HULL',
                                           'LUFL', 'LULL'],
                                window=['1D', '7D'])),
                ('cyclicalfeatures',
                 CyclicalFeatures(drop_original=True,
                                  max_values={'day_of_week': 7, 'hour': 24,
                                              'month': 12, 'week': 52},
                                  variables=['month', 'week', 'day_of_week',
                                             'hour']))])
HUFL HULL MUFL MULL LUFL LULL OT year HUFL_lag_1 MUFL_lag_1 ... LULL_window_7D_max LULL_window_7D_min month_sin month_cos week_sin week_cos day_of_week_sin day_of_week_cos hour_sin hour_cos
date
2016-07-02 19:00:00 33.6330 9.08900 29.874751 6.95600 3.753 0.61025 28.719500 2016 32.5440 29.017001 ... 1.2605 0.0 -0.5 -0.866025 1.224647e-16 -1.0 -0.974928 -0.222521 -0.965926 0.258819
2016-07-02 20:00:00 31.7065 7.72750 27.884500 5.63600 3.753 0.67425 29.103875 2016 33.6330 29.874751 ... 1.2605 0.0 -0.5 -0.866025 1.224647e-16 -1.0 -0.974928 -0.222521 -0.866025 0.500000
2016-07-02 21:00:00 31.8110 7.81125 27.362000 5.18025 4.435 1.42075 29.598500 2016 31.7065 27.884500 ... 1.2605 0.0 -0.5 -0.866025 1.224647e-16 -1.0 -0.974928 -0.222521 -0.707107 0.707107

3 rows × 184 columns

Prediction intervals with bootstrapped residuals

Data partitioning

In [8]:
# Split train-validation-test
# ==============================================================================
end_train = '2017-10-01 23:59:00'
end_validation = '2018-04-03 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Dates train      : 2016-07-02 19:00:00 --- 2017-10-01 23:00:00  (n=10949)
Dates validation : 2017-10-02 00:00:00 --- 2018-04-03 23:00:00  (n=4416)
Dates test       : 2018-04-04 00:00:00 --- 2018-06-26 20:00:00  (n=2013)
In [9]:
# Plot partitions
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(data_train['OT'], label='Train')
ax.plot(data_val['OT'], label='Validation')
ax.plot(data_test['OT'], label='Test')
ax.set_title('OT')
ax.legend();
In [10]:
# Plot partitions after differencing
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(data_train['OT'].diff(1), label='Train')
ax.plot(data_val['OT'].diff(1), label='Validation')
ax.plot(data_test['OT'].diff(1), label='Test')
ax.set_title('Differenced OT')
ax.legend();

Forecaster

In [11]:
# Create forecaster
# ==============================================================================
window_features = RollingFeatures(stats=['mean', 'min', 'max'], window_sizes=24)
forecaster = ForecasterRecursive(
                 regressor       = Ridge(random_state=15926),
                 lags            = top_lags,
                 differentiation = 1,
                 window_features = window_features,
                 binner_kwargs   = {'n_bins': 10}
             )
forecaster
Out[11]:

ForecasterRecursive

General Information
  • Regressor: Ridge
  • Lags: [ 1 2 3 4 5 6 9 10 11 12 13 14 15 16 17 18 19 20 21 23 24 42]
  • Window features: ['roll_mean_24', 'roll_min_24', 'roll_max_24']
  • Window size: 43
  • Exogenous included: False
  • Weight function included: False
  • Differentiation order: 1
  • Creation date: 2024-12-01 19:16:52
  • Last fit date: None
  • Skforecast version: 0.14.0
  • Python version: 3.12.5
  • Forecaster id: None
Exogenous Variables
    None
Data Transformations
  • Transformer for y: None
  • Transformer for exog: None
Training Information
  • Training range: Not fitted
  • Training index type: Not fitted
  • Training index frequency: Not fitted
Regressor Parameters
    {'alpha': 1.0, 'copy_X': True, 'fit_intercept': True, 'max_iter': None, 'positive': False, 'random_state': 15926, 'solver': 'auto', 'tol': 0.0001}
Fit Kwargs
    {}


Feature selection

In [12]:
# Feature selection (autoregressive and exogenous) with scikit-learn RFECV
# ==============================================================================
regressor = Ridge(random_state=15926)

selector = RFECV(
    estimator=regressor, step=1, cv=3, min_features_to_select=25, n_jobs=-1
)

selected_lags, selected_window_features, selected_exog = select_features(
    forecaster      = forecaster,
    selector        = selector,
    y               = data.loc[:end_train, 'OT'],
    exog            = data.loc[:end_train, exog_features],
    select_only     = None,
    force_inclusion = None,
    subsample       = 0.5,
    random_state    = 123,
    verbose         = True,
)
Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 10906
Total number of records used for feature selection: 5453
Number of features available: 208
    Lags            (n=22)
    Window features (n=3)
    Exog            (n=183)
Number of features selected: 107
    Lags            (n=14) : [1, 2, 3, 4, 5, 6, 9, 12, 15, 17, 20, 23, 24, 42]
    Window features (n=2) : ['roll_min_24', 'roll_max_24']
    Exog            (n=91) : ['HUFL', 'HUFL_lag_1', 'HUFL_lag_12', 'HUFL_lag_13', 'HUFL_lag_15', 'HUFL_lag_19', 'HUFL_lag_2', 'HUFL_lag_20', 'HUFL_lag_23', 'HUFL_lag_4', 'HUFL_lag_5', 'HUFL_lag_9', 'HUFL_window_1D_mean', 'HULL', 'HULL_lag_1', 'HULL_lag_12', 'HULL_lag_14', 'HULL_lag_15', 'HULL_lag_2', 'HULL_lag_20', 'HULL_lag_21', 'HULL_lag_23', 'HULL_lag_3', 'HULL_lag_4', 'HULL_window_1D_mean', 'LUFL', 'LUFL_lag_1', 'LUFL_lag_10', 'LUFL_lag_15', 'LUFL_lag_19', 'LUFL_lag_2', 'LUFL_lag_20', 'LUFL_lag_23', 'LUFL_lag_3', 'LUFL_lag_4', 'LUFL_lag_5', 'LUFL_window_1D_mean', 'LULL', 'LULL_lag_1', 'LULL_lag_12', 'LULL_lag_13', 'LULL_lag_14', 'LULL_lag_18', 'LULL_lag_19', 'LULL_lag_20', 'LULL_lag_21', 'LULL_lag_23', 'LULL_lag_24', 'LULL_lag_4', 'LULL_lag_5', 'LULL_lag_6', 'LULL_window_1D_max', 'LULL_window_1D_min', 'MUFL', 'MUFL_lag_1', 'MUFL_lag_11', 'MUFL_lag_12', 'MUFL_lag_13', 'MUFL_lag_15', 'MUFL_lag_2', 'MUFL_lag_20', 'MUFL_lag_23', 'MUFL_lag_4', 'MUFL_lag_9', 'MUFL_window_1D_mean', 'MULL', 'MULL_lag_1', 'MULL_lag_11', 'MULL_lag_12', 'MULL_lag_13', 'MULL_lag_14', 'MULL_lag_15', 'MULL_lag_17', 'MULL_lag_19', 'MULL_lag_2', 'MULL_lag_20', 'MULL_lag_3', 'MULL_lag_4', 'MULL_lag_5', 'MULL_lag_9', 'MULL_window_1D_mean', 'MULL_window_1D_min', 'MULL_window_7D_mean', 'day_of_week_sin', 'hour_cos', 'hour_sin', 'month_cos', 'month_sin', 'week_cos', 'week_sin', 'year']
In [13]:
# Set the selected lags
# ==============================================================================
forecaster.set_lags(selected_lags)

Hyperparameter search

In [14]:
# Hyperparameter search
# ==============================================================================
cv = TimeSeriesFold(
        initial_train_size = len(data.loc[:end_train, :]),
        steps              = 24,  # all hours of next day
        differentiation    = 1,
     )

def search_space(trial):
    search_space  = {
        'alpha': trial.suggest_float('alpha', 1e-7, 1e+3, log=True)
    }
    return search_space

results_search, frozen_trial = bayesian_search_forecaster(
                                   forecaster    = forecaster,
                                   y             = data.loc[:end_validation, 'OT'],
                                   exog          = data.loc[:end_validation, selected_exog],
                                   search_space  = search_space,
                                   cv            = cv,
                                   metric        = 'mean_absolute_error',
                                   n_trials      = 20,
                                   random_state  = 123,
                                   return_best   = True,
                                   n_jobs        = 'auto',
                                   verbose       = False,
                                   show_progress = True
                               )
best_params = results_search['params'].iloc[0]
best_lags = results_search['lags'].iloc[0]
results_search.head()
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [ 1  2  3  4  5  6  9 12 15 17 20 23 24 42] 
  Parameters: {'alpha': 1.1075004417904626e-07}
  Backtesting metric: 2.3811573340717134
Out[14]:
lags params mean_absolute_error alpha
0 [1, 2, 3, 4, 5, 6, 9, 12, 15, 17, 20, 23, 24, 42] {'alpha': 1.1075004417904626e-07} 2.381157 1.107500e-07
1 [1, 2, 3, 4, 5, 6, 9, 12, 15, 17, 20, 23, 24, 42] {'alpha': 1.1602023537361842e-07} 2.381157 1.160202e-07
2 [1, 2, 3, 4, 5, 6, 9, 12, 15, 17, 20, 23, 24, 42] {'alpha': 1.3232531712227085e-07} 2.381157 1.323253e-07
3 [1, 2, 3, 4, 5, 6, 9, 12, 15, 17, 20, 23, 24, 42] {'alpha': 1.8131052405964636e-07} 2.381157 1.813105e-07
4 [1, 2, 3, 4, 5, 6, 9, 12, 15, 17, 20, 23, 24, 42] {'alpha': 6.876338479023381e-07} 2.381157 6.876338e-07

Predicting intervals

After selecting the best model, including its predictors and hyperparameters, a backtesting process is run on the validation set. This process generates out-of-sample residuals, which are then used to estimate the prediction intervals for the test set.

In [15]:
# Backtesting on validation data to obtain out-of-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
        initial_train_size = len(data.loc[:end_train, :]),
        steps              = 24,  # all hours of next day
        differentiation    = 1,
     )

metric_val, predictions_val = backtesting_forecaster(
                                forecaster    = forecaster,
                                y             = data.loc[:end_validation, 'OT'],
                                exog          = data.loc[:end_validation, selected_exog],
                                cv            = cv,
                                metric        = 'mean_absolute_error',
                                n_jobs        = 'auto',
                                verbose       = False,
                                show_progress = True
                              )
display(metric_val)
fig, ax = plt.subplots(figsize=(8, 3))
data.loc[end_train:end_validation, 'OT'].plot(ax=ax, label='Real value')
predictions_val['pred'].plot(ax=ax, label='prediction')
ax.set_title("Backtesting on validation data")
ax.legend();
mean_absolute_error
0 2.381157
In [16]:
# Distribution of the out-of-sample residuals
# ==============================================================================
residuals = data.loc[predictions_val.index, 'OT'] - predictions_val['pred']
print(pd.Series(np.where(residuals < 0, 'negative', 'positive')).value_counts())
plt.rcParams.update({'font.size': 8})
_ = plot_residuals(residuals=residuals, figsize=(7, 4))
positive    2457
negative    1959
Name: count, dtype: int64
In [17]:
# Store out-of-sample residuals in the forecaster
# ==============================================================================
forecaster.fit(y=data.loc[:end_train, 'OT'], exog=data.loc[:end_train, selected_exog])
forecaster.set_out_sample_residuals(
    y_true = data.loc[predictions_val.index, 'OT'], 
    y_pred = predictions_val['pred']
)

✎ Note

When new residuals are stored in the forecaster, they are binned according to the magnitude of the predictions they correspond to. Later, during the bootstrapping process, residuals are sampled from the corresponding bin, ensuring that the distribution of the sampled residuals is consistent with the predictions. This yields more accurate prediction intervals, since the residuals are better aligned with the predictions they correspond to, resulting in better coverage with narrower intervals. For more details on how residuals are used in interval estimation, see Probabilistic forecasting: prediction intervals and prediction distribution.
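The following is a simplified, self-contained sketch of this idea (simulated data; it illustrates the mechanism, not skforecast's actual internals): residuals are assigned to bins according to the magnitude of the prediction they belong to, and each new prediction bootstraps residuals only from its own bin.

# Simplified sketch of binned bootstrapping (illustrative, simulated data)
# ==============================================================================
import numpy as np

rng = np.random.default_rng(123)
preds = rng.uniform(0, 50, size=500)
resids = rng.normal(scale=1 + preds / 25)  # residual spread grows with the prediction
bin_edges = np.quantile(preds, np.linspace(0, 1, 11))  # 10 bins, as n_bins=10 above
bin_idx = np.clip(np.digitize(preds, bin_edges) - 1, 0, 9)

# Interval for a new prediction: sample residuals only from its own bin
new_pred = 40.0
k = int(np.clip(np.digitize(new_pred, bin_edges) - 1, 0, 9))
sampled = rng.choice(resids[bin_idx == k], size=150, replace=True)
lower, upper = new_pred + np.percentile(sampled, [10, 90])
print(f"80% interval around {new_pred}: [{lower:.2f}, {upper:.2f}]")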
In [18]:
# Intervals of the residual bins (conditioned on the predicted values) and number of stored residuals
# ==============================================================================
for k, v in forecaster.binner_intervals_.items():
    print(f"{k}: {v}  n={len(forecaster.out_sample_residuals_by_bin_[k])}")
0.0: (-3.8716384676533124, -1.1238482614863585)  n=417
1.0: (-1.1238482614863585, -0.7140340975203685)  n=611
2.0: (-0.7140340975203685, -0.5147665419614711)  n=510
3.0: (-0.5147665419614711, -0.3512020918319507)  n=459
4.0: (-0.3512020918319507, -0.1987613354056137)  n=374
5.0: (-0.1987613354056137, -0.013725466918245388)  n=274
6.0: (-0.013725466918245388, 0.29437106461036644)  n=328
7.0: (0.29437106461036644, 0.7696661155396924)  n=426
8.0: (0.7696661155396924, 1.479336474696538)  n=516
9.0: (1.479336474696538, 5.6094825256755385)  n=500
In [19]:
# Distribution of the out-of-sample residuals by bin
# ==============================================================================
out_sample_residuals_by_bin_df = pd.DataFrame(
    dict(
        [(k, pd.Series(v))
         for k, v in forecaster.out_sample_residuals_by_bin_.items()]
    )
)

fig, ax = plt.subplots(figsize=(6, 3))
out_sample_residuals_by_bin_df.boxplot(ax=ax)
ax.set_title("Distribution of residuals by bin", fontsize=12)
ax.set_xlabel("Bin", fontsize=10)
ax.set_ylabel("Residuals", fontsize=10)
plt.show();
In [20]:
# Backtesting with prediction intervals on test data using out-of-sample residuals
# ==============================================================================
cv = TimeSeriesFold(
        initial_train_size = len(data.loc[:end_validation, :]),
        steps              = 24,  # all hours of next day
        differentiation    = 1
     )

metric, predictions = backtesting_forecaster(
                          forecaster              = forecaster,
                          y                       = data['OT'],
                          exog                    = data[selected_exog],
                          cv                      = cv,
                          metric                  = 'mean_absolute_error',
                          interval                = [10, 90],  # 80% prediction interval
                          n_boot                  = 150,
                          use_in_sample_residuals = False,  # Use out-of-sample residuals
                          use_binned_residuals    = True,
                          n_jobs                  = 'auto',
                          verbose                 = False,
                          show_progress           = True
                     )
predictions
Out[20]:
pred lower_bound upper_bound
2018-04-04 00:00:00 32.748341 32.289988 33.384481
2018-04-04 01:00:00 31.947731 31.186928 33.132068
2018-04-04 02:00:00 31.052885 30.214739 33.000788
2018-04-04 03:00:00 30.108430 29.027782 32.823787
2018-04-04 04:00:00 29.229240 27.894726 32.262539
... ... ... ...
2018-06-26 16:00:00 49.968142 39.049217 56.065172
2018-06-26 17:00:00 49.287497 38.649894 56.572044
2018-06-26 18:00:00 48.319028 38.308019 56.107946
2018-06-26 19:00:00 47.041985 38.269496 55.357029
2018-06-26 20:00:00 45.751569 37.076566 54.628951

2013 rows × 3 columns

In [21]:
# Plot prediction intervals
# ==============================================================================
plt.rcParams['lines.linewidth'] = 1
fig, ax = plt.subplots(figsize=(9, 4))
plot_prediction_intervals(
    predictions     = predictions,
    y_true          = data_test,
    target_variable = "OT",
    initial_x_zoom  = None,
    title           = "Real value vs prediction in test data",
    xaxis_title     = "Date time",
    yaxis_title     = "OT",
    ax              = ax
)
fill_between_obj = ax.collections[0]
fill_between_obj.set_facecolor('white')
fill_between_obj.set_alpha(0.3)

# Coverage of the predicted interval
# ==============================================================================
def empirical_coverage(y, lower_bound, upper_bound):
    """
    Calculate the empirical coverage of a given interval
    """
    return np.mean(np.logical_and(y >= lower_bound, y <= upper_bound))

coverage = empirical_coverage(
    y = data.loc[end_validation:, 'OT'],
    lower_bound = predictions["lower_bound"], 
    upper_bound = predictions["upper_bound"]
)
print(f"Predicted interval coverage: {round(100 * coverage, 2)} %")

# Area of the interval
# ==============================================================================
area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
print(f"Area of the interval: {round(area, 2)}")

display(metric)
Predicted interval coverage: 81.57 %
Area of the interval: 20843.63
mean_absolute_error
0 2.871771
In [22]:
# Interactive plot
# ==============================================================================
fig = go.Figure([
        go.Scatter(name='Prediction', x=predictions.index, y=predictions['pred'], mode='lines'),
        go.Scatter(name='Real value', x=data_test.index, y=data_test['OT'], mode='lines'),
        go.Scatter(
            name='Upper Bound', x=predictions.index, y=predictions['upper_bound'],
            mode='lines', marker=dict(color="#444"), line=dict(width=0), showlegend=False
        ),
        go.Scatter(
            name='Lower Bound', x=predictions.index, y=predictions['lower_bound'],
            marker=dict(color="#444"), line=dict(width=0), mode='lines',
            fillcolor='rgba(68, 68, 68, 0.3)', fill='tonexty', showlegend=False
        )
    ])
fig.update_layout(
    title="Real value vs prediction in test data",
    xaxis_title= "Date time",
    yaxis_title= "OT",
    width=800,
    height=400,
    margin=dict(l=10, r=10, t=35, b=10),
    hovermode="x",
    legend=dict(orientation="h", yanchor="top", y=1.1, xanchor="left", x=0.001)
)
fig.show()

Estimating multiple intervals

Prediction intervals are estimated for several nominal coverage levels: 10 %, 20 %, 30 %, 40 %, 60 %, 70 %, 80 %, 90 % and 95 %, and their empirical coverage is evaluated.

In [23]:
# Prediction intervals for different nominal coverages
# ==============================================================================
nominal_coverages = [[2.5, 97.5], [5, 95], [10, 90], [15, 85], [20, 80], [30, 70], [35, 65], [40, 60], [45, 55]]
observed_coverages = []
observed_areas = []

for coverage in nominal_coverages:
    metric, predictions = backtesting_forecaster(
                            forecaster              = forecaster,
                            y                       = data['OT'],
                            exog                    = data[selected_exog],
                            cv                      = cv,
                            metric                  = 'mean_absolute_error',
                            interval                = coverage,  # Nominal coverage
                            n_boot                  = 150,
                            use_in_sample_residuals = False,  # Use out-of-sample residuals
                            use_binned_residuals    = True,
                            n_jobs                  = 'auto',
                            verbose                 = False,
                            show_progress           = True
                         )
    observed_coverage = empirical_coverage(
        y = data.loc[end_validation:, 'OT'],
        lower_bound = predictions["lower_bound"], 
        upper_bound = predictions["upper_bound"]
    )
    observed_area = (predictions["upper_bound"] - predictions["lower_bound"]).sum()
    observed_coverages.append(100 * observed_coverage)
    observed_areas.append(observed_area)

results = pd.DataFrame({
            'Interval': nominal_coverages,
            'Nominal coverage': [coverage[1] - coverage[0] for coverage in nominal_coverages],
            'Observed coverage': observed_coverages,
            'Area': observed_areas
         })
results.round(1)
Out[23]:
Interval Nominal coverage Observed coverage Area
0 [2.5, 97.5] 95.0 93.9 33464.8
1 [5, 95] 90.0 90.1 27347.4
2 [10, 90] 80.0 81.6 20843.6
3 [15, 85] 70.0 73.9 16713.4
4 [20, 80] 60.0 64.6 13495.6
5 [30, 70] 40.0 44.6 8339.2
6 [35, 65] 30.0 33.6 6079.6
7 [40, 60] 20.0 21.8 3962.8
8 [45, 55] 10.0 11.3 1968.4

The empirical coverage of all the intervals is very close to the expected nominal coverage. This indicates that the intervals are well calibrated and that the model is able to estimate the uncertainty associated with its predictions.
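As a final visual check, a short sketch (reusing the `results` DataFrame and matplotlib from the cells above) plots observed against nominal coverage; a well-calibrated model should fall close to the diagonal:

# Calibration plot: observed vs. nominal coverage (sketch)
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 4))
ax.plot(results['Nominal coverage'], results['Observed coverage'], 'o-', label='Model')
ax.plot([0, 100], [0, 100], '--', color='gray', label='Perfect calibration')
ax.set_xlabel('Nominal coverage (%)')
ax.set_ylabel('Observed coverage (%)')
ax.set_title('Interval calibration')
ax.legend();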

Session information

In [24]:
import session_info
session_info.show(html=False)
-----
feature_engine      1.8.1
matplotlib          3.9.2
numpy               2.0.0
optuna              3.6.1
pandas              2.2.3
plotly              5.24.1
session_info        1.0.0
skforecast          0.14.0
sklearn             1.5.1
statsmodels         0.14.3
-----
IPython             8.27.0
jupyter_client      8.6.3
jupyter_core        5.7.2
notebook            6.4.12
-----
Python 3.12.5 | packaged by Anaconda, Inc. | (main, Sep 12 2024, 18:27:27) [GCC 11.2.0]
Linux-5.15.0-1072-aws-x86_64-with-glibc2.31
-----
Session information updated at 2024-12-01 19:26

How to cite

How to cite this document?

If you use this document or any part of it, we would appreciate a citation. Thank you very much!

Probabilistic forecasting: prediction intervals for multi-step forecasting by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://www.cienciadedatos.net/documentos/py60-forecasting-probabilistico-intervalos-prediccion-forecasting-multi-step.html

How to cite skforecast?

If you use skforecast in your research or publication, we would appreciate a citation. Thank you very much!

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.14.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.14.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.14.0}, month = {11}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }

Did you like the article? Your support matters

Maintaining a website involves considerable costs; your contribution helps me keep producing free educational content. Thank you very much! 😊

Creative Commons Licence

This document, created by Joaquín Amat Rodrigo and Javier Escobar Ortiz, is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.

You are free to:

  • Share: copy and redistribute the material in any medium or format.

  • Adapt: remix, transform, and build upon the material.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • NonCommercial: You may not use the material for commercial purposes.

  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.