Introduction

In machine learning, stacking is an ensemble technique that combines multiple models to reduce their individual biases and improve predictive performance. More specifically, the predictions of each model (the base models) are stacked and used as input to a final model (the meta-model), which computes the final prediction.

Stacking is effective because it leverages the strengths of different algorithms while mitigating their individual weaknesses. By combining several models, it can capture complex patterns in the data and improve prediction accuracy. However, stacking can be computationally expensive and requires careful tuning to avoid overfitting. For this reason, it is strongly recommended to train the final estimator using cross-validation, so that it learns from predictions the base models made on data they were not trained on. In addition, using base models that are diverse and individually perform well is key to the success of stacking.
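
The cross-validated training of the final estimator can be illustrated with a minimal sketch. It assumes synthetic data and two scikit-learn base models (Ridge and a decision tree, chosen only for illustration); the variable names and the helper stacked_predict are hypothetical and not part of this document's workflow.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_predict
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (hypothetical): a linear effect plus a non-linear effect.
rng = np.random.default_rng(123)
X = rng.uniform(-3, 3, size=(300, 2))
y = 2.0 * X[:, 0] + np.sin(3.0 * X[:, 1]) + rng.normal(scale=0.1, size=300)

base_models = {
    "ridge": Ridge(alpha=1.0),
    "tree": DecisionTreeRegressor(max_depth=4, random_state=123),
}
cv = KFold(n_splits=5, shuffle=False)

# Step 1: out-of-fold predictions. Every row is predicted by a model
# that did not see that row during training.
oof_predictions = np.column_stack(
    [cross_val_predict(model, X, y, cv=cv) for model in base_models.values()]
)

# Step 2: the meta-model learns how to weight the base predictions.
meta_model = Ridge(alpha=1.0).fit(oof_predictions, y)

# Step 3: the base models are refitted on all the data for prediction time.
fitted_base = [model.fit(X, y) for model in base_models.values()]


def stacked_predict(X_new):
    """Predict with the base models, then combine with the meta-model."""
    base_preds = np.column_stack([m.predict(X_new) for m in fitted_base])
    return meta_model.predict(base_preds)


print(dict(zip(base_models, meta_model.coef_.round(3))))
print(stacked_predict(X[:3]).round(3))
```

Training the meta-model on out-of-fold predictions rather than in-sample predictions keeps it from rewarding a base model that merely memorized the training data.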

This document shows how to use scikit-learn and skforecast to build a forecasting model that combines several individual regressors to achieve better results.

Libraries

Libraries used in this document:

# Data processing
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')

# Modelling and Forecasting
# ==============================================================================
from lightgbm import LGBMRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import StackingRegressor
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
import skforecast
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from skforecast.datasets import fetch_dataset

# Configuration warnings
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m' 
print(f"{color}Version skforecast: {skforecast.__version__}")

Version skforecast: 0.16.0

Data

The data in this document represent monthly fuel consumption in Spain from January 1, 1969 to August 1, 2022; the code below keeps only the observations up to January 1, 2019. The goal is to build a model capable of forecasting consumption over the next 12 months.

# Data download
# ==============================================================================
data = fetch_dataset(name = 'fuel_consumption')
data = data.loc[:"2019-01-01", ['Gasolinas']]
data = data.rename(columns = {'Gasolinas':'consumption'})
data.index.name = 'date'
data['consumption'] = data['consumption']/100000
data.head(3)

fuel_consumption
----------------
Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01.
Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and
Corporación de Derecho Público tutelada por el Ministerio para la Transición
Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas
Shape of the dataset: (644, 5)

	consumption
date
1969-01-01	1.668752
1969-02-01	1.554668
1969-03-01	1.849837

In addition to the past values of the series (lags), an extra variable indicating the month of the year is added. This variable is included in the model to capture the seasonality of the series.

# Calendar features
# ==============================================================================
data['month_of_year'] = data.index.month
data.head(3)

	consumption	month_of_year
date
1969-01-01	1.668752	1
1969-02-01	1.554668	2
1969-03-01	1.849837	3
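
How lags and the month variable end up as columns of a training matrix can be sketched with pandas alone. This is an illustrative sketch on a hypothetical six-value series, not the internal implementation of skforecast; the names y, X, X_train and n_lags are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy series with a monthly (month-start) index.
y = pd.Series(
    [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    index=pd.date_range("2020-01-01", periods=6, freq="MS"),
    name="y",
)
n_lags = 3

# lag_k holds the value observed k steps before the date of the row.
X = pd.concat({f"lag_{k}": y.shift(k) for k in range(1, n_lags + 1)}, axis=1)

# Calendar feature: month of the year of the date being predicted.
X["month_of_year"] = X.index.month

# The first n_lags rows have incomplete lags and cannot be used for training.
X_train = X.iloc[n_lags:]
y_train = y.iloc[n_lags:]
print(X_train)
```

Each remaining row pairs the target at a given date with the three previous observations and the month of that date.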

To facilitate model training, the search for optimal hyperparameters, and the evaluation of predictive accuracy, the data are split into three separate sets: training, validation, and test.

# Train-validation-test split
# ==============================================================================
end_train = '2007-12-01 23:59:00'
end_validation = '2012-12-01 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validation : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")

Dates train      : 1969-01-01 00:00:00 --- 2007-12-01 00:00:00  (n=468)
Dates validation : 2008-01-01 00:00:00 --- 2012-12-01 00:00:00  (n=60)
Dates test       : 2013-01-01 00:00:00 --- 2019-01-01 00:00:00  (n=73)
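
The cut-off strings in the split end at 23:59:00 because label-based slicing with .loc includes both endpoints. The following sketch, on a hypothetical six-month index, shows the overlap that appears when the cut falls exactly on an existing timestamp and how shifting the cut avoids it.

```python
import pandas as pd

# Hypothetical monthly index around the train/validation boundary.
index = pd.date_range("2007-10-01", "2008-03-01", freq="MS")
toy = pd.DataFrame({"value": range(len(index))}, index=index)

# Cutting exactly on a timestamp present in the index puts that row in
# both partitions, because .loc slices include both endpoints.
cut = "2007-12-01"
overlap = toy.loc[:cut].index.intersection(toy.loc[cut:].index)

# Cutting just after the last training timestamp avoids the duplicate.
cut_shifted = "2007-12-01 23:59:00"
no_overlap = toy.loc[:cut_shifted].index.intersection(toy.loc[cut_shifted:].index)

print(f"Shared rows with cut on the timestamp : {len(overlap)}")
print(f"Shared rows with shifted cut          : {len(no_overlap)}")
```

Without the shift, the December observation would be used both for training and for validation.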

# Plot of the time series
# ==============================================================================
data.loc[:end_train, 'partition'] = 'train'
data.loc[end_train:end_validation, 'partition'] = 'validation'
data.loc[end_validation:, 'partition'] = 'test'

fig = px.line(
    data_frame = data.reset_index(),
    x      = 'date',
    y      = 'consumption',
    color  = 'partition',
    title  = 'Fuel consumption',
    width  = 700,
    height = 350,
)
fig.update_layout(
    width  = 700,
    height = 350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1,
        xanchor="left",
        x=0.001
    )
)
fig.show()
data = data.drop(columns='partition')

Individual models

First, two individual models are trained separately, a gradient boosting model (LightGBM) and a linear model (Ridge), and their performance is evaluated on the test set.

LightGBM

# Forecaster
# ==============================================================================
params_lgbm = {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 500, 'verbose': -1}
forecaster = ForecasterRecursive(
                 regressor = LGBMRegressor(random_state=123, **params_lgbm),
                 lags = 12
             )

# Backtesting on test data
# ==============================================================================
cv = TimeSeriesFold(
      steps              = 12,
      initial_train_size = len(data.loc[:end_validation]),
      fixed_train_size   = False,
    )
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['consumption'],
                            exog       = data['month_of_year'],
                            cv         = cv,
                            metric     = 'mean_squared_error',
                      )        
metric

	mean_squared_error
0	0.066196
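
The way the test period is divided during backtesting can be sketched in plain Python. This is a minimal sketch, assuming the observations after initial_train_size are predicted in consecutive blocks of steps values; the helper forecast_blocks is hypothetical, does not reproduce skforecast's internals, and does not model whether the forecaster is refitted between blocks.

```python
def forecast_blocks(n_samples, initial_train_size, steps):
    """Return (start, end) positions, end-exclusive, of each predicted block."""
    blocks = []
    start = initial_train_size
    while start < n_samples:
        # The last block may be shorter than `steps`.
        end = min(start + steps, n_samples)
        blocks.append((start, end))
        start = end
    return blocks


# Sizes taken from the partition above: 468 train + 60 validation + 73 test.
blocks = forecast_blocks(n_samples=601, initial_train_size=528, steps=12)
for start, end in blocks:
    print(f"predict positions [{start}, {end})  n={end - start}")
print(f"Number of blocks: {len(blocks)}")
```

With 73 test observations and 12 steps per block, the last block contains a single observation.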

Linear model

# Forecaster
# ==============================================================================
params_ridge = {'alpha': 0.001}
forecaster = ForecasterRecursive(
                 regressor     = Ridge(random_state=123, **params_ridge),
                 lags          = 12,
                 transformer_y = StandardScaler()
             )

# Backtesting on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['consumption'],
                            exog       = data['month_of_year'],
                            cv         = cv,
                            metric     = 'mean_squared_error'
                      )        
metric

	mean_squared_error
0	0.049092

StackingRegressor

With scikit-learn, combining multiple regressors is straightforward thanks to the StackingRegressor class. The estimators parameter is the list of base models, which are fitted in parallel on the input data. It must be provided as a list of (name, estimator) tuples. The final_estimator (meta-model) uses the predictions of these estimators as its input.

# Stacking regressor
# ==============================================================================
estimators = [
    ('ridge', Ridge(random_state=123, **params_ridge)),
    ('lgbm', LGBMRegressor(random_state=123, **params_lgbm)),
]
stacking_regressor = StackingRegressor(
                        estimators = estimators,
                        final_estimator = Ridge(),
                        cv = KFold(n_splits=10, shuffle=False)
                    )
stacking_regressor

StackingRegressor(cv=KFold(n_splits=10, random_state=None, shuffle=False),
                  estimators=[('ridge', Ridge(alpha=0.001, random_state=123)),
                              ('lgbm',
                               LGBMRegressor(max_depth=5, n_estimators=500,
                                             random_state=123, verbose=-1))],
                  final_estimator=Ridge())
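
The relationship between the base estimators and the final estimator can be checked on a fitted StackingRegressor. The sketch below assumes synthetic data and replaces LightGBM with a scikit-learn decision tree so that it depends only on scikit-learn; the data and the variable names are hypothetical.

```python
import numpy as np
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeRegressor

# Synthetic data (hypothetical), only for illustration.
rng = np.random.default_rng(123)
X = rng.normal(size=(200, 3))
y = X[:, 0] - 2.0 * X[:, 1] + np.abs(X[:, 2]) + rng.normal(scale=0.1, size=200)

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge(alpha=0.001)),
        ("tree", DecisionTreeRegressor(max_depth=4, random_state=123)),
    ],
    final_estimator=Ridge(),
    cv=KFold(n_splits=10, shuffle=False),
)
stack.fit(X, y)

# transform() returns one column per base estimator: the meta-model inputs.
base_predictions = stack.transform(X[:5])
print(base_predictions.shape)

# One coefficient per base estimator (passthrough=False is the default).
print(stack.final_estimator_.coef_)

# The stacked prediction is the meta-model applied to the base predictions.
manual = stack.final_estimator_.predict(base_predictions)
print(np.allclose(manual, stack.predict(X[:5])))
```

The coefficients of the final estimator give a rough idea of how much weight each base model receives in the combined prediction.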

# Forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
                 regressor = stacking_regressor,
                 lags = 12
             )

# Backtesting on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['consumption'],
                            exog       = data['month_of_year'],
                            cv         = cv,
                            metric     = 'mean_squared_error'
                      )        
metric

	mean_squared_error
0	0.044688

The results obtained by stacking the two models (the linear model and the gradient boosting model) are better than those obtained by either model on its own: an MSE of 0.044688 versus 0.049092 for Ridge and 0.066196 for LightGBM.
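
The size of the improvement can be quantified from the three backtesting metrics reported above. The short calculation below only restates those values as relative reductions in MSE; they come from a single test period, so they should be read as indicative rather than as a general guarantee.

```python
# Test MSE values reported by the three backtesting runs in this document.
mse = {"LightGBM": 0.066196, "Ridge": 0.049092, "Stacking": 0.044688}

# Relative reduction in MSE achieved by the stacked model over each base model.
relative_reduction = {
    name: (value - mse["Stacking"]) / value
    for name, value in mse.items()
    if name != "Stacking"
}

for name, reduction in relative_reduction.items():
    print(f"Stacking vs {name}: {reduction:.1%} lower MSE")
```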

Hyperparameter search

When using StackingRegressor, the hyperparameters of each individual regressor must be prefixed with the regressor's name followed by two underscores. For example, the alpha hyperparameter of the Ridge regressor must be specified as ridge__alpha. Hyperparameters of the final estimator must be specified with the prefix final_estimator__.

# Grid search
# ==============================================================================
param_grid = {
    'ridge__alpha': [0.001, 0.01, 0.1, 1, 10],
    'lgbm__n_estimators': [100, 500],
    'lgbm__max_depth': [3, 5, 10],
    'lgbm__learning_rate': [0.01, 0.1],
}

lags_grid = [24]

cv_search = TimeSeriesFold(
      steps              = 12,
      initial_train_size = len(data.loc[:end_train]),
      fixed_train_size   = False,
    )

results_grid = grid_search_forecaster(
                   forecaster  = forecaster,
                   y           = data.loc[:end_validation, 'consumption'],
                   exog        = data.loc[:end_validation, 'month_of_year'],
                   param_grid  = param_grid,
                   lags_grid   = lags_grid,
                   cv          = cv_search,
                   metric      = 'mean_squared_error'
               )

results_grid.head()

`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
  Parameters: {'lgbm__learning_rate': 0.1, 'lgbm__max_depth': 5, 'lgbm__n_estimators': 100, 'ridge__alpha': 0.001}
  Backtesting metric: 0.07934431616061044

	lags	lags_label	params	mean_squared_error	lgbm__learning_rate	lgbm__max_depth	lgbm__n_estimators	ridge__alpha
0	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'lgbm__learning_rate': 0.1, 'lgbm__max_depth'...	0.079344	0.10	5.0	100.0	0.001
1	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'lgbm__learning_rate': 0.1, 'lgbm__max_depth'...	0.079360	0.10	5.0	100.0	0.010
2	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'lgbm__learning_rate': 0.01, 'lgbm__max_depth...	0.079488	0.01	3.0	500.0	0.001
3	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'lgbm__learning_rate': 0.01, 'lgbm__max_depth...	0.079504	0.01	3.0	500.0	0.010
4	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'lgbm__learning_rate': 0.1, 'lgbm__max_depth'...	0.079512	0.10	5.0	100.0	0.100
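
The double-underscore naming convention and the size of the search grid can be verified with scikit-learn alone. In this sketch a decision tree named "tree" stands in for LightGBM, so tree__max_depth is an illustrative assumption; the param_grid dictionary is the one used in the grid search of this document.

```python
from sklearn.ensemble import StackingRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import ParameterGrid
from sklearn.tree import DecisionTreeRegressor

stack = StackingRegressor(
    estimators=[
        ("ridge", Ridge()),
        ("tree", DecisionTreeRegressor(random_state=123)),
    ],
    final_estimator=Ridge(),
)

# Nested parameters follow the pattern <estimator name>__<parameter>.
stack.set_params(ridge__alpha=0.01, tree__max_depth=3, final_estimator__alpha=10.0)
params = stack.get_params()
print(params["ridge__alpha"], params["tree__max_depth"], params["final_estimator__alpha"])

# Grid used in this document: 5 * 2 * 3 * 2 combinations.
param_grid = {
    "ridge__alpha": [0.001, 0.01, 0.1, 1, 10],
    "lgbm__n_estimators": [100, 500],
    "lgbm__max_depth": [3, 5, 10],
    "lgbm__learning_rate": [0.01, 0.1],
}
n_combinations = len(ParameterGrid(param_grid))
print(n_combinations)
```

Since every combination is evaluated by backtesting, the cost of the search grows multiplicatively with each hyperparameter added to the grid.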

Once the best hyperparameters for each regressor in the ensemble have been determined, the test error is computed with backtesting.

# Backtesting the best model on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['consumption'],
                            exog       = data['month_of_year'],
                            cv         = cv,
                            metric     = 'mean_squared_error'
                      )        
metric

	mean_squared_error
0	0.012739

Feature importance

When a StackingRegressor is used as the regressor of a forecaster, the forecaster's get_feature_importances method does not work. This is because StackingRegressor objects have neither a feature_importances_ attribute nor a coef_ attribute. Instead, each of the regressors that make up the stack must be inspected individually.

# Feature importance of each regressor in the stack
# ==============================================================================
if forecaster.regressor.__class__.__name__ == 'StackingRegressor':
    importancia_pred = []
    for regressor in forecaster.regressor.estimators_:
        try:
            importancia = pd.DataFrame(
                data = {
                    'feature': forecaster.regressor.feature_names_in_,
                    f'importance_{type(regressor).__name__}': regressor.coef_,
                    f'importance_abs_{type(regressor).__name__}': np.abs(regressor.coef_)
                }
            ).set_index('feature')
        except AttributeError:
            importancia = pd.DataFrame(
                data = {
                    'feature': forecaster.regressor.feature_names_in_,
                    f'importance_{type(regressor).__name__}': regressor.feature_importances_,
                    f'importance_abs_{type(regressor).__name__}': np.abs(regressor.feature_importances_)
                }
            ).set_index('feature')
        importancia_pred.append(importancia)
    
    importancia_pred = pd.concat(importancia_pred, axis=1)
    
else:
    importancia_pred = forecaster.get_feature_importances()
    importancia_pred['importance_abs'] = importancia_pred['importance'].abs()
    importancia_pred = importancia_pred.sort_values(by='importance_abs', ascending=False)

importancia_pred.head(5)

	importance_Ridge	importance_abs_Ridge	importance_LGBMRegressor	importance_abs_LGBMRegressor
feature
lag_1	0.016128	0.016128	58	58
lag_2	0.226245	0.226245	63	63
lag_3	0.185437	0.185437	44	44
lag_4	0.206009	0.206009	54	54
lag_5	0.105056	0.105056	38	38

Session information

import session_info
session_info.show(html=False)

-----
lightgbm            4.6.0
matplotlib          3.10.1
numpy               1.26.4
pandas              2.2.3
plotly              6.0.1
session_info        v1.0.1
skforecast          0.16.0
sklearn             1.6.1
-----
IPython             9.1.0
jupyter_client      8.6.3
jupyter_core        5.7.2
notebook            6.5.7
-----
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27) [GCC 11.2.0]
Linux-6.11.0-25-generic-x86_64-with-glibc2.39
-----
Session information updated at 2025-05-13 22:57

How to cite

How to cite this document?

If you use this document or any part of it, we would appreciate it if you cite it. Thank you very much!

Stacking ensemble de modelos de forecasting by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under an Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) license at https://www.cienciadedatos.net/documentos/py52-stacking-ensemble-modelos-forecasting.html

How to cite skforecast?

If you use skforecast, we would greatly appreciate it if you cite it. Thank you very much!

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.16.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.16.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.16.0},
  month   = {05},
  year    = {2025},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}

This document, created by Joaquín Amat Rodrigo and Javier Escobar Ortiz, is licensed under Attribution-NonCommercial-ShareAlike 4.0 International.

You are free to:

Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.

Under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Stacking ensemble of forecasting models

Joaquín Amat Rodrigo, Javier Escobar Ortiz

November, 2024 (last updated May 2025)