If you like Skforecast , help us giving a star on GitHub! ⭐️
More about forecasting
In machine learning, stacking is an ensemble technique that combines multiple models to reduce their biases and improve predictive performance. More specifically, the predictions of each model (base models) are stacked and used as input to a final model (meta model) to compute the prediction.
Stacking is effective because it leverages the strengths of different algorithms and attempts to mitigate their individual weaknesses. By combining several models, it can capture complex patterns in the data and improve prediction accuracy.
However, stacking can be computationally expensive and requires careful tuning to avoid overfitting. To this end, it is highly recommended to train the final estimator through cross-validation. In addition, obtaining diverse and well-performing base models is critical to the success of the stacking technique.
The following example shows how to use scikit-learn and skforecast to create a forecasting model that combines several individual regressors to achieve better results.
Libraries used in this document.
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.io as pio
pio.templates.default = "seaborn"
plt.style.use('seaborn-v0_8-darkgrid')
# Modelling and Forecasting
# ==============================================================================
from lightgbm import LGBMRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import StackingRegressor
from sklearn.model_selection import KFold
from sklearn.preprocessing import StandardScaler
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from skforecast.datasets import fetch_dataset
# Configuration warnings
# ==============================================================================
import warnings
The data in this document represent monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01. The goal is to create a model capable of forecasting the consumption over the next 12 month.
# Downloading data
# ==============================================================================
data = fetch_dataset(name = 'fuel_consumption')
data = data.loc[:"2019-01-01", ['Gasolinas']]
data = data.rename(columns = {'Gasolinas':'consumption'})
data.index.name = 'date'
data['consumption'] = data['consumption']/100000
data.head(3)
In addition to the past values of the series (lags), an additional variable indicating the month of the year is added. This variable is included in the model to capture the seasonality of the series.
# Calendar features
# ==============================================================================
data['month_of_year'] = data.index.month
data.head(3)
To facilitate the training of the models, the search for optimal hyperparameters, and the evaluation of their predictive accuracy, the data are divided into three separate sets: training, validation, and test.
# Split train-validation-test
# ==============================================================================
end_train = '2007-12-01 23:59:00'
end_validation = '2012-12-01 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
# Interactive plot of time series
# ==============================================================================
data.loc[:end_train, 'partition'] = 'train'
data.loc[end_train:end_validation, 'partition'] = 'validation'
data.loc[end_validation:, 'partition'] = 'test'
fig = px.line(
data_frame = data.reset_index(),
x = 'date',
y = 'consumption',
color = 'partition',
title = 'Fuel consumption',
width = 700,
height = 350,
)
fig.update_layout(
width = 700,
height = 350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1,
xanchor="left",
x=0.001
)
)
fig.show()
data=data.drop(columns='partition')