More about forecasting in cienciadedatos.net
- ARIMA and SARIMAX models with python
- Time series forecasting with machine learning
- Forecasting time series with gradient boosting: XGBoost, LightGBM and CatBoost
- Forecasting time series with XGBoost
- Probabilistic forecasting
- Forecasting with deep learning
- Forecasting energy demand with machine learning
- Forecasting web traffic with machine learning
- Intermittent demand forecasting
- Modelling time series trend with tree-based models
- Bitcoin price prediction with Python
- Stacking ensemble of machine learning models to improve forecasting
- Interpretable forecasting models
- Mitigating the Impact of Covid on forecasting Models
- Forecasting time series with missing values

Introduction
When faced with scenarios involving the prediction of hundreds or thousands of time series, a crucial decision arises: should one develop individual models for each series, or should one use a unified model to handle them all at once?
In Single-Series Modeling (Local Forecasting Model), a separate predictive model is created for each time series. While this method provides a comprehensive understanding of each series, its scalability can be challenged by the need to create and maintain hundreds or thousands of models.
Multi-Series Modeling (Global Forecasting Model) involves building a single predictive model that considers all time series simultaneously. It attempts to capture the core patterns that govern the series, thereby mitigating the potential noise that each series might introduce. This approach is computationally efficient, easy to maintain, and can yield more robust generalizations across time series, albeit potentially at the cost of sacrificing some individual insights.
This document shows how to forecast more than 1000 time series with a single model, including exogenous features, some of which take different values for each series.
💡 Tip
This is the first in a series of documents on global forecasting models.
Libraries
Libraries used in this document.
# Data management
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')
# Forecasting
# ==============================================================================
import skforecast
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer
from sklearn.feature_selection import RFECV
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.model_selection import TimeSeriesFold, OneStepAheadFold
from skforecast.model_selection import backtesting_forecaster_multiseries
from skforecast.model_selection import bayesian_search_forecaster_multiseries
from skforecast.feature_selection import select_features_multiseries
from skforecast.preprocessing import RollingFeatures
from skforecast.preprocessing import series_long_to_dict
from skforecast.preprocessing import exog_long_to_dict
from feature_engine.datetime import DatetimeFeatures
from feature_engine.creation import CyclicalFeatures
from feature_engine.timeseries.forecasting import WindowFeatures
from skforecast.datasets import fetch_dataset
# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
print('skforecast version:', skforecast.__version__)
print('lightgbm version:', lightgbm.__version__)
skforecast version: 0.15.1
lightgbm version: 4.6.0
Data
Data used in this document has been obtained from The Building Data Genome Project 2 (https://github.com/buds-lab/building-data-genome-project-2). The dataset contains information on the energy consumption of more than 1500 buildings. The time series span two full years (2016 and 2017) with hourly measurements of electricity, heating and cooling water, steam, and irrigation meters. The dataset also includes information on the characteristics of the buildings and the weather conditions. Data has been aggregated to a daily resolution, and only electricity consumption is considered among the different energy sources.
# Load data
# ==============================================================================
data = fetch_dataset(name='bdg2_daily')
print("Data shape:", data.shape)
data.head(3)
bdg2_daily ---------- Daily energy consumption data from the The Building Data Genome Project 2 with building metadata and weather data. https://github.com/buds-lab/building-data- genome-project-2 Miller, C., Kathirgamanathan, A., Picchetti, B. et al. The Building Data Genome Project 2, energy meter data from the ASHRAE Great Energy Predictor III competition. Sci Data 7, 368 (2020). https://doi.org/10.1038/s41597-020-00712-x Shape of the dataset: (1153518, 17) Data shape: (1153518, 17)
building_id | meter_reading | site_id | primaryspaceusage | sub_primaryspaceusage | sqm | lat | lng | timezone | airTemperature | cloudCoverage | dewTemperature | precipDepth1HR | precipDepth6HR | seaLvlPressure | windDirection | windSpeed | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | |||||||||||||||||
2016-01-01 | Bear_assembly_Angel | 12808.1620 | Bear | Entertainment/public assembly | Entertainment/public assembly | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 6.1750 | 1.666667 | -5.229167 | 0.0 | 0.0 | 1020.891667 | 68.750000 | 3.070833 |
2016-01-02 | Bear_assembly_Angel | 9251.0003 | Bear | Entertainment/public assembly | Entertainment/public assembly | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 8.0875 | NaN | -1.404167 | 0.0 | 0.0 | 1017.687500 | 76.666667 | 3.300000 |
2016-01-03 | Bear_assembly_Angel | 14071.6500 | Bear | Entertainment/public assembly | Entertainment/public assembly | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 10.1125 | NaN | 1.708333 | -6.0 | -2.0 | 1011.491667 | 91.666667 | 3.120833 |
# Overall range of available dates
# ==============================================================================
print(
f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
f"(n_days={(data.index.max() - data.index.min()).days})"
)
Range of dates available : 2016-01-01 00:00:00 --- 2017-12-31 00:00:00 (n_days=730)
# Range of available dates per building
# ==============================================================================
available_dates_per_series = (
data
.dropna(subset="meter_reading")
.reset_index()
.groupby("building_id")
.agg(
min_index=("timestamp", "min"),
max_index=("timestamp", "max"),
n_values=("timestamp", "nunique")
)
)
display(available_dates_per_series)
print(f"Unique lenght of series: {available_dates_per_series.n_values.unique()}")
min_index | max_index | n_values | |
---|---|---|---|
building_id | |||
Bear_assembly_Angel | 2016-01-01 | 2017-12-31 | 731 |
Bear_assembly_Beatrice | 2016-01-01 | 2017-12-31 | 731 |
Bear_assembly_Danial | 2016-01-01 | 2017-12-31 | 731 |
Bear_assembly_Diana | 2016-01-01 | 2017-12-31 | 731 |
Bear_assembly_Genia | 2016-01-01 | 2017-12-31 | 731 |
... | ... | ... | ... |
Wolf_public_Norma | 2016-01-01 | 2017-12-31 | 731 |
Wolf_retail_Harriett | 2016-01-01 | 2017-12-31 | 731 |
Wolf_retail_Marcella | 2016-01-01 | 2017-12-31 | 731 |
Wolf_retail_Toshia | 2016-01-01 | 2017-12-31 | 731 |
Wolf_science_Alfreda | 2016-01-01 | 2017-12-31 | 731 |
1578 rows × 3 columns
Unique length of series: [731]
All time series have the same length, starting from January 1, 2016 and ending on December 31, 2017. Exogenous variables have few missing values. Skforecast does not require that the time series have the same length, and missing values are allowed as long as the underlying regressor can handle them, which is the case for LightGBM, XGBoost, and HistGradientBoostingRegressor.
# Missing values per feature
# ==============================================================================
data.isna().mean().mul(100).round(2)
building_id               0.00
meter_reading             0.00
site_id                   0.00
primaryspaceusage         1.20
sub_primaryspaceusage     1.20
sqm                       0.00
lat                      14.83
lng                      14.83
timezone                  0.00
airTemperature            0.02
cloudCoverage             7.02
dewTemperature            0.03
precipDepth1HR            0.02
precipDepth6HR            0.02
seaLvlPressure            9.56
windDirection             0.02
windSpeed                 0.02
dtype: float64
Exogenous variables
Exogenous variables are variables that are external to the time series and can be used as features to improve the forecast. In this case, the exogenous variables used are: the characteristics of the buildings, calendar variables and weather conditions.
⚠️ Warning
Exogenous variables must be known at the time of the forecast. For example, if temperature is used as an exogenous variable, its value for the next day must be available when the forecast is made. If it is not, the forecast cannot be produced.
⚠️ Warning
When creating new features in a multi-series dataset, it is important to avoid mixing information from different series. It is recommended to use the `groupby` and `apply` methods to create these features, as illustrated in the sketch below.
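A minimal sketch of this pattern (toy values, column names borrowed from the dataset): a lag feature is created independently within each series by grouping by `building_id` before shifting, so no value from one building leaks into another.
# Per-series feature creation without mixing series (illustrative sketch)
# ==============================================================================
toy = pd.DataFrame({
    "building_id": ["A", "A", "A", "B", "B", "B"],
    "meter_reading": [10.0, 12.0, 11.0, 100.0, 105.0, 98.0],
})
# Grouping by series before shifting keeps each lag inside its own series
toy["meter_reading_lag_1"] = toy.groupby("building_id")["meter_reading"].shift(1)
toy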
Building characteristics
One of the key attributes associated with each building is its designated use. This feature may play a crucial role in influencing the energy consumption pattern, as distinct uses can significantly impact both the quantity and timing of energy consumption.
# Number of buildings
# ==============================================================================
print(f"Number of buildings: {data['building_id'].nunique()}")
print(f"Number of building types: {data['primaryspaceusage'].nunique()}")
print(f"Number of building subtypes: {data['sub_primaryspaceusage'].nunique()}")
Number of buildings: 1578
Number of building types: 16
Number of building subtypes: 104
For certain type and subtype categories, there are a limited number of buildings in the dataset. Types with fewer than 100 buildings and subtypes with fewer than 50 buildings are grouped into the "Other" category.
# Group infrequent categories
# ==============================================================================
infrequent_types = (
data
.drop_duplicates(subset=['building_id'])['primaryspaceusage']
.value_counts()
.loc[lambda x: x < 100]
.index
.tolist()
)
infrequent_subtypes = (
data
.drop_duplicates(subset=['building_id'])['sub_primaryspaceusage']
.value_counts()
.loc[lambda x: x < 50]
.index
.tolist()
)
data['primaryspaceusage'] = np.where(
data['primaryspaceusage'].isin(infrequent_types),
'Other',
data['primaryspaceusage']
)
data['sub_primaryspaceusage'] = np.where(
data['sub_primaryspaceusage'].isin(infrequent_subtypes),
'Other',
data['sub_primaryspaceusage']
)
display(data.drop_duplicates(subset=['building_id'])['primaryspaceusage'].value_counts())
display(data.drop_duplicates(subset=['building_id', 'sub_primaryspaceusage'])['sub_primaryspaceusage'].value_counts())
primaryspaceusage
Education                        604
Office                           296
Entertainment/public assembly    203
Public services                  166
Lodging/residential              149
Other                            141
Name: count, dtype: int64
sub_primaryspaceusage
Other                          612
Office                         295
College Classroom              131
College Laboratory             116
K-12 School                    109
Dormitory                       91
Primary/Secondary Classroom     84
Education                       67
Library                         54
Name: count, dtype: int64
Calendar features
# Calendar features
# ==============================================================================
features_to_extract = [
'month',
'week',
'day_of_week',
]
calendar_transformer = DatetimeFeatures(
variables = 'index',
features_to_extract = features_to_extract,
drop_original = False,
)
data = calendar_transformer.fit_transform(data)
print(f"Data shape: {data.shape}")
data.head(3)
Data shape: (1153518, 20)
building_id | meter_reading | site_id | primaryspaceusage | sub_primaryspaceusage | sqm | lat | lng | timezone | airTemperature | cloudCoverage | dewTemperature | precipDepth1HR | precipDepth6HR | seaLvlPressure | windDirection | windSpeed | month | week | day_of_week | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | ||||||||||||||||||||
2016-01-01 | Bear_assembly_Angel | 12808.1620 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 6.1750 | 1.666667 | -5.229167 | 0.0 | 0.0 | 1020.891667 | 68.750000 | 3.070833 | 1 | 53 | 4 |
2016-01-02 | Bear_assembly_Angel | 9251.0003 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 8.0875 | NaN | -1.404167 | 0.0 | 0.0 | 1017.687500 | 76.666667 | 3.300000 | 1 | 53 | 5 |
2016-01-03 | Bear_assembly_Angel | 14071.6500 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 10.1125 | NaN | 1.708333 | -6.0 | -2.0 | 1011.491667 | 91.666667 | 3.120833 | 1 | 53 | 6 |
# Cyclical encoding of calendar features
# ==============================================================================
features_to_encode = [
"month",
"week",
"day_of_week",
]
max_values = {
"month": 12,
"week": 52,
"day_of_week": 6,
}
cyclical_encoder = CyclicalFeatures(
variables = features_to_encode,
max_values = max_values,
drop_original = False
)
data = cyclical_encoder.fit_transform(data)
print(f"Data shape: {data.shape}")
data.head(3)
Data shape: (1153518, 26)
building_id | meter_reading | site_id | primaryspaceusage | sub_primaryspaceusage | sqm | lat | lng | timezone | airTemperature | ... | windSpeed | month | week | day_of_week | month_sin | month_cos | week_sin | week_cos | day_of_week_sin | day_of_week_cos | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | |||||||||||||||||||||
2016-01-01 | Bear_assembly_Angel | 12808.1620 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 6.1750 | ... | 3.070833 | 1 | 53 | 4 | 0.5 | 0.866025 | 0.120537 | 0.992709 | -8.660254e-01 | -0.5 |
2016-01-02 | Bear_assembly_Angel | 9251.0003 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 8.0875 | ... | 3.300000 | 1 | 53 | 5 | 0.5 | 0.866025 | 0.120537 | 0.992709 | -8.660254e-01 | 0.5 |
2016-01-03 | Bear_assembly_Angel | 14071.6500 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 10.1125 | ... | 3.120833 | 1 | 53 | 6 | 0.5 | 0.866025 | 0.120537 | 0.992709 | -2.449294e-16 | 1.0 |
3 rows × 26 columns
✎ Note
For more information about calendar features and cyclical encoding, visit Calendar features and Cyclical features in time series forecasting.
Meteorological features
Meteorological variables are recorded at the site level, meaning that weather data varies by building location, even at the same timestamp. In other words, while the exogenous variables are consistent across all series, their values differ by location.
# Values of meteorological features for a given date in each site
# ==============================================================================
data.loc["2016-01-01"].groupby("site_id", observed=True).agg(
{
"airTemperature": "first",
"cloudCoverage": "first",
"dewTemperature": "first",
"precipDepth1HR": "first",
"precipDepth6HR": "first",
"seaLvlPressure": "first",
"windDirection": "first",
"windSpeed": "first",
}
)
airTemperature | cloudCoverage | dewTemperature | precipDepth1HR | precipDepth6HR | seaLvlPressure | windDirection | windSpeed | |
---|---|---|---|---|---|---|---|---|
site_id | ||||||||
Bear | 6.175000 | 1.666667 | -5.229167 | 0.0 | 0.0 | 1020.891667 | 68.750000 | 3.070833 |
Bobcat | -11.595833 | 0.000000 | -17.041667 | 0.0 | 0.0 | 1034.466667 | 260.869565 | 3.033333 |
Bull | 7.612500 | 4.000000 | 0.708333 | -2.0 | -1.0 | 1031.762500 | 130.416667 | 3.712500 |
Cockatoo | -2.000000 | 4.000000 | -4.066667 | 0.0 | 0.0 | NaN | 281.333333 | 4.826667 |
Crow | -1.787500 | NaN | -3.595833 | 15.0 | 13.0 | 1011.670833 | 236.250000 | 3.666667 |
Eagle | 3.187500 | 1.000000 | -4.487500 | 0.0 | 0.0 | 1017.445833 | 287.500000 | 3.470833 |
Fox | 10.200000 | 1.000000 | -2.804167 | 0.0 | 0.0 | 1018.512500 | 47.083333 | 0.470833 |
Gator | 23.291667 | 5.600000 | 19.625000 | -3.0 | 0.0 | 1018.663636 | 150.416667 | 2.520833 |
Hog | -5.583333 | 1.142857 | -9.741667 | -2.0 | -1.0 | 1019.545833 | 260.416667 | 4.758333 |
Lamb | 6.913043 | 0.000000 | 5.434783 | 0.0 | 0.0 | NaN | 123.181818 | 8.017391 |
Moose | -1.787500 | NaN | -3.595833 | 15.0 | 13.0 | 1011.670833 | 236.250000 | 3.666667 |
Mouse | 5.387500 | 0.000000 | 3.879167 | 0.0 | 0.0 | 1016.941667 | 116.666667 | 4.470833 |
Panther | 23.291667 | 5.600000 | 19.625000 | -3.0 | 0.0 | 1018.663636 | 150.416667 | 2.520833 |
Peacock | 5.558333 | 0.000000 | -1.541667 | 0.0 | 0.0 | 1019.783333 | 120.000000 | 1.275000 |
Rat | 5.633333 | 4.727273 | -2.112500 | 0.0 | 0.0 | 1020.450000 | 314.782609 | 3.816667 |
Robin | 5.387500 | 0.000000 | 3.879167 | 0.0 | 0.0 | 1016.941667 | 116.666667 | 4.470833 |
Shrew | 5.387500 | 0.000000 | 3.879167 | 0.0 | 0.0 | 1016.941667 | 116.666667 | 4.470833 |
Swan | 6.054167 | 0.400000 | -2.591667 | 0.0 | 0.0 | 1021.216667 | 154.000000 | 1.783333 |
Wolf | 5.716667 | 6.625000 | 3.208333 | 0.0 | 1.0 | 1007.625000 | 140.000000 | 8.875000 |
Skforecast allows you to include different exogenous variables and/or different values for each series in the dataset (more details in the next section).
Rolling features for exogenous variables
# Rolling features of meteorological features
# ==============================================================================
wf_transformer = WindowFeatures(
variables = ["airTemperature", "windSpeed"],
window = ["7D", "14D"],
functions = ["mean"],
freq = "D",
missing_values = "ignore",
drop_na = False,
)
data = data.groupby("building_id").apply(wf_transformer.fit_transform, include_groups=False).reset_index()
data = data.set_index("timestamp")
print(f"Data shape: {data.shape}")
data.head(3)
Data shape: (1153518, 30)
building_id | meter_reading | site_id | primaryspaceusage | sub_primaryspaceusage | sqm | lat | lng | timezone | airTemperature | ... | month_sin | month_cos | week_sin | week_cos | day_of_week_sin | day_of_week_cos | airTemperature_window_7D_mean | windSpeed_window_7D_mean | airTemperature_window_14D_mean | windSpeed_window_14D_mean | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
timestamp | |||||||||||||||||||||
2016-01-01 | Bear_assembly_Angel | 12808.1620 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 6.1750 | ... | 0.5 | 0.866025 | 0.120537 | 0.992709 | -8.660254e-01 | -0.5 | NaN | NaN | NaN | NaN |
2016-01-02 | Bear_assembly_Angel | 9251.0003 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 8.0875 | ... | 0.5 | 0.866025 | 0.120537 | 0.992709 | -8.660254e-01 | 0.5 | 6.17500 | 3.070833 | 6.17500 | 3.070833 |
2016-01-03 | Bear_assembly_Angel | 14071.6500 | Bear | Entertainment/public assembly | Other | 22117.0 | 37.871903 | -122.260729 | US/Pacific | 10.1125 | ... | 0.5 | 0.866025 | 0.120537 | 0.992709 | -2.449294e-16 | 1.0 | 7.13125 | 3.185417 | 7.13125 | 3.185417 |
3 rows × 30 columns
Categorical features
LightGBM can handle categorical variables internally without any preprocessing. To allow automatic detection of categorical features based on pandas data types in a `Forecaster`, categorical variables must first be encoded as integers (ordinal encoding) and then stored as a `category` type. This is necessary because skforecast uses a numeric array internally to speed up the calculation, and LightGBM requires the categorical features to be encoded as `category` to be detected automatically. It is also necessary to set the `categorical_feature` parameter to `'auto'` during the initialization of the forecaster using `fit_kwargs = {'categorical_feature': 'auto'}`.
⚠️ Warning
The four main gradient boosting frameworks (LightGBM, scikit-learn's HistGradientBoosting, XGBoost, and CatBoost) are capable of directly handling categorical features within the model. However, each framework has its own configuration, benefits and potential problems. To fully understand how to use these frameworks, it is highly recommended to refer to the skforecast user guide for a detailed explanation.
# Transformer: ordinal encoding
# ==============================================================================
# A ColumnTransformer is used to encode categorical (non-numerical) features
# using ordinal encoding. Numeric features are left untouched. Missing values
# are encoded as NaN. If a new category is found in the test set, it is also
# encoded as NaN.
categorical_features = ['primaryspaceusage', 'sub_primaryspaceusage', 'timezone']
transformer_exog = make_column_transformer(
(
OrdinalEncoder(
dtype=float,
handle_unknown="use_encoded_value",
unknown_value=np.nan,
encoded_missing_value=np.nan
),
categorical_features
),
remainder="passthrough",
verbose_feature_names_out=False,
).set_output(transform="pandas")
transformer_exog
ColumnTransformer(remainder='passthrough',
                  transformers=[('ordinalencoder',
                                 OrdinalEncoder(dtype=<class 'float'>,
                                                handle_unknown='use_encoded_value',
                                                unknown_value=nan),
                                 ['primaryspaceusage', 'sub_primaryspaceusage',
                                  'timezone'])],
                  verbose_feature_names_out=False)
When creating a Forecaster with `LGBMRegressor`, it is necessary to specify how to handle the categorical columns using the `fit_kwargs` argument. This is because the `categorical_feature` argument is only accepted by the `fit` method of `LGBMRegressor`, and not during its initialization.
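As a minimal, self-contained sketch (toy data, unrelated to the pipeline above) of what this argument forwards to the regressor: LightGBM detects pandas `category` columns when `categorical_feature='auto'` is passed to `fit`.
# Native categorical handling in LightGBM (illustrative sketch with toy data)
# ==============================================================================
X_toy = pd.DataFrame({
    "building_type": pd.Series(["Office", "Education", "Office", "Lodging"], dtype="category"),
    "sqm": [1200.0, 3400.0, 800.0, 1500.0],
})
y_toy = pd.Series([10.0, 25.0, 8.0, 12.0])
model = LGBMRegressor(verbose=-1)
# 'auto' tells LightGBM to treat every pandas 'category' column as categorical
model.fit(X_toy, y_toy, categorical_feature='auto')
model.predict(X_toy)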
Modelling and forecasting
`ForecasterRecursiveMultiSeries` allows modeling time series of different lengths and using different exogenous variables. When series have different lengths, the data must be transformed into a dictionary whose keys are the names of the series and whose values are the series themselves. The `series_long_to_dict` function does this: it takes a DataFrame in "long format" and returns a dict of pandas Series. Similarly, when the exogenous variables differ between series (in values or in variables), the data must be transformed into a dictionary whose keys are the names of the series and whose values are the exogenous variables. The `exog_long_to_dict` function takes the DataFrame in "long format" and returns a dict of exogenous variables (pandas Series or DataFrame).
When all series have the same length and the same exogenous variables, dictionaries are not necessary. The series can be passed as a single DataFrame with one series per column, and the exogenous variables as a DataFrame with the same length as the series.
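As a hedged sketch of that alternative (standard pandas reshaping; the resulting wide DataFrame is not used in the rest of this document, since the dictionary approach is more general):
# Long format reshaped to one column per series (illustrative, not used below)
# ==============================================================================
series_wide = (
    data
    .reset_index()
    .pivot_table(index="timestamp", columns="building_id", values="meter_reading")
    .asfreq("D")
)
series_wide.head(3)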
✎ Note
For more information about modelling series of different lengths and using different exogenous variables, visit Global Forecasting Models: Time series with different lengths and different exogenous variables.
# Exogenous features selected for modeling
# ==============================================================================
exog_features = [
"primaryspaceusage",
"sub_primaryspaceusage",
"timezone",
"sqm",
"airTemperature",
"cloudCoverage",
"dewTemperature",
"precipDepth1HR",
"precipDepth6HR",
"seaLvlPressure",
"windDirection",
"windSpeed",
"day_of_week_sin",
"day_of_week_cos",
"week_sin",
"week_cos",
"month_sin",
"month_cos",
"airTemperature_window_7D_mean",
"windSpeed_window_7D_mean",
"airTemperature_window_14D_mean",
"windSpeed_window_14D_mean",
]
# Transform series and exog to dictionaries
# ==============================================================================
series_dict = series_long_to_dict(
data = data.reset_index(),
series_id = 'building_id',
index = 'timestamp',
values = 'meter_reading',
freq = 'D'
)
exog_dict = exog_long_to_dict(
data = data[exog_features + ['building_id']].reset_index(),
series_id = 'building_id',
index = 'timestamp',
freq = 'D'
)
To train the models, search for optimal hyperparameters, and evaluate their predictive performance, the data is divided into three separate sets: training, validation, and test.
# Partition data in train and test
# ==============================================================================
data = data.sort_index()
end_train = '2017-08-31 23:59:00'
end_validation = '2017-10-31 23:59:00'
series_dict_train = {k: v.loc[: end_train,] for k, v in series_dict.items()}
series_dict_valid = {k: v.loc[end_train: end_validation,] for k, v in series_dict.items()}
series_dict_test = {k: v.loc[end_validation:,] for k, v in series_dict.items()}
exog_dict_train = {k: v.loc[: end_train,] for k, v in exog_dict.items()}
exog_dict_valid = {k: v.loc[end_train: end_validation,] for k, v in exog_dict.items()}
exog_dict_test = {k: v.loc[end_validation:,] for k, v in exog_dict.items()}
print(
f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
f"(n_days={(data.index.max() - data.index.min()).days})"
)
print(
f" Dates for training : {data.loc[: end_train, :].index.min()} --- {data.loc[: end_train, :].index.max()} "
f"(n_days={(data.loc[: end_train, :].index.max() - data.loc[: end_train, :].index.min()).days})"
)
print(
f" Dates for validation : {data.loc[end_train:end_validation, :].index.min()} --- {data.loc[end_train:end_validation, :].index.max()} "
f"(n_days={(data.loc[end_train:end_validation, :].index.max() - data.loc[end_train:end_validation, :].index.min()).days})"
)
print(
f" Dates for test : {data.loc[end_validation:, :].index.min()} --- {data.loc[end_validation:, :].index.max()} "
f"(n_days={(data.loc[end_validation:, :].index.max() - data.loc[end_validation:, :].index.min()).days})"
)
Range of dates available : 2016-01-01 00:00:00 --- 2017-12-31 00:00:00 (n_days=730)
  Dates for training   : 2016-01-01 00:00:00 --- 2017-08-31 00:00:00 (n_days=608)
  Dates for validation : 2017-09-01 00:00:00 --- 2017-10-31 00:00:00 (n_days=60)
  Dates for test       : 2017-11-01 00:00:00 --- 2017-12-31 00:00:00 (n_days=60)
Hyperparameter tuning
Hyperparameter and lag tuning involves systematically testing different values or combinations of hyperparameters (and/or lags) to find the optimal configuration that gives the best performance. The skforecast library provides two different methods to evaluate each candidate configuration:
Backtesting: In this method, the model predicts several steps ahead in each iteration, using the same forecast horizon and retraining frequency strategy that would be used if the model were deployed. This simulates a real forecasting scenario where the model is retrained and updated over time.
One-Step Ahead: Evaluates the model using only one-step-ahead predictions. This method is faster because it requires fewer iterations, but it only tests the model's performance in the immediate next time step (t+1).
Each method uses a different evaluation strategy, so they may produce different results. In the long run, however, both are expected to converge to a similar selection of optimal hyperparameters. Since the one-step-ahead method only assesses performance at t+1, it is recommended to backtest the final model to obtain a more accurate estimate of its multi-step performance.
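The two strategies correspond to the `OneStepAheadFold` and `TimeSeriesFold` classes used in the following cells; a minimal sketch of how each is configured, with the parameter values used later in this document:
# Evaluation strategies used below (sketch)
# ==============================================================================
# One-step-ahead: faster, evaluates only the t+1 prediction at each point
cv_one_step = OneStepAheadFold(initial_train_size=608)
# Multi-step backtesting: each fold predicts 7 days ahead without retraining
cv_backtesting = TimeSeriesFold(initial_train_size=608 + 60, steps=7, refit=False)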
# Create forecaster
# ==============================================================================
window_features = RollingFeatures(stats=['mean', 'min', 'max'], window_sizes=7)
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(random_state=8520, verbose=-1),
lags = 14,
window_features = window_features,
transformer_exog = transformer_exog,
fit_kwargs = {'categorical_feature': categorical_features},
encoding = "ordinal"
)
# Bayesian search with OneStepAheadFold
# ==============================================================================
def search_space(trial):
search_space = {
'lags' : trial.suggest_categorical('lags', [31, 62]),
'n_estimators' : trial.suggest_int('n_estimators', 200, 800, step=100),
'max_depth' : trial.suggest_int('max_depth', 3, 8, step=1),
'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 25, 500),
'learning_rate' : trial.suggest_float('learning_rate', 0.01, 0.5),
'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 0.8, step=0.1),
'max_bin' : trial.suggest_int('max_bin', 50, 100, step=25),
'reg_alpha' : trial.suggest_float('reg_alpha', 0, 1, step=0.1),
'reg_lambda' : trial.suggest_float('reg_lambda', 0, 1, step=0.1)
}
return search_space
cv = OneStepAheadFold(initial_train_size=608) # size of the training set
results_search, best_trial = bayesian_search_forecaster_multiseries(
forecaster = forecaster,
series = {k: v.loc[:end_validation,] for k, v in series_dict.items()},
exog = {k: v.loc[:end_validation, exog_features] for k, v in exog_dict.items()},
cv = cv,
search_space = search_space,
n_trials = 10,
metric = "mean_absolute_error",
suppress_warnings = True
)
best_params = results_search.at[0, 'params']
best_lags = results_search.at[0, 'lags']
results_search.head(3)
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62] Parameters: {'n_estimators': 600, 'max_depth': 6, 'min_data_in_leaf': 188, 'learning_rate': 0.1590191866233202, 'feature_fraction': 0.6, 'max_bin': 100, 'reg_alpha': 0.9, 'reg_lambda': 0.5} Backtesting metric: 261.5074045473127 Levels: ['Bear_assembly_Angel', 'Bear_assembly_Beatrice', 'Bear_assembly_Danial', 'Bear_assembly_Diana', 'Bear_assembly_Genia', 'Bear_assembly_Harry', 'Bear_assembly_Jose', 'Bear_assembly_Roxy', 'Bear_assembly_Ruby', 'Bear_education_Alfredo', '...', 'Wolf_office_Emanuel', 'Wolf_office_Haydee', 'Wolf_office_Joana', 'Wolf_office_Nadia', 'Wolf_office_Rochelle', 'Wolf_public_Norma', 'Wolf_retail_Harriett', 'Wolf_retail_Marcella', 'Wolf_retail_Toshia', 'Wolf_science_Alfreda']
levels | lags | params | mean_absolute_error__weighted_average | mean_absolute_error__average | mean_absolute_error__pooling | n_estimators | max_depth | min_data_in_leaf | learning_rate | feature_fraction | max_bin | reg_alpha | reg_lambda | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | [Bear_assembly_Angel, Bear_assembly_Beatrice, ... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'n_estimators': 600, 'max_depth': 6, 'min_dat... | 261.507405 | 261.507405 | 261.507405 | 600.0 | 6.0 | 188.0 | 0.159019 | 0.6 | 100.0 | 0.9 | 0.5 |
1 | [Bear_assembly_Angel, Bear_assembly_Beatrice, ... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'n_estimators': 400, 'max_depth': 5, 'min_dat... | 265.896416 | 265.896416 | 265.896416 | 400.0 | 5.0 | 437.0 | 0.132723 | 0.6 | 100.0 | 0.5 | 0.6 |
2 | [Bear_assembly_Angel, Bear_assembly_Beatrice, ... | [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14... | {'n_estimators': 500, 'max_depth': 5, 'min_dat... | 266.299734 | 266.299734 | 266.299734 | 500.0 | 5.0 | 227.0 | 0.163008 | 0.6 | 100.0 | 1.0 | 0.5 |
Backtesting on test data
# Backtesting on test set
# ==============================================================================
cv = TimeSeriesFold(
initial_train_size = 608 + 60, # training + validation
steps = 7,
refit = False
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = series_dict,
exog = exog_dict,
cv = cv,
metric = 'mean_absolute_error',
suppress_warnings = True
)
display(predictions.head())
display(metrics)
level | pred | |
---|---|---|
2017-10-30 | Bear_assembly_Angel | 9561.254764 |
2017-10-30 | Bear_assembly_Beatrice | 1112.779749 |
2017-10-30 | Bear_assembly_Danial | 4172.709133 |
2017-10-30 | Bear_assembly_Diana | 26.412851 |
2017-10-30 | Bear_assembly_Genia | 7780.739532 |
levels | mean_absolute_error | |
---|---|---|
0 | Bear_assembly_Angel | 1244.039948 |
1 | Bear_assembly_Beatrice | 244.728577 |
2 | Bear_assembly_Danial | 281.653933 |
3 | Bear_assembly_Diana | 39.561433 |
4 | Bear_assembly_Genia | 563.580736 |
... | ... | ... |
1576 | Wolf_retail_Toshia | 519.948486 |
1577 | Wolf_science_Alfreda | 220.802361 |
1578 | average | 346.329085 |
1579 | weighted_average | 346.329085 |
1580 | pooling | 346.329085 |
1581 rows × 2 columns
# Aggregated metric for all buildings
# ==============================================================================
average_metric_all_buildings = metrics.query("levels == 'average'")["mean_absolute_error"].item()
errors_all_buildings = pd.merge(
left = data[['building_id', 'meter_reading']].reset_index(),
right = predictions.rename_axis("timestamp").reset_index(),
left_on = ['timestamp', 'building_id'],
right_on = ['timestamp', 'level'],
how = 'inner',
validate = "1:1"
).assign(error=lambda df: df['meter_reading'] - df['pred'])
sum_abs_errors_all_buildings = errors_all_buildings['error'].abs().sum()
sum_bias_all_buildings = errors_all_buildings['error'].sum()
print(f"Average mean absolute error for all buildings: {average_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")
Average mean absolute error for all buildings: 346
Sum of absolute errors for all buildings (x 10,000): 3443
Bias (x 10,000): -382
predictions
level | pred | |
---|---|---|
2017-10-30 | Bear_assembly_Angel | 9561.254764 |
2017-10-30 | Bear_assembly_Beatrice | 1112.779749 |
2017-10-30 | Bear_assembly_Danial | 4172.709133 |
2017-10-30 | Bear_assembly_Diana | 26.412851 |
2017-10-30 | Bear_assembly_Genia | 7780.739532 |
... | ... | ... |
2017-12-31 | Wolf_public_Norma | 1961.736704 |
2017-12-31 | Wolf_retail_Harriett | 634.324994 |
2017-12-31 | Wolf_retail_Marcella | -80.704692 |
2017-12-31 | Wolf_retail_Toshia | 821.663849 |
2017-12-31 | Wolf_science_Alfreda | 1528.388111 |
99414 rows × 2 columns
# Plot predictions vs real value for 2 random buildings
# ==============================================================================
rng = np.random.default_rng(14793)
n_buildings = 2
selected_buildings = rng.choice(data['building_id'].unique(), size=n_buildings, replace=False)
fig, axs = plt.subplots(n_buildings, 1, figsize=(7, 4.5), sharex=True)
axs = axs.flatten()
for i, building in enumerate(selected_buildings):
data.query("building_id == @building").loc[predictions.index, 'meter_reading'].plot(ax=axs[i], label='test')
predictions.query("level == @building")['pred'].plot(ax=axs[i], label='predictions')
axs[i].set_title(f"Building {building}", fontsize=10)
axs[i].set_xlabel("")
axs[i].legend()
fig.tight_layout()
plt.show();
Feature selection
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.
Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. There are several methods for feature selection, but the most common are:
Recursive feature elimination (RFE)
Sequential Feature Selection (SFS)
Feature selection based on threshold (SelectFromModel)
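Any of these methods can be plugged into `select_features_multiseries` through its `selector` argument, assuming the selector exposes the standard scikit-learn interface (`fit` and `get_support`). For example, a threshold-based selector could be used in place of the RFECV selector employed below (a hedged sketch; the threshold value is arbitrary):
# Alternative selector: feature selection based on an importance threshold (sketch)
# ==============================================================================
from sklearn.feature_selection import SelectFromModel

selector_sfm = SelectFromModel(
    estimator = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1),
    threshold = "median"  # keep features whose importance is above the median
)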
💡 Tip
Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can be trained using the entire dataset and a more complex configuration.
# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, n_jobs=1)
selected_lags, selected_window_features, selected_exog = select_features_multiseries(
forecaster = forecaster,
selector = selector,
series = {k: v.loc[:end_validation,] for k, v in series_dict.items()},
exog = {k: v.loc[:end_validation, exog_features] for k, v in exog_dict.items()},
select_only = None,
force_inclusion = None,
subsample = 0.2,
random_state = 123,
verbose = True,
)
MissingValuesWarning: NaNs detected in `X_train`. Some regressors do not allow NaN values during training. If you want to drop them, set `forecaster.dropna_from_series = True`. This warning can be suppressed with warnings.simplefilter('ignore', category=MissingValuesWarning).
Recursive feature elimination (RFECV) ------------------------------------- Total number of records available: 959424 Total number of records used for feature selection: 191884 Number of features available: 87 Lags (n=62) Window features (n=3) Exog (n=22) Number of features selected: 56 Lags (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 28, 29, 30, 33, 34, 35, 42, 44, 49, 50, 51, 54, 56, 57, 60, 62] Window features (n=3) : ['roll_mean_7', 'roll_min_7', 'roll_max_7'] Exog (n=17) : ['sub_primaryspaceusage', 'timezone', 'sqm', 'airTemperature', 'cloudCoverage', 'dewTemperature', 'seaLvlPressure', 'windDirection', 'windSpeed', 'day_of_week_sin', 'day_of_week_cos', 'week_sin', 'week_cos', 'airTemperature_window_7D_mean', 'windSpeed_window_7D_mean', 'airTemperature_window_14D_mean', 'windSpeed_window_14D_mean']
# Backtesting forecaster with selected features
# ==============================================================================
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(**best_params, random_state=8520, verbose=-1),
lags = selected_lags,
window_features = window_features,
transformer_exog = transformer_exog,
fit_kwargs = {'categorical_feature': categorical_features},
encoding = "ordinal"
)
cv = TimeSeriesFold(
    initial_train_size = 608 + 60, # training + validation
steps = 7,
refit = False
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = series_dict,
exog = {k: v[exog_features] for k, v in exog_dict.items()},
cv = cv,
metric = 'mean_absolute_error',
suppress_warnings = True
)
display(predictions.head())
display(metrics)
level | pred | |
---|---|---|
2017-10-30 | Bear_assembly_Angel | 9297.191358 |
2017-10-30 | Bear_assembly_Beatrice | 996.638961 |
2017-10-30 | Bear_assembly_Danial | 4039.385147 |
2017-10-30 | Bear_assembly_Diana | 9.180618 |
2017-10-30 | Bear_assembly_Genia | 7354.941371 |
levels | mean_absolute_error | |
---|---|---|
0 | Bear_assembly_Angel | 1221.926524 |
1 | Bear_assembly_Beatrice | 218.906181 |
2 | Bear_assembly_Danial | 325.030420 |
3 | Bear_assembly_Diana | 37.261934 |
4 | Bear_assembly_Genia | 642.969646 |
... | ... | ... |
1576 | Wolf_retail_Toshia | 514.892785 |
1577 | Wolf_science_Alfreda | 228.344802 |
1578 | average | 343.951047 |
1579 | weighted_average | 343.951047 |
1580 | pooling | 343.951047 |
1581 rows × 2 columns
The number of features included in the model has been reduced without compromising the model's performance. This simplifies the model and speeds up training.
Clustering time series
The idea behind modeling multiple series at the same time is to be able to capture the main patterns that govern the series, thereby reducing the impact of the potential noise that each series may have. This means that series that behave similarly may benefit from being modeled together. One way to identify potential groups of series is to perform a clustering study prior to modeling. If clear groups are identified as a result of clustering, it is appropriate to model each of them separately.
Clustering is an unsupervised analysis technique that groups a set of observations into clusters that contain observations that are considered homogeneous, while observations in different clusters are considered heterogeneous. Algorithms that cluster time series can be divided into two groups: those that use a transformation to create features prior to clustering (feature-driven time series clustering), and those that work directly on the time series (elastic distance measures).
Feature-driven time series clustering: features describing structural characteristics are extracted from each time series and then fed into standard clustering algorithms. These features are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity.
Elastic distance measures: This approach works directly on the time series, adjusting or "realigning" the series in comparison to each other. The best known of this family of measures is Dynamic Time Warping (DTW).
For a detailed example of how time series clustering can improve the forecasting models, see Global Forecasting Models: Comparative Analysis of Single and Multi-Series Forecasting Modeling.
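As a minimal illustration of the feature-driven approach (not run in this document; the choice of statistics and the number of clusters are arbitrary), a few summary statistics per series could be extracted and fed to a standard clustering algorithm:
# Feature-driven clustering of the series (illustrative sketch)
# ==============================================================================
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

series_features = (
    data
    .groupby("building_id")["meter_reading"]
    .agg(mean="mean", std="std", skew="skew")
)
# Scale the per-series features and group the series into clusters
X_cluster = StandardScaler().fit_transform(series_features.fillna(0))
series_features["cluster"] = KMeans(n_clusters=4, random_state=123, n_init=10).fit_predict(X_cluster)
series_features.head(3)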
Session information
import session_info
session_info.show(html=False)
----- feature_engine 1.8.3 lightgbm 4.6.0 matplotlib 3.10.1 numpy 2.0.2 optuna 4.2.1 pandas 2.2.3 session_info 1.0.0 skforecast 0.15.1 sklearn 1.5.2 ----- IPython 9.0.2 jupyter_client 8.6.3 jupyter_core 5.7.2 notebook 6.4.12 ----- Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb 6 2025, 18:56:27) [GCC 11.2.0] Linux-5.15.0-1077-aws-x86_64-with-glibc2.31 ----- Session information updated at 2025-03-28 15:22
Citation
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
Forecasting at scale: modeling thousand time series with a single global model by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 at https://www.cienciadedatos.net/documentos/py59-scalable-forecasting-model.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2025). skforecast (v0.15.1). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2025). skforecast (Version 0.15.1) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.15.1},
  month   = {03},
  year    = {2025},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}
Did you like the article? Your support is important
Your contribution will help me to continue generating free educational content. Many thanks!
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.
Allowed:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.