Introduction

When faced with scenarios involving the prediction of hundreds or thousands of time series, a crucial decision arises: should one develop individual models for each series, or should one use a unified model to handle them all at once?

In Single-Series Modeling (Local Forecasting Model), a separate predictive model is created for each time series. While this method provides a comprehensive understanding of each series, its scalability can be challenged by the need to create and maintain hundreds or thousands of models.

Multi-Series Modeling (Global Forecasting Model) involves building a single predictive model that considers all time series simultaneously. It attempts to capture the core patterns that govern the series, thereby mitigating the potential noise that each series might introduce. This approach is computationally efficient, easy to maintain, and can yield more robust generalizations across time series, albeit potentially at the cost of sacrificing some individual insights.

This document shows how to forecast more than 1000 time series with a single model including exogenous features with some of them having different values per series.

💡 Tip

This is the first in a series of documents on global forecasting models:

Libraries

Libraries used in this document.

# Data management
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')

# Forecasting
# ==============================================================================
import skforecast
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer
from sklearn.feature_selection import RFECV
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.model_selection import TimeSeriesFold, OneStepAheadFold
from skforecast.model_selection import backtesting_forecaster_multiseries
from skforecast.model_selection import bayesian_search_forecaster_multiseries
from skforecast.feature_selection import select_features_multiseries
from skforecast.preprocessing import RollingFeatures
from skforecast.preprocessing import series_long_to_dict
from skforecast.preprocessing import exog_long_to_dict
from feature_engine.datetime import DatetimeFeatures
from feature_engine.creation import CyclicalFeatures
from feature_engine.timeseries.forecasting import WindowFeatures
from skforecast.datasets import fetch_dataset

# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

print('Versión skforecast:', skforecast.__version__)
print('Versión lightgbm:', lightgbm.__version__)

Versión skforecast: 0.16.0
Versión lightgbm: 4.6.0

Data

Data used in this document has been obtained from the The Building Data Genome Project 2 https://github.com/buds-lab/building-data-genome-project-2. The dataset contains information on the energy consumption of more than 1500 buildings. The time range of the times-series data is the two full years (2016 and 2017) and the frequency is hourly measurements of electricity, heating and cooling water, steam, and irrigation meters. Additionally, the dataset includes information of the characteristics of the buildings and the weather conditions. Data has been aggregated to a daily resolution and only the electricity among the different energy sources has been considered.

# Load data
# ==============================================================================
data = fetch_dataset(name='bdg2_daily')
print("Data shape:", data.shape)
data.head(3)

bdg2_daily
----------
Daily energy consumption data from the The Building Data Genome Project 2 with
building metadata and weather data. https://github.com/buds-lab/building-data-
genome-project-2
Miller, C., Kathirgamanathan, A., Picchetti, B. et al. The Building Data Genome
Project 2, energy meter data from the ASHRAE Great Energy Predictor III
competition. Sci Data 7, 368 (2020). https://doi.org/10.1038/s41597-020-00712-x
Shape of the dataset: (1153518, 17)
Data shape: (1153518, 17)

	building_id	meter_reading	site_id	primaryspaceusage	sub_primaryspaceusage	sqm	lat	lng	timezone	airTemperature	cloudCoverage	dewTemperature	precipDepth1HR	precipDepth6HR	seaLvlPressure	windDirection	windSpeed
timestamp
2016-01-01	Bear_assembly_Angel	12808.1620	Bear	Entertainment/public assembly	Entertainment/public assembly	22117.0	37.871903	-122.260729	US/Pacific	6.1750	1.666667	-5.229167	0.0	0.0	1020.891667	68.750000	3.070833
2016-01-02	Bear_assembly_Angel	9251.0003	Bear	Entertainment/public assembly	Entertainment/public assembly	22117.0	37.871903	-122.260729	US/Pacific	8.0875	NaN	-1.404167	0.0	0.0	1017.687500	76.666667	3.300000
2016-01-03	Bear_assembly_Angel	14071.6500	Bear	Entertainment/public assembly	Entertainment/public assembly	22117.0	37.871903	-122.260729	US/Pacific	10.1125	NaN	1.708333	-6.0	-2.0	1011.491667	91.666667	3.120833

# Overall range of available dates
# ==============================================================================
print(
    f"Rage of dates available : {data.index.min()} --- {data.index.max()}  "
    f"(n_days={(data.index.max() - data.index.min()).days})"
)

Rage of dates available : 2016-01-01 00:00:00 --- 2017-12-31 00:00:00  (n_days=730)

# Ensure index of all series is complete without intermediate gaps
# ==============================================================================
data = (
    data
    .groupby('building_id')
    .apply(lambda group: group.asfreq('D', fill_value=np.nan), include_groups=False)
    .reset_index(level=0, drop=False)
)
print("Data shape:", data.shape)

# Range of available dates per building
# ==============================================================================
available_dates_per_series = (
    data
    .dropna(subset="meter_reading")
    .reset_index()
    .groupby("building_id")
    .agg(
        min_index=("timestamp", "min"),
        max_index=("timestamp", "max"),
        n_values=("timestamp", "nunique")
    )
)
display(available_dates_per_series)
print(f"Unique lenght of series: {available_dates_per_series.n_values.unique()}")

	min_index	max_index	n_values
building_id
Bear_assembly_Angel	2016-01-01	2017-12-31	731
Bear_assembly_Beatrice	2016-01-01	2017-12-31	731
Bear_assembly_Danial	2016-01-01	2017-12-31	731
Bear_assembly_Diana	2016-01-01	2017-12-31	731
Bear_assembly_Genia	2016-01-01	2017-12-31	731
...	...	...	...
Wolf_public_Norma	2016-01-01	2017-12-31	731
Wolf_retail_Harriett	2016-01-01	2017-12-31	731
Wolf_retail_Marcella	2016-01-01	2017-12-31	731
Wolf_retail_Toshia	2016-01-01	2017-12-31	731
Wolf_science_Alfreda	2016-01-01	2017-12-31	731

1578 rows × 3 columns

Unique lenght of series: [731]

All time series have the same length, starting from January 1, 2016 and ending on December 31, 2017. Exogenous variables have few missing values. Skforecast does not require that the time series have the same length, and missing values are allowed as long as the underlying regressor can handle them, which is the case for LightGBM, XGBoost, and HistGradientBoostingRegressor.

# Missing values per feature
# ==============================================================================
data.isna().mean().mul(100).round(2)

building_id               0.00
meter_reading             0.00
site_id                   0.00
primaryspaceusage         1.20
sub_primaryspaceusage     1.20
sqm                       0.00
lat                      14.83
lng                      14.83
timezone                  0.00
airTemperature            0.02
cloudCoverage             7.02
dewTemperature            0.03
precipDepth1HR            0.02
precipDepth6HR            0.02
seaLvlPressure            9.56
windDirection             0.02
windSpeed                 0.02
dtype: float64

Exogenous variables

Exogenous variables are variables that are external to the time series and can be used as features to improve the forecast. In this case, the exogenous variables used are: the characteristics of the buildings, calendar variables and weather conditions.

⚠ Warning

Exogenous variables must be known at the time of the forecast. For example, if temperature is used as an exogenous variable, the temperature value for the next day must be known at the time of the forecast. If the temperature value is not known, the forecast will not be possible.

⚠ Warning

When creating new features in a multi-series dataset, it is important to avoid avoid mixing information of different series. It is recommended to use the groupby and apply methods to create these features.

Building characteristics

One of the key attributes associated with each building is its designated use. This feature may play a crucial role in influencing the energy consumption pattern, as distinct uses can significantly impact both the quantity and timing of energy consumption.

# Number of buildings
# ==============================================================================
print(f"Number of buildings: {data['building_id'].nunique()}")
print(f"Number of building types: {data['primaryspaceusage'].nunique()}")
print(f"Number of building subtypes: {data['sub_primaryspaceusage'].nunique()}")

Number of buildings: 1578
Number of building types: 16
Number of building subtypes: 104

For certain type and subtype categories, there are a limited number of buildings in the dataset. Types with fewer than 100 buildings and subtypes with fewer than 50 buildings are grouped into the "Other" category.

# Group unfrequent categories
# ==============================================================================
infrequent_types = (
    data
    .drop_duplicates(subset=['building_id'])['primaryspaceusage']
    .value_counts()
    .loc[lambda x: x < 100]
    .index
    .tolist()
)
infrequent_subtypes = (
    data
    .drop_duplicates(subset=['building_id'])['sub_primaryspaceusage']
    .value_counts()
    .loc[lambda x: x < 50]
    .index
    .tolist()
)

data['primaryspaceusage'] = np.where(
    data['primaryspaceusage'].isin(infrequent_types),
    'Other',
    data['primaryspaceusage']
)
data['sub_primaryspaceusage'] = np.where(
    data['sub_primaryspaceusage'].isin(infrequent_subtypes),
    'Other',
    data['sub_primaryspaceusage']
)

display(data.drop_duplicates(subset=['building_id'])['primaryspaceusage'].value_counts())
display(data.drop_duplicates(subset=['building_id', 'sub_primaryspaceusage'])['sub_primaryspaceusage'].value_counts())

primaryspaceusage
Education                        604
Office                           296
Entertainment/public assembly    203
Public services                  166
Lodging/residential              149
Other                            141
Name: count, dtype: int64

sub_primaryspaceusage
Other                          612
Office                         295
College Classroom              131
College Laboratory             116
K-12 School                    109
Dormitory                       91
Primary/Secondary Classroom     84
Education                       67
Library                         54
Name: count, dtype: int64

Calendar features

# Calendar features
# ==============================================================================
features_to_extract = [
    'month',
    'week',
    'day_of_week',
]
calendar_transformer = DatetimeFeatures(
                            variables           = 'index',
                            features_to_extract = features_to_extract,
                            drop_original       = False,
                       )
data = calendar_transformer.fit_transform(data)
print(f"Data shape: {data.shape}")
data.head(3)

Data shape: (1153518, 20)

	building_id	meter_reading	site_id	primaryspaceusage	sub_primaryspaceusage	sqm	lat	lng	timezone	airTemperature	cloudCoverage	dewTemperature	precipDepth1HR	precipDepth6HR	seaLvlPressure	windDirection	windSpeed	month	week	day_of_week
timestamp
2016-01-01	Bear_assembly_Angel	12808.1620	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	6.1750	1.666667	-5.229167	0.0	0.0	1020.891667	68.750000	3.070833	1	53	4
2016-01-02	Bear_assembly_Angel	9251.0003	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	8.0875	NaN	-1.404167	0.0	0.0	1017.687500	76.666667	3.300000	1	53	5
2016-01-03	Bear_assembly_Angel	14071.6500	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	10.1125	NaN	1.708333	-6.0	-2.0	1011.491667	91.666667	3.120833	1	53	6

# Cyclical encoding of calendar features
# ==============================================================================
features_to_encode = [
    "month",
    "week",
    "day_of_week",
]
max_values = {
    "month": 12,
    "week": 52,
    "day_of_week": 6,
}
cyclical_encoder = CyclicalFeatures(
                        variables     = features_to_encode,
                        max_values    = max_values,
                        drop_original = False
                   )

data = cyclical_encoder.fit_transform(data)
print(f"Data shape: {data.shape}")
data.head(3)

Data shape: (1153518, 26)

	building_id	meter_reading	site_id	primaryspaceusage	sub_primaryspaceusage	sqm	lat	lng	timezone	airTemperature	...	windSpeed	month	week	day_of_week	month_sin	month_cos	week_sin	week_cos	day_of_week_sin	day_of_week_cos
timestamp
2016-01-01	Bear_assembly_Angel	12808.1620	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	6.1750	...	3.070833	1	53	4	0.5	0.866025	0.120537	0.992709	-8.660254e-01	-0.5
2016-01-02	Bear_assembly_Angel	9251.0003	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	8.0875	...	3.300000	1	53	5	0.5	0.866025	0.120537	0.992709	-8.660254e-01	0.5
2016-01-03	Bear_assembly_Angel	14071.6500	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	10.1125	...	3.120833	1	53	6	0.5	0.866025	0.120537	0.992709	-2.449294e-16	1.0

3 rows × 26 columns

✎ Note

For more information about calendar features and cyclical encoding visit Calendar features and Cyclical features in time series forecasting.

Meteorological features

Meteorological variables are recorded at the site level, meaning that weather data varies by building location, even at the same timestamp. In other words, while the exogenous variables are consistent across all series, their values differ by location.

# Values of meteorological features for a guiven date in each site
# ==============================================================================
data.loc["2016-01-01"].groupby("site_id", observed=True).agg(
    {
        "airTemperature": "first",
        "cloudCoverage": "first",
        "dewTemperature": "first",
        "precipDepth1HR": "first",
        "precipDepth6HR": "first",
        "seaLvlPressure": "first",
        "windDirection": "first",
        "windSpeed": "first",
    }
)

	airTemperature	cloudCoverage	dewTemperature	precipDepth1HR	precipDepth6HR	seaLvlPressure	windDirection	windSpeed
site_id
Bear	6.175000	1.666667	-5.229167	0.0	0.0	1020.891667	68.750000	3.070833
Bobcat	-11.595833	0.000000	-17.041667	0.0	0.0	1034.466667	260.869565	3.033333
Bull	7.612500	4.000000	0.708333	-2.0	-1.0	1031.762500	130.416667	3.712500
Cockatoo	-2.000000	4.000000	-4.066667	0.0	0.0	NaN	281.333333	4.826667
Crow	-1.787500	NaN	-3.595833	15.0	13.0	1011.670833	236.250000	3.666667
Eagle	3.187500	1.000000	-4.487500	0.0	0.0	1017.445833	287.500000	3.470833
Fox	10.200000	1.000000	-2.804167	0.0	0.0	1018.512500	47.083333	0.470833
Gator	23.291667	5.600000	19.625000	-3.0	0.0	1018.663636	150.416667	2.520833
Hog	-5.583333	1.142857	-9.741667	-2.0	-1.0	1019.545833	260.416667	4.758333
Lamb	6.913043	0.000000	5.434783	0.0	0.0	NaN	123.181818	8.017391
Moose	-1.787500	NaN	-3.595833	15.0	13.0	1011.670833	236.250000	3.666667
Mouse	5.387500	0.000000	3.879167	0.0	0.0	1016.941667	116.666667	4.470833
Panther	23.291667	5.600000	19.625000	-3.0	0.0	1018.663636	150.416667	2.520833
Peacock	5.558333	0.000000	-1.541667	0.0	0.0	1019.783333	120.000000	1.275000
Rat	5.633333	4.727273	-2.112500	0.0	0.0	1020.450000	314.782609	3.816667
Robin	5.387500	0.000000	3.879167	0.0	0.0	1016.941667	116.666667	4.470833
Shrew	5.387500	0.000000	3.879167	0.0	0.0	1016.941667	116.666667	4.470833
Swan	6.054167	0.400000	-2.591667	0.0	0.0	1021.216667	154.000000	1.783333
Wolf	5.716667	6.625000	3.208333	0.0	1.0	1007.625000	140.000000	8.875000

Skforecast allows you to include different exogenous variables and/or different values for each series in the dataset (more details in the next section).

Rolling features for exogenous variables

# Rolling features of meteorological features
# ==============================================================================
wf_transformer = WindowFeatures(
    variables      = ["airTemperature", "windSpeed"],
    window         = ["7D", "14D"],
    functions      = ["mean"],
    freq           = "D",
    missing_values = "ignore",
    drop_na        = False,
)
data = data.groupby("building_id").apply(wf_transformer.fit_transform, include_groups=False).reset_index()
data = data.set_index("timestamp")
print(f"Data shape: {data.shape}")
data.head(3)

Data shape: (1153518, 30)

	building_id	meter_reading	site_id	primaryspaceusage	sub_primaryspaceusage	sqm	lat	lng	timezone	airTemperature	...	month_sin	month_cos	week_sin	week_cos	day_of_week_sin	day_of_week_cos	airTemperature_window_7D_mean	windSpeed_window_7D_mean	airTemperature_window_14D_mean	windSpeed_window_14D_mean
timestamp
2016-01-01	Bear_assembly_Angel	12808.1620	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	6.1750	...	0.5	0.866025	0.120537	0.992709	-8.660254e-01	-0.5	NaN	NaN	NaN	NaN
2016-01-02	Bear_assembly_Angel	9251.0003	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	8.0875	...	0.5	0.866025	0.120537	0.992709	-8.660254e-01	0.5	6.17500	3.070833	6.17500	3.070833
2016-01-03	Bear_assembly_Angel	14071.6500	Bear	Entertainment/public assembly	Other	22117.0	37.871903	-122.260729	US/Pacific	10.1125	...	0.5	0.866025	0.120537	0.992709	-2.449294e-16	1.0	7.13125	3.185417	7.13125	3.185417

3 rows × 30 columns

Categorical features

LightGBM can handle categorical variables internally without any preprocessing. To allow automatic detection of categorical features based on pandas data types in a Forecaster, categorical variables must first be encoded as integers (ordinal encoding) and then stored as a category type. This is necessary because skforecast uses a numeric array internally to speed up the calculation, and LightGBM requires the categorical features to be encoded as category to be detected automatically. It is also necessary to set the categorical_features parameter to 'auto' during the initialization of the forecaster using fit_kwargs = {'categorical_feature': 'auto'}.

⚠ Warning

The four main gradient boosting frameworks – LightGBM, scikit-learn's HistogramGradientBoosting, XGBoost, and CatBoost – are capable of directly handling categorical features within the model. However, it is important to note that each framework has its own configurations, benefits and potential problems. To fully comprehend how to use these frameworks, it is highly recommended to refer to the skforecast user guide for a detailed understanding.

# Transformer: ordinal encoding
# ==============================================================================
# A ColumnTransformer is used to transform categorical (not numerical) features
# using ordinal encoding. Numeric features are left untouched. Missing values
# are coded as -1. If a new category is found in the test set, it is encoded
# as -1.
categorical_features = ['primaryspaceusage', 'sub_primaryspaceusage', 'timezone']
transformer_exog = make_column_transformer(
                       (
                           OrdinalEncoder(
                               dtype=float,
                               handle_unknown="use_encoded_value",
                               unknown_value=np.nan,
                               encoded_missing_value=np.nan
                           ),
                           categorical_features
                       ),
                       remainder="passthrough",
                       verbose_feature_names_out=False,
                   ).set_output(transform="pandas")
transformer_exog

ColumnTransformer(remainder='passthrough',
                  transformers=[('ordinalencoder',
                                 OrdinalEncoder(dtype=<class 'float'>,
                                                handle_unknown='use_encoded_value',
                                                unknown_value=nan),
                                 ['primaryspaceusage', 'sub_primaryspaceusage',
                                  'timezone'])],
                  verbose_feature_names_out=False)

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

When creating a Forecaster with LGBMRegressor, it is necessary to specify how to handle the categorical columns using the fit_kwargs argument. This is because the categorical_feature argument is only specified in the fit method of LGBMRegressor, and not during its initialization.

Modelling and forecasting

ForecasterRecursiveMultiSeries allows modeling time series of different lengths and using different exogenous variables. When series have different lengths, the data must be transformed into a dictionary. The keys of the dictionary are the names of the series and the values are the series themselves. To do this, the series_long_to_dict function is used, which takes the DataFrame in "long format" and returns a dict of pandas Series. Similarly, when the exogenous variables are different (values or variables) for each series, the data must be transformed into a dictionary. The keys of the dictionary are the names of the series and the values are the exogenous variables themselves. The exog_long_to_dict function is used, which takes the DataFrame in "long format" and returns a dict of exogenous variables (pandas Series or pandas DataFrame).

When all series have the same length and the same exogenous variables, it is not necessary to use dictionaries. Series can be passed as unique DataFrame with each series in a column, and exogenous variables can be passed as a DataFrame with the same length as the series.

✎ Note

For more information about modelling series of different lengths and using different exogenous variables, visit Global Forecasting Models: Time series with different lengths and different exogenous variables.

# Exogenous features selected for modeling
# ==============================================================================
exog_features = [
    "primaryspaceusage",
    "sub_primaryspaceusage",
    "timezone",
    "sqm",
    "airTemperature",
    "cloudCoverage",
    "dewTemperature",
    "precipDepth1HR",
    "precipDepth6HR",
    "seaLvlPressure",
    "windDirection",
    "windSpeed",
    "day_of_week_sin",
    "day_of_week_cos",
    "week_sin",
    "week_cos",
    "month_sin",
    "month_cos",
    "airTemperature_window_7D_mean",
    "windSpeed_window_7D_mean",
    "airTemperature_window_14D_mean",
    "windSpeed_window_14D_mean",
]

# Transform series and exog to dictionaries
# ==============================================================================
series_dict = series_long_to_dict(
    data      = data.reset_index(),
    series_id = 'building_id',
    index     = 'timestamp',
    values    = 'meter_reading',
    freq      = 'D'
)

exog_dict = exog_long_to_dict(
    data      = data[exog_features + ['building_id']].reset_index(),
    series_id = 'building_id',
    index     = 'timestamp',
    freq      = 'D'
)

To train the models, search for optimal hyperparameters, and evaluate their predictive performance, the data is divided into three separate sets: training, validation, and test.

# Partition data in train and test
# ==============================================================================
data = data.sort_index()
end_train = '2017-08-31 23:59:00'
end_validation = '2017-10-31 23:59:00'
series_dict_train = {k: v.loc[: end_train,] for k, v in series_dict.items()}
series_dict_valid = {k: v.loc[end_train: end_validation,] for k, v in series_dict.items()}
series_dict_test = {k: v.loc[end_validation:,] for k, v in series_dict.items()}
exog_dict_train = {k: v.loc[: end_train,] for k, v in exog_dict.items()}
exog_dict_valid = {k: v.loc[end_train: end_validation,] for k, v in exog_dict.items()}
exog_dict_test = {k: v.loc[end_validation:,] for k, v in exog_dict.items()}

print(
    f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
    f"(n_days={(data.index.max() - data.index.min()).days})"
)
print(
    f"  Dates for training    : {data.loc[: end_train, :].index.min()} --- {data.loc[: end_train, :].index.max()} "
    f"(n_days={(data.loc[: end_train, :].index.max() - data.loc[: end_train, :].index.min()).days})"
)
print(
    f"  Dates for validation  : {data.loc[end_train:end_validation, :].index.min()} --- {data.loc[end_train:end_validation, :].index.max()} "
    f"(n_days={(data.loc[end_train:end_validation, :].index.max() - data.loc[end_train:end_validation, :].index.min()).days})"
)
print(
    f"  Dates for test        : {data.loc[end_validation:, :].index.min()} --- {data.loc[end_validation:, :].index.max()} "
    f"(n_days={(data.loc[end_validation:, :].index.max() - data.loc[end_validation:, :].index.min()).days})"
)

Rage of dates available : 2016-01-01 00:00:00 --- 2017-12-31 00:00:00 (n_days=730)
  Dates for training    : 2016-01-01 00:00:00 --- 2017-08-31 00:00:00 (n_days=608)
  Dates for validation  : 2017-09-01 00:00:00 --- 2017-10-31 00:00:00 (n_days=60)
  Dates for test        : 2017-11-01 00:00:00 --- 2017-12-31 00:00:00 (n_days=60)

Hyperparameter tuning

Hyperparameter and lag tuning involves systematically testing different values or combinations of hyperparameters (and/or lags) to find the optimal configuration that gives the best performance. The skforecast library provides two different methods to evaluate each candidate configuration:

Backtesting: In this method, the model predicts several steps ahead in each iteration, using the same forecast horizon and retraining frequency strategy that would be used if the model were deployed. This simulates a real forecasting scenario where the model is retrained and updated over time.
One-Step Ahead: Evaluates the model using only one-step-ahead predictions. This method is faster because it requires fewer iterations, but it only tests the model's performance in the immediate next time step (t+1).

Each method uses a different evaluation strategy, so they may produce different results. However, in the long run, both methods are expected to converge to similar selections of optimal hyperparameters. The one-step-ahead method is much faster than backtesting because it requires fewer iterations, but it only tests the model's performance in the immediate next time step. It is recommended to backtest the final model for a more accurate multi-step performance estimate.

# Create forecaster
# ==============================================================================
window_features = RollingFeatures(stats=['mean', 'min', 'max'], window_sizes=7)
forecaster = ForecasterRecursiveMultiSeries(
                regressor        = LGBMRegressor(random_state=8520, verbose=-1),
                lags             = 14,
                window_features  = window_features,
                transformer_exog = transformer_exog,
                fit_kwargs       = {'categorical_feature': categorical_features},
                encoding         = "ordinal"
            )

# Bayesian search with OneStepAheadFold
# ==============================================================================
def search_space(trial):
    search_space  = {
        'lags'            : trial.suggest_categorical('lags', [31, 62]),
        'n_estimators'    : trial.suggest_int('n_estimators', 200, 800, step=100),
        'max_depth'       : trial.suggest_int('max_depth', 3, 8, step=1),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 25, 500),
        'learning_rate'   : trial.suggest_float('learning_rate', 0.01, 0.5),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 0.8, step=0.1),
        'max_bin'         : trial.suggest_int('max_bin', 50, 100, step=25),
        'reg_alpha'       : trial.suggest_float('reg_alpha', 0, 1, step=0.1),
        'reg_lambda'      : trial.suggest_float('reg_lambda', 0, 1, step=0.1)
    }

    return search_space

cv = OneStepAheadFold(initial_train_size=608) # size of the training set

results_search, best_trial = bayesian_search_forecaster_multiseries(
    forecaster        = forecaster,
    series            = {k: v.loc[:end_validation,] for k, v in series_dict.items()},
    exog              = {k: v.loc[:end_validation, exog_features] for k, v in exog_dict.items()},
    cv                = cv,
    search_space      = search_space,
    n_trials          = 10,
    metric            = "mean_absolute_error",
    suppress_warnings = True
)

best_params = results_search.at[0, 'params']
best_lags = results_search.at[0, 'lags']
results_search.head(3)

  0%|          | 0/10 [00:00<?, ?it/s]

`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
 49 50 51 52 53 54 55 56 57 58 59 60 61 62] 
  Parameters: {'n_estimators': 600, 'max_depth': 6, 'min_data_in_leaf': 188, 'learning_rate': 0.1590191866233202, 'feature_fraction': 0.6, 'max_bin': 100, 'reg_alpha': 0.9, 'reg_lambda': 0.5}
  Backtesting metric: 261.5074045473127
  Levels: ['Bear_assembly_Angel', 'Bear_assembly_Beatrice', 'Bear_assembly_Danial', 'Bear_assembly_Diana', 'Bear_assembly_Genia', 'Bear_assembly_Harry', 'Bear_assembly_Jose', 'Bear_assembly_Roxy', 'Bear_assembly_Ruby', 'Bear_education_Alfredo', '...', 'Wolf_office_Emanuel', 'Wolf_office_Haydee', 'Wolf_office_Joana', 'Wolf_office_Nadia', 'Wolf_office_Rochelle', 'Wolf_public_Norma', 'Wolf_retail_Harriett', 'Wolf_retail_Marcella', 'Wolf_retail_Toshia', 'Wolf_science_Alfreda']

	levels	lags	params	mean_absolute_error__weighted_average	mean_absolute_error__average	mean_absolute_error__pooling	n_estimators	max_depth	min_data_in_leaf	learning_rate	feature_fraction	max_bin	reg_alpha	reg_lambda
0	[Bear_assembly_Angel, Bear_assembly_Beatrice, ...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 600, 'max_depth': 6, 'min_dat...	261.507405	261.507405	261.507405	600.0	6.0	188.0	0.159019	0.6	100.0	0.9	0.5
1	[Bear_assembly_Angel, Bear_assembly_Beatrice, ...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 500, 'max_depth': 5, 'min_dat...	266.299734	266.299734	266.299734	500.0	5.0	227.0	0.163008	0.6	100.0	1.0	0.5
2	[Bear_assembly_Angel, Bear_assembly_Beatrice, ...	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 400, 'max_depth': 5, 'min_dat...	266.304518	266.304518	266.304518	400.0	5.0	437.0	0.132723	0.6	100.0	0.5	0.6

Backtesting on test data

# Backtesting on test set
# ==============================================================================
cv = TimeSeriesFold(
    initial_train_size = 608 + 60, # training + validation
    steps              = 7,
    refit              = False
)
metrics, predictions = backtesting_forecaster_multiseries(
                          forecaster        = forecaster,
                          series            = series_dict,
                          exog              = exog_dict,
                          cv                = cv,
                          metric            = 'mean_absolute_error',
                          suppress_warnings = True
                      )

display(predictions.head())
display(metrics)

  0%|          | 0/9 [00:00<?, ?it/s]

	level	pred
2017-10-30	Bear_assembly_Angel	9561.254764
2017-10-30	Bear_assembly_Beatrice	1112.779749
2017-10-30	Bear_assembly_Danial	4172.709133
2017-10-30	Bear_assembly_Diana	26.412851
2017-10-30	Bear_assembly_Genia	7780.739532

	levels	mean_absolute_error
0	Bear_assembly_Angel	1244.039948
1	Bear_assembly_Beatrice	244.728577
2	Bear_assembly_Danial	281.653933
3	Bear_assembly_Diana	39.561433
4	Bear_assembly_Genia	563.580736
...	...	...
1576	Wolf_retail_Toshia	519.948486
1577	Wolf_science_Alfreda	220.802361
1578	average	347.156677
1579	weighted_average	347.156677
1580	pooling	347.156677

1581 rows × 2 columns

# Aggregated metric for all buildings
# ==============================================================================
average_metric_all_buildings = metrics.query("levels == 'average'")["mean_absolute_error"].item()
errors_all_buildings = pd.merge(
    left     = data[['building_id', 'meter_reading']].reset_index(),
    right    = predictions.rename_axis("timestamp").reset_index(),
    left_on  = ['timestamp', 'building_id'],
    right_on = ['timestamp', 'level'],
    how      = 'inner',
    validate = "1:1"
).assign(error=lambda df: df['meter_reading'] - df['pred'])

sum_abs_errors_all_buildings = errors_all_buildings['error'].abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings['error'].sum().sum()
print(f"Average mean absolute error for all buildings: {average_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")

Average mean absolute error for all buildings: 347
Sum of absolute errors for all buildings (x 10,000): 3451
Bias (x 10,000): -391

predictions

	level	pred
2017-10-30	Bear_assembly_Angel	9561.254764
2017-10-30	Bear_assembly_Beatrice	1112.779749
2017-10-30	Bear_assembly_Danial	4172.709133
2017-10-30	Bear_assembly_Diana	26.412851
2017-10-30	Bear_assembly_Genia	7780.739532
...	...	...
2017-12-31	Wolf_public_Norma	1961.736704
2017-12-31	Wolf_retail_Harriett	634.324994
2017-12-31	Wolf_retail_Marcella	-80.704692
2017-12-31	Wolf_retail_Toshia	821.663849
2017-12-31	Wolf_science_Alfreda	1528.388111

99414 rows × 2 columns

# Plot predictions vs real value for 2 random buildings
# ==============================================================================
rng = np.random.default_rng(14793)
n_buildings = 2
selected_buildings = rng.choice(data['building_id'].unique(), size=n_buildings, replace=False)

fig, axs = plt.subplots(n_buildings, 1, figsize=(7, 4.5), sharex=True)
axs = axs.flatten()

for i, building in enumerate(selected_buildings):
    data.query("building_id == @building").loc[predictions.index, 'meter_reading'].plot(ax=axs[i], label='test')
    predictions.query("level == @building")['pred'].plot(ax=axs[i], label='predictions')
    axs[i].set_title(f"Building {building}", fontsize=10)
    axs[i].set_xlabel("")
    axs[i].legend()

fig.tight_layout()
plt.show();

Feature selection

Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.

Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. There are several methods for feature selection, but the most common are:

Recursive feature elimination (RFE)
Sequential Feature Selection (SFS)
Feature selection based on threshold (SelectFromModel)

💡 Tip

Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.

# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
warnings.filterwarnings("ignore", message="X does not have valid feature names.*")
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, n_jobs=1)
selected_lags, selected_window_features, selected_exog = select_features_multiseries(
    forecaster      = forecaster,
    selector        = selector,
    series          = {k: v.loc[:end_validation,] for k, v in series_dict.items()},
    exog            = {k: v.loc[:end_validation, exog_features] for k, v in exog_dict.items()},
    select_only     = None,
    force_inclusion = None,
    subsample       = 0.2,
    random_state    = 123,
    verbose         = True,
)

╭──────────────────────────────── MissingValuesWarning ────────────────────────────────╮
│ NaNs detected in `X_train`. Some regressors do not allow NaN values during training. │
│ If you want to drop them, set `forecaster.dropna_from_series = True`.                │
│                                                                                      │
│ Category : MissingValuesWarning                                                      │
│ Location :                                                                           │
│ /home/joaquin/miniconda3/envs/skforecast_16_py12/lib/python3.12/site-packages/skfore │
│ cast/recursive/_forecaster_recursive_multiseries.py:1213                             │
│ Suppress : warnings.simplefilter('ignore', category=MissingValuesWarning)            │
╰──────────────────────────────────────────────────────────────────────────────────────╯

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 959424
Total number of records used for feature selection: 191884
Number of features available: 87
    Lags            (n=62)
    Window features (n=3)
    Exog            (n=22)
Number of features selected: 56
    Lags            (n=36) : [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 28, 29, 30, 33, 34, 35, 42, 44, 49, 50, 51, 54, 56, 57, 60, 62]
    Window features (n=3) : ['roll_mean_7', 'roll_min_7', 'roll_max_7']
    Exog            (n=17) : ['sub_primaryspaceusage', 'timezone', 'sqm', 'airTemperature', 'cloudCoverage', 'dewTemperature', 'seaLvlPressure', 'windDirection', 'windSpeed', 'day_of_week_sin', 'day_of_week_cos', 'week_sin', 'week_cos', 'airTemperature_window_7D_mean', 'windSpeed_window_7D_mean', 'airTemperature_window_14D_mean', 'windSpeed_window_14D_mean']

# Backtesting forecaster with selected features
# ==============================================================================
forecaster = ForecasterRecursiveMultiSeries(
                regressor        = LGBMRegressor(**best_params, random_state=8520, verbose=-1),
                lags             = selected_lags,
                window_features  = window_features,
                transformer_exog = transformer_exog,
                fit_kwargs       = {'categorical_feature': categorical_features},
                encoding         = "ordinal"
            )
cv = TimeSeriesFold(
    initial_train_size = 608 + 60, # Entreanmiento + validación
    steps              = 7,
    refit              = False
)
metrics, predictions = backtesting_forecaster_multiseries(
                          forecaster        = forecaster,
                          series            = series_dict,
                          exog              = {k: v[exog_features] for k, v in exog_dict.items()},
                          cv                = cv,
                          metric            = 'mean_absolute_error',
                          suppress_warnings = True
                      )

display(predictions.head())
display(metrics)

  0%|          | 0/9 [00:00<?, ?it/s]

	level	pred
2017-10-30	Bear_assembly_Angel	9297.191358
2017-10-30	Bear_assembly_Beatrice	996.638961
2017-10-30	Bear_assembly_Danial	4039.385147
2017-10-30	Bear_assembly_Diana	9.180618
2017-10-30	Bear_assembly_Genia	7354.941371

	levels	mean_absolute_error
0	Bear_assembly_Angel	1221.926524
1	Bear_assembly_Beatrice	218.906181
2	Bear_assembly_Danial	325.030420
3	Bear_assembly_Diana	37.261934
4	Bear_assembly_Genia	642.969646
...	...	...
1576	Wolf_retail_Toshia	514.892785
1577	Wolf_science_Alfreda	228.344802
1578	average	344.096237
1579	weighted_average	344.096237
1580	pooling	344.096237

1581 rows × 2 columns

The number of features included in the model has been reduced without compromising the model's performance. This simplifies the model and speeds up training.

Clustering time series

The idea behind modeling multiple series at the same time is to be able to capture the main patterns that govern the series, thereby reducing the impact of the potential noise that each series may have. This means that series that behave similarly may benefit from being modeled together. One way to identify potential groups of series is to perform a clustering study prior to modeling. If clear groups are identified as a result of clustering, it is appropriate to model each of them separately.

Clustering is an unsupervised analysis technique that groups a set of observations into clusters that contain observations that are considered homogeneous, while observations in different clusters are considered heterogeneous. Algorithms that cluster time series can be divided into two groups: those that use a transformation to create features prior to clustering (feature-driven time series clustering), and those that work directly on the time series (elastic distance measures).

Feature-driven time series clustering: Features describing structural characteristics are extracted from each time series, then these features are fed into arbitrary clustering algorithms. This features are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity.
Elastic distance measures: This approach works directly on the time series, adjusting or "realigning" the series in comparison to each other. The best known of this family of measures is Dynamic Time Warping (DTW).

For a detailed example of how time series clustering can improve the forecasting models, see Global Forecasting Models: Comparative Analysis of Single and Multi-Series Forecasting Modeling.

Session information

import session_info
session_info.show(html=False)

-----
feature_engine      1.8.3
lightgbm            4.6.0
matplotlib          3.10.1
numpy               2.2.5
optuna              3.6.2
pandas              2.2.3
session_info        v1.0.1
skforecast          0.16.0
sklearn             1.6.1
-----
IPython             9.1.0
jupyter_client      8.6.3
jupyter_core        5.7.2
notebook            6.5.7
-----
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27) [GCC 11.2.0]
Linux-6.11.0-25-generic-x86_64-with-glibc2.39
-----
Session information updated at 2025-05-16 17:22

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source, thank you!

Forecasting at scale: modeling thousand time series with a single global model by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 at https://www.cienciadedatos.net/documentos/py59-scalable-forecasting-models.html

How to cite skforecast

If you use skforecast for a publication, we would appreciate it if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2025). skforecast (v0.16.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2025). skforecast (Version 0.16.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.16.0}, month = {05}, year = {2025}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }

Did you like the article? Your support is important

Your contribution will help me to continue generating free educational content. Many thanks! 😊

This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Attribution-NonCommercial-ShareAlike 4.0 International.

Allowed:

Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.

Under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Forecasting at scale: modeling thousand time series with a single global model

Joaquín Amat Rodrigo, Javier Escobar Ortiz

November 2024 (last update May 2025)

Introduction

Libraries

Data

Exogenous variables

Building characteristics

Calendar features

Meteorological features

Rolling features for exogenous variables

Categorical features

Modelling and forecasting

Hyperparameter tuning

Backtesting on test data

Feature selection

Clustering time series

Session information

Citation