If you like Skforecast , help us giving a star on GitHub! ⭐️
More about forecasting
When faced with scenarios involving the prediction of hundreds or thousands of time series, a crucial decision arises: should one develop individual models for each series, or should one use a unified model to handle them all at once?
In Single-Series Modeling (Local Forecasting Model), a separate predictive model is created for each time series. While this method provides a comprehensive understanding of each series, its scalability can be challenged by the need to create and maintain hundreds or thousands of models.
Multi-Series Modeling (Global Forecasting Model) involves building a single predictive model that considers all time series simultaneously. It attempts to capture the core patterns that govern the series, thereby mitigating the potential noise that each series might introduce. This approach is computationally efficient, easy to maintain, and can yield more robust generalizations across time series, albeit potentially at the cost of sacrificing some individual insights.
This document shows how to forecast more than 1000 time series with a single model including exogenous features with some of them having different values per series.
💡 Tip
This is the first in a series of documents on global forecasting models:Libraries used in this document.
# Data management
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.style.use('seaborn-v0_8-darkgrid')
# Forecasting
# ==============================================================================
import skforecast
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import OrdinalEncoder
from sklearn.compose import make_column_transformer
from sklearn.feature_selection import RFECV
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.model_selection import TimeSeriesFold, OneStepAheadFold
from skforecast.model_selection import backtesting_forecaster_multiseries
from skforecast.model_selection import bayesian_search_forecaster_multiseries
from skforecast.feature_selection import select_features_multiseries
from skforecast.preprocessing import RollingFeatures
from skforecast.preprocessing import series_long_to_dict
from skforecast.preprocessing import exog_long_to_dict
from feature_engine.datetime import DatetimeFeatures
from feature_engine.creation import CyclicalFeatures
from skforecast.datasets import fetch_dataset
# Configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
print('Versión skforecast:', skforecast.__version__)
print('Versión lightgbm:', lightgbm.__version__)
Data used in this document has been obtained from the The Building Data Genome Project 2 https://github.com/buds-lab/building-data-genome-project-2. The dataset contains information on the energy consumption of more than 1500 buildings. The time range of the times-series data is the two full years (2016 and 2017) and the frequency is hourly measurements of electricity, heating and cooling water, steam, and irrigation meters. Additionally, the dataset includes information of the characteristics of the buildings and the weather conditions. Data has been aggregated to a daily resolution and only the electricity among the different energy sources has been considered.
# Load data
# ==============================================================================
data = fetch_dataset(name='bdg2_daily')
print("Data shape:", data.shape)
data.head(3)
# Overall range of available dates
# ==============================================================================
print(
f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
f"(n_days={(data.index.max() - data.index.min()).days})"
)
# Range of available dates per building
# ==============================================================================
available_dates_per_series = (
data
.dropna(subset="meter_reading")
.reset_index()
.groupby("building_id")
.agg(
min_index=("timestamp", "min"),
max_index=("timestamp", "max"),
n_values=("timestamp", "nunique")
)
)
display(available_dates_per_series)
print(f"Unique lenght of series: {available_dates_per_series.n_values.unique()}")
All time series have the same length, starting from January 1, 2016 and ending on December 31, 2017. Exogenous variables have few missing values. Skforecast does not require that the time series have the same length, and missing values are allowed as long as the underlying regressor can handle them, which is the case for LightGBM, XGBoost, and HistGradientBoostingRegressor.
# Missing values per feature
# ==============================================================================
data.isna().mean().mul(100).round(2)
Exogenous variables are variables that are external to the time series and can be used as features to improve the forecast. In this case, the exogenous variables used are: the characteristics of the buildings, calendar variables and weather conditions.
⚠ Warning
Exogenous variables must be known at the time of the forecast. For example, if temperature is used as an exogenous variable, the temperature value for the next day must be known at the time of the forecast. If the temperature value is not known, the forecast will not be possible.One of the key attributes associated with each building is its designated use. This feature may play a crucial role in influencing the energy consumption pattern, as distinct uses can significantly impact both the quantity and timing of energy consumption.
# Number of buildings
# ==============================================================================
print(f"Number of buildings: {data['building_id'].nunique()}")
print(f"Number of building types: {data['primaryspaceusage'].nunique()}")
print(f"Number of building subtypes: {data['sub_primaryspaceusage'].nunique()}")
For certain type and subtype categories, there are a limited number of buildings in the dataset. Types with fewer than 100 buildings and subtypes with fewer than 50 buildings are grouped into the "Other" category.
# Group unfrequent categories
# ==============================================================================
infrequent_types = (
data
.drop_duplicates(subset=['building_id'])['primaryspaceusage']
.value_counts()
.loc[lambda x: x < 100]
.index
.tolist()
)
infrequent_subtypes = (
data
.drop_duplicates(subset=['building_id'])['sub_primaryspaceusage']
.value_counts()
.loc[lambda x: x < 50]
.index
.tolist()
)
data['primaryspaceusage'] = np.where(
data['primaryspaceusage'].isin(infrequent_types),
'Other',
data['primaryspaceusage']
)
data['sub_primaryspaceusage'] = np.where(
data['sub_primaryspaceusage'].isin(infrequent_subtypes),
'Other',
data['sub_primaryspaceusage']
)
display(data.drop_duplicates(subset=['building_id'])['primaryspaceusage'].value_counts())
display(data.drop_duplicates(subset=['building_id', 'sub_primaryspaceusage'])['sub_primaryspaceusage'].value_counts())
# Calendar features
# ==============================================================================
features_to_extract = [
'month',
'week',
'day_of_week',
]
calendar_transformer = DatetimeFeatures(
variables = 'index',
features_to_extract = features_to_extract,
drop_original = False,
)
data = calendar_transformer.fit_transform(data)
# Cyclical encoding of calendar features
# ==============================================================================
features_to_encode = [
"month",
"week",
"day_of_week",
]
max_values = {
"month": 12,
"week": 52,
"day_of_week": 6,
}
cyclical_encoder = CyclicalFeatures(
variables = features_to_encode,
max_values = max_values,
drop_original = False
)
data = cyclical_encoder.fit_transform(data)
data.head(3)
✎ Note
For more information about calendar features and cyclical encoding visit Calendar features and Cyclical features in time series forecasting.Meteorological variables are recorded at the site level, meaning that weather data varies by building location, even at the same timestamp. In other words, while the exogenous variables are consistent across all series, their values differ by location.
# Values of meteorological features for a guiven date in each site
# ==============================================================================
data.loc["2016-01-01"].groupby("site_id", observed=True).agg(
{
"airTemperature": "first",
"cloudCoverage": "first",
"dewTemperature": "first",
"precipDepth1HR": "first",
"precipDepth6HR": "first",
"seaLvlPressure": "first",
"windDirection": "first",
"windSpeed": "first",
}
)
Skforecast allows you to include different exogenous variables and/or different values for each series in the dataset (more details in the next section).
LightGBM can handle categorical variables internally without any preprocessing. To allow automatic detection of categorical features based on pandas data types in a Forecaster
, categorical variables must first be encoded as integers (ordinal encoding) and then stored as a category
type. This is necessary because skforecast uses a numeric array internally to speed up the calculation, and LightGBM requires the categorical features to be encoded as category
to be detected automatically. It is also necessary to set the categorical_features
parameter to 'auto'
during the initialization of the forecaster using fit_kwargs = {'categorical_feature': 'auto'}
.
⚠ Warning
The four main gradient boosting frameworks – LightGBM, scikit-learn's HistogramGradientBoosting, XGBoost, and CatBoost – are capable of directly handling categorical features within the model. However, it is important to note that each framework has its own configurations, benefits and potential problems. To fully comprehend how to use these frameworks, it is highly recommended to refer to the skforecast user guide for a detailed understanding.# Transformer: ordinal encoding
# ==============================================================================
# A ColumnTransformer is used to transform categorical (not numerical) features
# using ordinal encoding. Numeric features are left untouched. Missing values
# are coded as -1. If a new category is found in the test set, it is encoded
# as -1.
categorical_features = ['primaryspaceusage', 'sub_primaryspaceusage', 'timezone']
transformer_exog = make_column_transformer(
(
OrdinalEncoder(
dtype=float,
handle_unknown="use_encoded_value",
unknown_value=np.nan,
encoded_missing_value=np.nan
),
categorical_features
),
remainder="passthrough",
verbose_feature_names_out=False,
).set_output(transform="pandas")
transformer_exog
When creating a Forecaster with LGBMRegressor
, it is necessary to specify how to handle the categorical columns using the fit_kwargs
argument. This is because the categorical_feature
argument is only specified in the fit
method of LGBMRegressor
, and not during its initialization.
ForecasterRecursiveMultiSeries
allows modeling time series of different lengths and using different exogenous variables. When series have different lengths, the data must be transformed into a dictionary. The keys of the dictionary are the names of the series and the values are the series themselves. To do this, the series_long_to_dict
function is used, which takes the DataFrame
in "long format" and returns a dict
of pandas Series
. Similarly, when the exogenous variables are different (values or variables) for each series, the data must be transformed into a dictionary. The keys of the dictionary are the names of the series and the values are the exogenous variables themselves. The exog_long_to_dict
function is used, which takes the DataFrame
in "long format" and returns a dict
of exogenous variables (pandas Series
or pandas DataFrame
).
When all series have the same length and the same exogenous variables, it is not necessary to use dictionaries. Series can be passed as unique DataFrame
with each series in a column, and exogenous variables can be passed as a DataFrame
with the same length as the series.
✎ Note
For more information about modelling series of different lengths and using different exogenous variables, visit Global Forecasting Models: Time series with different lengths and different exogenous variables.# Exogenous features selected for modeling
# ==============================================================================
exog_features = [
"primaryspaceusage",
"sub_primaryspaceusage",
"timezone",
"sqm",
"airTemperature",
"cloudCoverage",
"dewTemperature",
"precipDepth1HR",
"precipDepth6HR",
"seaLvlPressure",
"windDirection",
"windSpeed",
"day_of_week_sin",
"day_of_week_cos",
"week_sin",
"week_cos",
"month_sin",
"month_cos",
]
# Transform series and exog to dictionaries
# ==============================================================================
series_dict = series_long_to_dict(
data = data.reset_index(),
series_id = 'building_id',
index = 'timestamp',
values = 'meter_reading',
freq = 'D'
)
exog_dict = exog_long_to_dict(
data = data[exog_features + ['building_id']].reset_index(),
series_id = 'building_id',
index = 'timestamp',
freq = 'D'
)
To train the models, search for optimal hyperparameters, and evaluate their predictive performance, the data is divided into three separate sets: training, validation, and test.
# Partition data in train and test
# ==============================================================================
data = data.sort_index()
end_train = '2017-08-31 23:59:00'
end_validation = '2017-10-31 23:59:00'
series_dict_train = {k: v.loc[: end_train,] for k, v in series_dict.items()}
series_dict_valid = {k: v.loc[end_train: end_validation,] for k, v in series_dict.items()}
series_dict_test = {k: v.loc[end_validation:,] for k, v in series_dict.items()}
exog_dict_train = {k: v.loc[: end_train,] for k, v in exog_dict.items()}
exog_dict_valid = {k: v.loc[end_train: end_validation,] for k, v in exog_dict.items()}
exog_dict_test = {k: v.loc[end_validation:,] for k, v in exog_dict.items()}
print(
f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
f"(n_days={(data.index.max() - data.index.min()).days})"
)
print(
f" Dates for training : {data.loc[: end_train, :].index.min()} --- {data.loc[: end_train, :].index.max()} "
f"(n_days={(data.loc[: end_train, :].index.max() - data.loc[: end_train, :].index.min()).days})"
)
print(
f" Dates for validation : {data.loc[end_train:end_validation, :].index.min()} --- {data.loc[end_train:end_validation, :].index.max()} "
f"(n_days={(data.loc[end_train:end_validation, :].index.max() - data.loc[end_train:end_validation, :].index.min()).days})"
)
print(
f" Dates for test : {data.loc[end_validation:, :].index.min()} --- {data.loc[end_validation:, :].index.max()} "
f"(n_days={(data.loc[end_validation:, :].index.max() - data.loc[end_validation:, :].index.min()).days})"
)
Hyperparameter and lag tuning involves systematically testing different values or combinations of hyperparameters (and/or lags) to find the optimal configuration that gives the best performance. The skforecast library provides two different methods to evaluate each candidate configuration:
Backtesting: In this method, the model predicts several steps ahead in each iteration, using the same forecast horizon and retraining frequency strategy that would be used if the model were deployed. This simulates a real forecasting scenario where the model is retrained and updated over time.
One-Step Ahead: Evaluates the model using only one-step-ahead predictions. This method is faster because it requires fewer iterations, but it only tests the model's performance in the immediate next time step (t+1).
Each method uses a different evaluation strategy, so they may produce different results. However, in the long run, both methods are expected to converge to similar selections of optimal hyperparameters. The one-step-ahead method is much faster than backtesting because it requires fewer iterations, but it only tests the model's performance in the immediate next time step. It is recommended to backtest the final model for a more accurate multi-step performance estimate.
# Create forecaster
# ==============================================================================
window_features = RollingFeatures(stats=['mean', 'min', 'max'], window_sizes=7)
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(random_state=8520, verbose=-1),
lags = 14,
window_features = window_features,
transformer_series = None,
transformer_exog = transformer_exog,
fit_kwargs = {'categorical_feature': categorical_features},
encoding = "ordinal"
)
# Bayesian search with OneStepAheadFold
# ==============================================================================
def search_space(trial):
search_space = {
'lags' : trial.suggest_categorical('lags', [31, 62]),
'n_estimators' : trial.suggest_int('n_estimators', 200, 800, step=100),
'max_depth' : trial.suggest_int('max_depth', 3, 8, step=1),
'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 25, 500),
'learning_rate' : trial.suggest_float('learning_rate', 0.01, 0.5),
'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 0.8, step=0.1),
'max_bin' : trial.suggest_int('max_bin', 50, 100, step=25),
'reg_alpha' : trial.suggest_float('reg_alpha', 0, 1, step=0.1),
'reg_lambda' : trial.suggest_float('reg_lambda', 0, 1, step=0.1)
}
return search_space
cv = OneStepAheadFold(initial_train_size=608) # size of the training set
results_search, best_trial = bayesian_search_forecaster_multiseries(
forecaster = forecaster,
series = {k: v.loc[:end_validation,] for k, v in series_dict.items()},
exog = {k: v.loc[:end_validation, exog_features] for k, v in exog_dict.items()},
cv = cv,
search_space = search_space,
n_trials = 10,
metric = "mean_absolute_error",
return_best = True,
verbose = False,
n_jobs = "auto",
show_progress = True,
suppress_warnings = True,
)
best_params = results_search.at[0, 'params']
best_lags = results_search.at[0, 'lags']
results_search.head(3)
# Backtesting on test set
# ==============================================================================
cv = TimeSeriesFold(
initial_train_size = 608 + 60, # training + validation
steps = 7,
refit = False
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = series_dict,
exog = exog_dict,
cv = cv,
metric = 'mean_absolute_error',
verbose = False,
show_progress = True,
suppress_warnings = True
)
display(predictions.head())
display(metrics)
# Aggregated metric for all buildings
# ==============================================================================
average_metric_all_buildings = metrics.query("levels == 'average'")["mean_absolute_error"].item()
errors_all_buildings = (
predictions
- data.pivot(
columns="building_id",
values="meter_reading",
).loc[predictions.index, predictions.columns]
)
sum_abs_errors_all_buildings = errors_all_buildings.abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings.sum().sum()
print(f"Average mean absolute error for all buildings: {average_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")
# Plot predictions vs real value for 2 random buildings
# ==============================================================================
rng = np.random.default_rng(14793)
n_buildings = 2
selected_buildings = rng.choice(data['building_id'].unique(), size=n_buildings, replace=False)
fig, axs = plt.subplots(n_buildings, 1, figsize=(7, 4.5), sharex=True)
axs = axs.flatten()
for i, building in enumerate(selected_buildings):
data.query("building_id == @building").loc[predictions.index, 'meter_reading'].plot(ax=axs[i], label='test')
predictions[building].plot(ax=axs[i], label='predictions')
axs[i].set_title(f"Building {building}", fontsize=10)
axs[i].set_xlabel("")
axs[i].legend()
fig.tight_layout()
plt.show();
Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.
Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. There are several methods for feature selection, but the most common are:
Recursive feature elimination (RFE)
Sequential Feature Selection (SFS)
Feature selection based on threshold (SelectFromModel)
💡 Tip
Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.# Feature selection (autoregressive and exog) with scikit-learn RFECV
# ==============================================================================
regressor = LGBMRegressor(n_estimators=100, max_depth=5, random_state=15926, verbose=-1)
selector = RFECV(estimator=regressor, step=1, cv=3, n_jobs=1)
selected_lags, selected_window_features, selected_exog = select_features_multiseries(
forecaster = forecaster,
selector = selector,
series = {k: v.loc[:end_validation,] for k, v in series_dict.items()},
exog = {k: v.loc[:end_validation, exog_features] for k, v in exog_dict.items()},
select_only = None,
force_inclusion = None,
subsample = 0.2,
random_state = 123,
verbose = True,
)
# Backtesting forecaster with selected features
# ==============================================================================
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(**best_params, random_state=8520, verbose=-1),
lags = selected_lags,
window_features = window_features,
transformer_series = None,
transformer_exog = transformer_exog,
fit_kwargs = {'categorical_feature': categorical_features},
encoding = "ordinal"
)
cv = TimeSeriesFold(
initial_train_size = 608 + 60, # Entreanmiento + validación
steps = 7,
refit = False
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = series_dict,
exog = {k: v[exog_features] for k, v in exog_dict.items()},
cv = cv,
metric = 'mean_absolute_error',
verbose = False,
show_progress = True,
suppress_warnings = True
)
display(predictions.head())
display(metrics)
The number of features included in the model has been reduced without compromising the model's performance. This simplifies the model and speeds up training.
The idea behind modeling multiple series at the same time is to be able to capture the main patterns that govern the series, thereby reducing the impact of the potential noise that each series may have. This means that series that behave similarly may benefit from being modeled together. One way to identify potential groups of series is to perform a clustering study prior to modeling. If clear groups are identified as a result of clustering, it is appropriate to model each of them separately.
Clustering is an unsupervised analysis technique that groups a set of observations into clusters that contain observations that are considered homogeneous, while observations in different clusters are considered heterogeneous. Algorithms that cluster time series can be divided into two groups: those that use a transformation to create features prior to clustering (feature-driven time series clustering), and those that work directly on the time series (elastic distance measures).
Feature-driven time series clustering: Features describing structural characteristics are extracted from each time series, then these features are fed into arbitrary clustering algorithms. This features are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity.
Elastic distance measures: This approach works directly on the time series, adjusting or "realigning" the series in comparison to each other. The best known of this family of measures is Dynamic Time Warping (DTW).
For a detailed example of how time series clustering can improve the forecasting models, see Global Forecasting Models: Comparative Analysis of Single and Multi-Series Forecasting Modeling.
import session_info
session_info.show(html=False)
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
Scalable Forecasting: modeling thousand time series with a single global model by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 at https://www.cienciadedatos.net/documentos/py59-scalable-forecasting-model.html
How to cite skforecast
If you use skforecast for a scientific publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2023). skforecast (v0.14.0). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2023). skforecast (Version 0.14.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.14.0}, month = {11}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }
Did you like the article? Your support is important
Website maintenance has high cost, your contribution will help me to continue generating free educational content. Many thanks! 😊
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.