Global forecasting models: multiple time series forecasting with skforecast

If you like  Skforecast ,  help us giving a star on   GitHub! ⭐️

Global Forecasting Models: Modeling Multiple Time Series with Machine Learning

Joaquín Amat Rodrigo, Javier Escobar Ortiz
October 2022 (last update August 2024)

Global Forecasting Models

In Single-Series Modeling (Local Forecasting Model), a single time series is modeled as a linear or nonlinear combination of its lags and exogenous variables. While this method provides a comprehensive understanding of each series, its scalability can be challenged when dealing with a large number of series.

Multi-Series Modeling (Global Forecasting Model) involves building a single predictive model that considers multiple time series simultaneously. It attempts to capture the core patterns that govern the series, thereby mitigating the potential noise that each series might introduce. This approach is computationally efficient, easy to maintain, and can yield more robust generalizations across time series, albeit potentially at the cost of sacrificing some individual insights. Two strategies of global forecasting models can be distinguished:

Independent multi-series forecasting

In independent multi-series forecasting, a single model is trained on all time series, but each remains independent of the others, meaning that past values of one series are not used as predictors of other series. Modeling them together is useful because the series may follow the same intrinsic pattern with respect to their past and future values. For example, the sales of products A and B in the same store may be unrelated, but they follow the same dynamic, that of the store.

To predict the next n steps, the strategy of recursive multi-step forecasting is applied, with the only difference that the series identifier for which the predictions are to be estimated must be specified.

Dependent multi-series forecasting (Multivariate forecasting)

In dependent multi-series forecasting (multivariate time series), all series are modeled together in a single model, considering that each time series depends not only on its past values but also on the past values of the other series. The forecaster is expected not only to learn the information of each series separately but also to relate them. An example is the measurements made by all the sensors (flow, temperature, pressure...) installed on an industrial machine such as a compressor.

🖉 Note

ForecasterAutoregMultiSeries and ForecasterAutoregMultiSeriesCustom classes cover the use case of independent multi-series forecasting. API Reference ForecasterAutoregMultiVariate class covers the use case of dependent multi-series forecasting (Multivariate forecasting). API Reference

Advantages and limitations

Global forecasting models forecasts do not always outperform single-series forecasts. Which one works best depends largely on the characteristics of the use case to which they are applied. However, the following heuristic is worth keeping in mind:

Advantages of multi-series

  • It is easier to maintain and monitor a single model than several.

  • Since all time series are combined during training, the model has a higher learning capacity even if the series are short.

  • By combining multiple time series, the model can learn more generalizable patterns.

Limitations of multi-series

  • If the series do not follow the same internal dynamics, the model may learn a pattern that does not represent any of them.

  • The series may mask each other, so the model may not predict all of them with the same performance.

  • It is more computationally demanding (time and resources) to train and backtest a big model than several small ones.

Equal Length Time Series

The objective of this study is to compare the forecasting results of a global multi-series model with those of an individual model for each series.

The data has been obtained from the Store Item Demand Forecasting Challenge. This dataset contains 913,000 sales transactions from 01/01/2013 to 31/12/2017 for 50 products (SKU) in 10 stores. The goal is to predict the next 7 days sales for 50 different items in a store using the 5 years of available history.

Libraries

In [1]:
# Data manipulation
# ==============================================================================
import sys
import os
import warnings
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from tqdm.notebook import tqdm
from statsmodels.graphics.tsaplots import plot_acf

# Modelling and Forecasting
# ==============================================================================
import sklearn
import skforecast
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import RFECV
from sklearn.ensemble import  HistGradientBoostingRegressor
from skforecast.ForecasterAutoregMultiSeries import ForecasterAutoregMultiSeries
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection_multiseries import backtesting_forecaster_multiseries
from skforecast.model_selection_multiseries import bayesian_search_forecaster_multiseries
from skforecast.model_selection_multiseries import select_features_multiseries
from skforecast.plot import set_dark_theme
from skforecast.preprocessing import series_long_to_dict
from skforecast.preprocessing import exog_long_to_dict

# Warnings configuration
# ==============================================================================
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
Version skforecast: 0.13.0
Version scikit-learn: 1.5.1
Version pandas: 2.2.2
Version numpy: 2.0.1

Data

In [2]:
# Data loading
# ======================================================================================
data = pd.read_csv('./train_stores_kaggle.csv')
display(data)
print(f"Shape: {data.shape}")
date store item sales
0 2013-01-01 1 1 13
1 2013-01-02 1 1 11
2 2013-01-03 1 1 14
3 2013-01-04 1 1 13
4 2013-01-05 1 1 10
... ... ... ... ...
912995 2017-12-27 10 50 63
912996 2017-12-28 10 50 59
912997 2017-12-29 10 50 74
912998 2017-12-30 10 50 62
912999 2017-12-31 10 50 82

913000 rows × 4 columns

Shape: (913000, 4)
In [3]:
# Data preprocessing
# ======================================================================================
selected_store = 2
selected_items = data.item.unique()
data = data[(data['store'] == selected_store) & (data['item'].isin(selected_items))].copy()
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = pd.pivot_table(
           data    = data,
           values  = 'sales',
           index   = 'date',
           columns = 'item'
       )
data.columns.name = None
data.columns = [f"item_{col}" for col in data.columns]
data = data.asfreq('1D')
data = data.sort_index()
data.head(4)
Out[3]:
item_1 item_2 item_3 item_4 item_5 item_6 item_7 item_8 item_9 item_10 ... item_41 item_42 item_43 item_44 item_45 item_46 item_47 item_48 item_49 item_50
date
2013-01-01 12.0 41.0 19.0 21.0 4.0 34.0 39.0 49.0 28.0 51.0 ... 11.0 25.0 36.0 12.0 45.0 43.0 12.0 45.0 29.0 43.0
2013-01-02 16.0 33.0 32.0 14.0 6.0 40.0 47.0 42.0 21.0 56.0 ... 19.0 21.0 35.0 25.0 50.0 52.0 13.0 37.0 25.0 57.0
2013-01-03 16.0 46.0 26.0 12.0 12.0 41.0 43.0 46.0 29.0 46.0 ... 23.0 20.0 52.0 18.0 56.0 30.0 5.0 45.0 30.0 45.0
2013-01-04 20.0 50.0 34.0 17.0 16.0 41.0 44.0 55.0 32.0 56.0 ... 15.0 28.0 50.0 24.0 57.0 46.0 19.0 32.0 20.0 45.0

4 rows × 50 columns

The dataset is divided into 3 partitions: one for training, one for validation, and one for testing.

In [4]:
# Split data into train-validation-test
# ======================================================================================
end_train = '2016-05-31 23:59:00'
end_val = '2017-05-31 23:59:00'

data_train = data.loc[:end_train, :].copy()
data_val   = data.loc[end_train:end_val, :].copy()
data_test  = data.loc[end_val:, :].copy()
print(f"Train dates      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Validation dates : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Test dates       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Train dates      : 2013-01-01 00:00:00 --- 2016-05-31 00:00:00  (n=1247)
Validation dates : 2016-06-01 00:00:00 --- 2017-05-31 00:00:00  (n=365)
Test dates       : 2017-06-01 00:00:00 --- 2017-12-31 00:00:00  (n=214)

Four of the series are plotted to understand their trends and patterns. The reader is strongly encouraged to plot several more to gain an in-depth understanding of the series.

In [5]:
# Plot time series
# ======================================================================================
set_dark_theme()
fig, axs = plt.subplots(4, 1, figsize=(7, 5), sharex=True)
data.iloc[:, :4].plot(
    legend   = True,
    subplots = True, 
    title    = 'Sales of store 2',
    ax       = axs, 
)
for ax in axs:
    ax.axvline(pd.to_datetime(end_train) , color='white', linestyle='--', linewidth=1.5)
    ax.axvline(pd.to_datetime(end_val) , color='white', linestyle='--', linewidth=1.5)
fig.tight_layout()
plt.show()
In [6]:
# Autocorrelation plot
# ======================================================================================
fig, axes = plt.subplots(nrows=4, ncols=1, figsize=(7, 5), sharex=True)
axes = axes.flat
for i, col in enumerate(data.columns[:4]):
    plot_acf(data[col], ax=axes[i], lags=7*5)
    axes[i].set_ylim(-1, 1.1)
    axes[i].set_title(f'{col}', fontsize=10)
fig.tight_layout()
plt.show()

The autocorrelation plots show a clear association between sales on one day and sales on the same day a week earlier (lag 7). This type of correlation is an indication that autoregressive models can work well. There is also a common weekly seasonality between the series. The more similar the dynamics between the series, the more likely the model will learn useful patterns.

Individual Forecaster for each product

An individual model (Gradient Boosting Machine) is trained for each item in the store and its mean absolute error in predicting sales over the next 7 days is estimated using backtesting.

In [9]:
# Train and backtest a model for each item: ForecasterAutoreg
# ======================================================================================
items = []
mae_values = []
predictions = {}

for i, item in enumerate(tqdm(data.columns)):
    # Define forecaster
    forecaster = ForecasterAutoreg(
                     regressor     = HistGradientBoostingRegressor(random_state=8523),
                     lags          = 14,
                     transformer_y = StandardScaler()
                 )
    # Backtesting forecaster
    metric, preds = backtesting_forecaster(
                        forecaster         = forecaster,
                        y                  = data[item],
                        initial_train_size = len(data_train) + len(data_val),
                        steps              = 7,
                        metric             = 'mean_absolute_error',
                        refit              = False,
                        fixed_train_size   = False,
                        verbose            = False,
                        show_progress      = False
                    )
    items.append(item)
    mae_values.append(metric.at[0, 'mean_absolute_error'])
    predictions[item] = preds

# Results
uni_series_mae = pd.Series(
                     data  = mae_values,
                     index = items,
                     name  = 'uni_series_mae'
                 )

Global model: Multi-series forecaster

A global model is trained on all series simultaneously. In the predict and backtesting processes, you must specify the levels at which predictions are to be performed. The argument can also be set to None to perform prediction at all available levels.

In [10]:
# Train and backtest a model for all items: ForecasterAutoregMultiSeries
# ======================================================================================
items = list(data.columns)

# Define forecaster
forecaster_ms = ForecasterAutoregMultiSeries(
                    regressor          = HistGradientBoostingRegressor(random_state=8523),
                    lags               = 14,
                    encoding           = 'ordinal',
                    transformer_series = StandardScaler(),
                )
# Backtesting forecaster for all items
multi_series_mae, predictions_ms = backtesting_forecaster_multiseries(
                                       forecaster            = forecaster_ms,
                                       series                = data,
                                       levels                = items,
                                       steps                 = 7,
                                       metric                = 'mean_absolute_error',
                                       add_aggregated_metric = False,
                                       initial_train_size    = len(data_train) + len(data_val),
                                       refit                 = False,
                                       fixed_train_size      = False,
                                       verbose               = False,
                                       show_progress         = True 
                                    )
# Results
display(multi_series_mae.head(3))
print('')
display(predictions_ms.head(3))
levels mean_absolute_error
0 item_1 5.564468
1 item_2 9.348777
2 item_3 7.405529

item_1 item_2 item_3 item_4 item_5 item_6 item_7 item_8 item_9 item_10 ... item_41 item_42 item_43 item_44 item_45 item_46 item_47 item_48 item_49 item_50
2017-06-01 34.974911 91.828608 61.151453 35.780062 30.521633 94.265236 92.233415 125.498303 86.283537 120.629623 ... 36.486671 58.575248 78.914890 44.869470 125.695034 97.118028 37.863895 78.212716 48.474518 109.882483
2017-06-02 38.281849 98.901712 61.956171 37.042634 30.804250 100.879910 97.938244 136.358715 91.309997 123.439280 ... 40.501463 58.849345 84.768281 51.738468 139.721728 103.052595 41.185425 83.371411 52.687136 114.615478
2017-06-03 39.945658 101.468483 66.491690 40.452524 33.131752 108.150494 102.250414 145.422706 94.955882 135.510597 ... 41.601258 65.569756 90.529982 52.617657 141.301919 107.370885 41.126271 89.913940 47.741813 118.617881

3 rows × 50 columns

Comparison

The mean absolute error (MAE) is calculated for each model and compared for each series.

In [11]:
# Difference of backtesting metric for each item
# ======================================================================================
multi_series_mae = multi_series_mae.set_index('levels')
multi_series_mae.columns = ['multi_series_mae']
results = pd.concat((uni_series_mae, multi_series_mae), axis = 1)
results['improvement'] = results.eval('uni_series_mae - multi_series_mae')
results['improvement_(%)'] = 100 * results.eval('(uni_series_mae - multi_series_mae) / uni_series_mae')
results = results.round(2)
results.style.bar(subset=['improvement_(%)'], align='mid', color=['#d65f5f', '#5fba7d'])
Out[11]:
  uni_series_mae multi_series_mae improvement improvement_(%)
item_1 6.190000 5.560000 0.630000 10.120000
item_2 9.850000 9.350000 0.500000 5.050000
item_3 8.660000 7.410000 1.260000 14.520000
item_4 5.430000 4.960000 0.470000 8.640000
item_5 5.000000 4.630000 0.370000 7.320000
item_6 10.360000 9.960000 0.400000 3.860000
item_7 10.040000 9.900000 0.150000 1.450000
item_8 11.300000 10.360000 0.940000 8.330000
item_9 9.450000 8.720000 0.730000 7.760000
item_10 11.570000 10.480000 1.090000 9.430000
item_11 11.220000 10.240000 0.980000 8.780000
item_12 12.070000 10.870000 1.190000 9.880000
item_13 11.950000 11.510000 0.440000 3.680000
item_14 10.170000 9.490000 0.680000 6.660000
item_15 12.870000 11.410000 1.470000 11.410000
item_16 6.090000 5.910000 0.180000 2.950000
item_17 7.690000 7.230000 0.460000 5.940000
item_18 12.550000 11.970000 0.580000 4.610000
item_19 7.880000 7.400000 0.480000 6.130000
item_20 8.370000 7.920000 0.450000 5.380000
item_21 8.520000 8.060000 0.460000 5.430000
item_22 11.850000 10.650000 1.190000 10.070000
item_23 7.410000 6.750000 0.650000 8.800000
item_24 10.710000 9.980000 0.730000 6.840000
item_25 12.520000 11.790000 0.730000 5.860000
item_26 9.110000 8.560000 0.550000 6.010000
item_27 5.520000 5.160000 0.350000 6.430000
item_28 13.170000 12.030000 1.130000 8.620000
item_29 10.710000 10.190000 0.530000 4.910000
item_30 8.410000 7.900000 0.510000 6.110000
item_31 10.720000 10.050000 0.670000 6.220000
item_32 9.550000 9.210000 0.340000 3.540000
item_33 9.420000 9.240000 0.180000 1.880000
item_34 6.540000 5.830000 0.710000 10.910000
item_35 10.770000 10.170000 0.600000 5.530000
item_36 11.550000 10.790000 0.760000 6.570000
item_37 6.530000 6.150000 0.370000 5.680000
item_38 12.060000 11.220000 0.840000 6.980000
item_39 8.050000 7.370000 0.680000 8.480000
item_40 7.220000 6.530000 0.690000 9.510000
item_41 5.690000 5.250000 0.440000 7.770000
item_42 7.290000 6.920000 0.370000 5.060000
item_43 8.650000 8.610000 0.040000 0.500000
item_44 6.530000 6.330000 0.200000 3.000000
item_45 12.760000 11.920000 0.850000 6.640000
item_46 10.050000 9.800000 0.250000 2.470000
item_47 5.220000 4.960000 0.270000 5.090000
item_48 9.100000 8.160000 0.940000 10.320000
item_49 6.170000 6.040000 0.130000 2.120000
item_50 12.000000 10.740000 1.250000 10.440000
In [12]:
# Average improvement for all items
# ======================================================================================
results[['improvement', 'improvement_(%)']].agg(['mean', 'min', 'max'])
Out[12]:
improvement improvement_(%)
mean 0.6172 6.5938
min 0.0400 0.5000
max 1.4700 14.5200
In [13]:
# Number of series with positive and negative improvement
# ======================================================================================
pd.Series(np.where(results['improvement_(%)'] < 0, 'negative', 'positive')).value_counts()
Out[13]:
positive    50
Name: count, dtype: int64

The global model achieves an average improvement of 6.6% compared to using an individual model for each series. For all series, the prediction error evaluated by backtesting is lower when the global model is used. This use case demonstrates that a multi-series model can have advantages over multiple individual models when forecasting time series that follow similar dynamics. In addition to the potential improvements in forecasting, it is also important to consider the benefit of having only one model to maintain and the speed of training and prediction.

⚠ Warning

This comparison was made without optimizing the model hyperparameters. See the Case study 3: Hyperparameter tuning and lags selection section to verify that the conclusions hold when the models are tuned with the best combination of hyperparameters and lags.

Time Series with different lengths and different exogenous variables

When faced with a multi-series forecasting problem, it is common for the series to have varying lengths due to differences in the starting times of data recording. To address this scenario, the ForecasterAutoregMultiSeries and ForecasterAutoregMultiSeriesCustom classes allow the simultaneous modeling of time series of different lengths and using different exogenous variables.

  • When the modeled series have different lengths, they must be stored in a Python dictionary. The keys of the dictionary are the names of the series and the values are the series themselves. All series must be of type pandas.Series, have a datetime index and have the same frequency.

Series values Allowed
[NaN, NaN, NaN, NaN, 4, 5, 6, 7, 8, 9] ✔️
[0, 1, 2, 3, 4, 5, 6, 7, 8, NaN] ✔️
[0, 1, 2, 3, 4, NaN, 6, 7, 8, 9] ✔️
[NaN, NaN, 2, 3, 4, NaN, 6, 7, 8, 9] ✔️



  • When different exogenous variables are used for each series, or if the exogenous variables are the same but have different values for each series, they must be stored in a dictionary. The keys of the dictionary are the names of the series and the values are the exogenous variables themselves. All exogenous variables must be of type pandas.DataFrame or pandas.Series.

Data

The data for this example is stored in "long format" in a single DataFrame. The series_id column identifies the series to which each observation belongs. The timestamp column contains the date of the observation, and the value column contains the value of the series at that date. Each time series is of a different length.

The exogenous variables are stored in a separate DataFrame, also in "long format". The column series_id identifies the series to which each observation belongs. The column timestamp contains the date of the observation, and the remaining columns contain the values of the exogenous variables at that date.

In [7]:
# Load time series of multiple lengths and exogenous variables
# ==============================================================================
series = pd.read_csv(
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast-datasets/main/data/demo_multi_series.csv'
)
exog = pd.read_csv(
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/skforecast-datasets/main/data/demo_multi_series_exog.csv'
)

series['timestamp'] = pd.to_datetime(series['timestamp'])
exog['timestamp'] = pd.to_datetime(exog['timestamp'])

display(series.head())
print("")
display(exog.head())
series_id timestamp value
0 id_1000 2016-01-01 1012.500694
1 id_1000 2016-01-02 1158.500099
2 id_1000 2016-01-03 983.000099
3 id_1000 2016-01-04 1675.750496
4 id_1000 2016-01-05 1586.250694

series_id timestamp sin_day_of_week cos_day_of_week air_temperature wind_speed
0 id_1000 2016-01-01 -0.433884 -0.900969 6.416639 4.040115
1 id_1000 2016-01-02 -0.974928 -0.222521 6.366474 4.530395
2 id_1000 2016-01-03 -0.781831 0.623490 6.555272 3.273064
3 id_1000 2016-01-04 0.000000 1.000000 6.704778 4.865404
4 id_1000 2016-01-05 0.781831 0.623490 2.392998 5.228913

When series have different lengths, the data must be transformed into a dictionary. The keys of the dictionary are the names of the series and the values are the series themselves. To do this, the series_long_to_dict function is used, which takes the DataFrame in "long format" and returns a dict of series.

Similarly, when the exogenous variables are different (values or variables) for each series, the data must be transformed into a dictionary. The keys of the dictionary are the names of the series and the values are the exogenous variables themselves. The exog_long_to_dict function is used, which takes the DataFrame in "long format" and returns a dict of exogenous variables.

In [8]:
# Transform series and exog to dictionaries
# ==============================================================================
series_dict = series_long_to_dict(
    data      = series,
    series_id = 'series_id',
    index     = 'timestamp',
    values    = 'value',
    freq      = 'D'
)

exog_dict = exog_long_to_dict(
    data      = exog,
    series_id = 'series_id',
    index     = 'timestamp',
    freq      = 'D'
)

Some exogenous variables are omitted for series 1 and 3 to illustrate that different exogenous variables can be used for each series.

In [9]:
# Drop some exogenous variables for series 'id_1000' and 'id_1003'
# ==============================================================================
exog_dict['id_1000'] = exog_dict['id_1000'].drop(columns=['air_temperature', 'wind_speed'])
exog_dict['id_1003'] = exog_dict['id_1003'].drop(columns=['cos_day_of_week'])
In [10]:
# Partition data in train and test
# ==============================================================================
end_train = '2016-07-31 23:59:00'

series_dict_train = {k: v.loc[: end_train,] for k, v in series_dict.items()}
exog_dict_train   = {k: v.loc[: end_train,] for k, v in exog_dict.items()}
series_dict_test  = {k: v.loc[end_train:,] for k, v in series_dict.items()}
exog_dict_test    = {k: v.loc[end_train:,] for k, v in exog_dict.items()}
In [11]:
# Plot series
# ==============================================================================
set_dark_theme()
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
fig, axs = plt.subplots(5, 1, figsize=(8, 4), sharex=True)
for i, s in enumerate(series_dict.values()):
    axs[i].plot(s, label=s.name, color=colors[i])
    axs[i].legend(loc='upper right', fontsize=8)
    axs[i].tick_params(axis='both', labelsize=8)
    axs[i].axvline(pd.to_datetime(end_train) , color='white', linestyle='--', linewidth=1) # End train
In [12]:
# Description of each series
# ==============================================================================
for k in series_dict.keys():
    print(f"{k}:")
    try:
        print(
            f"\tTrain: len={len(series_dict_train[k])}, {series_dict_train[k].index[0]}"
            f" --- {series_dict_train[k].index[-1]}"
        )
    except:
        print(f"\tTrain: len=0")
    try:
        print(
            f"\tTest : len={len(series_dict_test[k])}, {series_dict_test[k].index[0]}"
            f" --- {series_dict_test[k].index[-1]}"
        )
    except:
        print(f"\tTest : len=0")
id_1000:
	Train: len=213, 2016-01-01 00:00:00 --- 2016-07-31 00:00:00
	Test : len=153, 2016-08-01 00:00:00 --- 2016-12-31 00:00:00
id_1001:
	Train: len=30, 2016-07-02 00:00:00 --- 2016-07-31 00:00:00
	Test : len=153, 2016-08-01 00:00:00 --- 2016-12-31 00:00:00
id_1002:
	Train: len=183, 2016-01-01 00:00:00 --- 2016-07-01 00:00:00
	Test : len=0
id_1003:
	Train: len=213, 2016-01-01 00:00:00 --- 2016-07-31 00:00:00
	Test : len=153, 2016-08-01 00:00:00 --- 2016-12-31 00:00:00
id_1004:
	Train: len=91, 2016-05-02 00:00:00 --- 2016-07-31 00:00:00
	Test : len=31, 2016-08-01 00:00:00 --- 2016-08-31 00:00:00
In [13]:
# Exogenous variables for each series
# ==============================================================================
for k in series_dict.keys():
    print(f"{k}:")
    try:
        print(f"\t{exog_dict[k].columns.to_list()}")
    except:
        print(f"\tNo exogenous variables")
id_1000:
	['sin_day_of_week', 'cos_day_of_week']
id_1001:
	['sin_day_of_week', 'cos_day_of_week', 'air_temperature', 'wind_speed']
id_1002:
	['sin_day_of_week', 'cos_day_of_week', 'air_temperature', 'wind_speed']
id_1003:
	['sin_day_of_week', 'air_temperature', 'wind_speed']
id_1004:
	['sin_day_of_week', 'cos_day_of_week', 'air_temperature', 'wind_speed']

Global model: Multi-series forecaster

In [21]:
# Fit forecaster
# ==============================================================================
regressor = HistGradientBoostingRegressor(random_state=123, max_depth=5)
forecaster = ForecasterAutoregMultiSeries(
                regressor          = regressor,
                lags               = 14,
                encoding           = "ordinal",
                dropna_from_series = False
            )

forecaster.fit(series=series_dict_train, exog=exog_dict_train, suppress_warnings=True)
forecaster
Out[21]:
============================ 
ForecasterAutoregMultiSeries 
============================ 
Regressor: HistGradientBoostingRegressor(max_depth=5, random_state=123) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14] 
Transformer for series: None 
Transformer for exog: None 
Series encoding: ordinal 
Window size: 14 
Series levels (names): ['id_1000', 'id_1001', 'id_1002', 'id_1003', 'id_1004'] 
Series weights: None 
Weight function included: False 
Differentiation order: None 
Exogenous included: True 
Type of exogenous variable: <class 'dict'> 
Exogenous variables names: ['sin_day_of_week', 'cos_day_of_week', 'air_temperature', 'wind_speed'] 
Training range: 'id_1000': ['2016-01-01', '2016-07-31'], 'id_1001': ['2016-07-02', '2016-07-31'], 'id_1002': ['2016-01-01', '2016-07-01'], ... 
Training index type: DatetimeIndex 
Training index frequency: D 
Regressor parameters: categorical_features: warn, early_stopping: auto, interaction_cst: None, l2_regularization: 0.0, learning_rate: 0.1, ... 
fit_kwargs: {} 
Creation date: 2024-08-05 20:09:55 
Last fit date: 2024-08-05 20:09:55 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 

Only series whose last window of data ends at the same datetime index can be predicted together. If levels = None, series that do not reach the maximum index are excluded from prediction. In this example, series 'id_1002' is excluded because it does not reach the maximum index.

In [22]:
# Predict
# ==============================================================================
predictions = forecaster.predict(steps=5, exog=exog_dict_test, suppress_warnings=True)
predictions
Out[22]:
id_1000 id_1001 id_1003 id_1004
2016-08-01 1502.306015 3663.240057 2850.291902 7436.437194
2016-08-02 1473.680222 3552.416440 2110.674417 8995.170934
2016-08-03 1392.109903 2715.119892 2024.464910 9154.132295
2016-08-04 1334.530819 3003.007676 1793.378346 8939.980361
2016-08-05 1338.640201 2944.149787 1832.215834 8825.396254

Backtesting

When series have different lengths, the backtesting process only returns predictions for the date-times that are present in the series.

In [23]:
# Backtesting
# ==============================================================================
forecaster = ForecasterAutoregMultiSeries(
                 regressor          = regressor, 
                 lags               = 14, 
                 encoding           = "ordinal", 
                 dropna_from_series = False
             )

metrics_levels, backtest_predictions = backtesting_forecaster_multiseries(
    forecaster            = forecaster,
    series                = series_dict,
    exog                  = exog_dict,
    steps                 = 24,
    metric                = "mean_absolute_error",
    add_aggregated_metric = False,
    initial_train_size    = len(series_dict_train["id_1000"]),
    fixed_train_size      = True,
    gap                   = 0,
    allow_incomplete_fold = True,
    refit                 = False,
    n_jobs                ="auto",
    verbose               = True,
    show_progress         = True,
    suppress_warnings     = True
)

display(metrics_levels)
print("")
display(backtest_predictions)
Information of backtesting process
----------------------------------
Number of observations used for initial training: 213
Number of observations used for backtesting: 153
    Number of folds: 7
    Number skipped folds: 0 
    Number of steps per fold: 24
    Number of steps to exclude from the end of each train set before test (gap): 0
    Last fold only includes 9 observations.

Fold: 0
    Training:   2016-01-01 00:00:00 -- 2016-07-31 00:00:00  (n=213)
    Validation: 2016-08-01 00:00:00 -- 2016-08-24 00:00:00  (n=24)
Fold: 1
    Training:   No training in this fold
    Validation: 2016-08-25 00:00:00 -- 2016-09-17 00:00:00  (n=24)
Fold: 2
    Training:   No training in this fold
    Validation: 2016-09-18 00:00:00 -- 2016-10-11 00:00:00  (n=24)
Fold: 3
    Training:   No training in this fold
    Validation: 2016-10-12 00:00:00 -- 2016-11-04 00:00:00  (n=24)
Fold: 4
    Training:   No training in this fold
    Validation: 2016-11-05 00:00:00 -- 2016-11-28 00:00:00  (n=24)
Fold: 5
    Training:   No training in this fold
    Validation: 2016-11-29 00:00:00 -- 2016-12-22 00:00:00  (n=24)
Fold: 6
    Training:   No training in this fold
    Validation: 2016-12-23 00:00:00 -- 2016-12-31 00:00:00  (n=9)

levels mean_absolute_error
0 id_1000 165.813690
1 id_1001 1083.743860
2 id_1002 NaN
3 id_1003 305.675645
4 id_1004 739.624511

id_1000 id_1001 id_1003 id_1004
2016-08-01 1502.306015 3663.240057 2850.291902 7436.437194
2016-08-02 1473.680222 3552.416440 2110.674417 8995.170934
2016-08-03 1392.109903 2715.119892 2024.464910 9154.132295
2016-08-04 1334.530819 3003.007676 1793.378346 8939.980361
2016-08-05 1338.640201 2944.149787 1832.215834 8825.396254
... ... ... ... ...
2016-12-27 1644.881542 1033.972604 1823.096224 NaN
2016-12-28 1564.623579 1002.343792 1832.785013 NaN
2016-12-29 1487.791692 1092.283989 1893.092178 NaN
2016-12-30 1481.653346 1093.638867 1952.998143 NaN
2016-12-31 1309.584690 953.407847 1951.951526 NaN

153 rows × 4 columns

In [24]:
# Plot backtesting predictions
# ==============================================================================
colors = plt.rcParams['axes.prop_cycle'].by_key()['color']
fig, axs = plt.subplots(5, 1, figsize=(8, 4), sharex=True)
for i, s in enumerate(series_dict.keys()):
    axs[i].plot(series_dict[s], label=series_dict[s].name, color=colors[i])
    axs[i].axvline(pd.to_datetime(end_train) , color='white', linestyle='--', linewidth=1)
    try:
        axs[i].plot(backtest_predictions[s], label='prediction', color="white")
    except:
        pass
    axs[i].legend(loc='upper left', fontsize=8)
    axs[i].tick_params(axis='both', labelsize=8)

By allowing the modeling of time series of different lengths and with different exogenous variables, the ForecasterAutoregMultiSeries class provides a flexible and powerful tool for using all available information to train the forecasting models.

Hyperparameter tuning and lags selection

In case study 1, the comparison between forecasters was done without optimizing the hyperparameters of the regressors. To make a fair comparison, a grid search strategy is used in order to select the best configuration for each forecaster. See more information on hyperparameter tuning and lags selection.

⚠ Warning

The following section may require significant computational time (about 45 minutes). Feel free to select only a subset of items to speed up the execution.
In [25]:
# Hyperparameter search and backtesting of each item's model
# ======================================================================================
with open(os.devnull, 'w') as devnull: # Hide prints
    sys.stdout = devnull
    items = []
    mae_values  = []

    def search_space(trial):
        search_space  = {
            'lags'          : trial.suggest_categorical('lags', [7, 14]),
            'max_iter'      : trial.suggest_int('max_iter', 100, 500),
            'max_depth'     : trial.suggest_int('max_depth', 5, 10),
            'learning_rate' : trial.suggest_float('learning_rate', 0.01, 0.1)
        } 

        return search_space

    for item in tqdm(data.columns):

        forecaster = ForecasterAutoreg(
                        regressor     = HistGradientBoostingRegressor(random_state=123),
                        lags          = 14,
                        transformer_y = StandardScaler()
                    )

        results_bayesian = bayesian_search_forecaster(
                            forecaster         = forecaster,
                            y                  = data.loc[:end_val, item],
                            search_space       = search_space,
                            n_trials           = 20,
                            steps              = 7,
                            metric             = 'mean_absolute_error',
                            initial_train_size = len(data_train),
                            refit              = False,
                            fixed_train_size   = False,
                            return_best        = True,
                            verbose            = False,
                            show_progress      = False 
                        )

        metric, preds = backtesting_forecaster(
                            forecaster         = forecaster,
                            y                  = data[item],
                            initial_train_size = len(data_train) + len(data_val),
                            steps              = 7,
                            metric             = 'mean_absolute_error',
                            refit              = False,
                            fixed_train_size   = False,
                            verbose            = False,
                            show_progress      = False
                        )

        items.append(item)
        mae_values.append(metric.at[0, 'mean_absolute_error'])

    uni_series_mae = pd.Series(
                        data  = mae_values,
                        index = items,
                        name  = 'uni_series_mae'
                    )
    
sys.stdout = sys.__stdout__
In [26]:
# Hyperparameter search for the multi-series model and backtesting for each item
# ======================================================================================
def search_space(trial):
    search_space  = {
        'lags'          : trial.suggest_categorical('lags', [7, 14]),
        'max_iter'      : trial.suggest_int('max_iter', 100, 500),
        'max_depth'     : trial.suggest_int('max_depth', 5, 10),
        'learning_rate' : trial.suggest_float('learning_rate', 0.01, 0.1)
    } 

    return search_space

forecaster_ms = ForecasterAutoregMultiSeries(
                    regressor          = HistGradientBoostingRegressor(random_state=123),
                    lags               = 14,
                    transformer_series = StandardScaler(),
                    encoding           = 'ordinal'
                )

results_bayesian_ms = bayesian_search_forecaster_multiseries(
                        forecaster         = forecaster_ms,
                        series             = data.loc[:end_val, :],
                        levels             = None, # If None all levels are selected
                        search_space       = search_space,
                        n_trials           = 20,
                        steps              = 7,
                        metric             = 'mean_absolute_error',
                        initial_train_size = len(data_train),
                        refit              = False,
                        fixed_train_size   = False,
                        return_best        = True,
                        verbose            = False,
                        show_progress      = False 
                    )      

multi_series_mae, predictions_ms = backtesting_forecaster_multiseries(
                                       forecaster             = forecaster_ms,
                                       series                 = data,
                                       levels                 = None, # If None all levels are selected
                                       steps                  = 7,
                                       metric                 = 'mean_absolute_error',
                                        add_aggregated_metric = False,
                                       initial_train_size     = len(data_train) + len(data_val),
                                       refit                  = False,
                                       fixed_train_size       = False,
                                       verbose                = False
                                   )
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14] 
  Parameters: {'max_iter': 374, 'max_depth': 7, 'learning_rate': 0.04529057663747355}
  Backtesting metric: 8.24598165517355
  Levels: ['item_1', 'item_2', 'item_3', 'item_4', 'item_5', 'item_6', 'item_7', 'item_8', 'item_9', 'item_10', 'item_11', 'item_12', 'item_13', 'item_14', 'item_15', 'item_16', 'item_17', 'item_18', 'item_19', 'item_20', 'item_21', 'item_22', 'item_23', 'item_24', 'item_25', 'item_26', 'item_27', 'item_28', 'item_29', 'item_30', 'item_31', 'item_32', 'item_33', 'item_34', 'item_35', 'item_36', 'item_37', 'item_38', 'item_39', 'item_40', 'item_41', 'item_42', 'item_43', 'item_44', 'item_45', 'item_46', 'item_47', 'item_48', 'item_49', 'item_50']

In [27]:
# Difference in backtesting metric for each item
# ======================================================================================
multi_series_mae = multi_series_mae.set_index('levels')
multi_series_mae.columns = ['multi_series_mae']
results = pd.concat((uni_series_mae, multi_series_mae), axis = 1)
results['improvement'] = results.eval('uni_series_mae - multi_series_mae')
results['improvement_(%)'] = 100 * results.eval('(uni_series_mae - multi_series_mae) / uni_series_mae')
results = results.round(2)

# Average improvement for all items
# ======================================================================================
results[['improvement', 'improvement_(%)']].agg(['mean', 'min', 'max'])
Out[27]:
improvement improvement_(%)
mean 0.54 5.8222
min 0.04 0.5400
max 1.14 10.2700
In [28]:
# Number of series with positive and negative improvement
# ======================================================================================
pd.Series(np.where(results['improvement_(%)'] < 0, 'negative', 'positive')).value_counts()
Out[28]:
positive    50
Name: count, dtype: int64

After identifying the combination of lags and hyperparameters that achieve the best predictive performance for each forecaster, more single-series models have achieved higher predictive ability by better generalizing their own data (one item). Even so, the multi-series model provides better results for most of the items.

Feature Selection

Feature selection is the process of selecting a subset of relevant features (variables, predictors) for use in model construction. Feature selection techniques are used for several reasons: to simplify models to make them easier to interpret, to reduce training time, to avoid the curse of dimensionality, to improve generalization by reducing overfitting (formally, variance reduction), and others.

Skforecast is compatible with the feature selection methods implemented in the scikit-learn library. There are several methods for feature selection, but the most common are:

  • Recursive feature elimination (RFE)

  • Sequential Feature Selection (SFS)

  • Feature selection based on threshold (SelectFromModel)

💡 Tip

Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.

Weights in multi-series

The weights are used to control the influence that each observation has on the training of the model. ForecasterAutoregMultiseries accepts two types of weights:

  • series_weights controls the relative importance of each series. If a series has twice as much weight as the others, the observations of that series influence the training twice as much. The higher the weight of a series relative to the others, the more the model will focus on trying to learn that series.

  • weight_func controls the relative importance of each observation according to its index value. For example, a function that assigns a lower weight to certain dates.

If the two types of weights are indicated, they are multiplied to create the final weights as shown in the figure. The resulting sample_weight cannot have negative values.

Learn more about weights in multi-series forecasting and weighted time series forecasting with skforecast.

In this example, item_1 has higher relative importance among series (it weighs 3 times more than the rest of the series), and observations between '2013-12-01' and '2014-01-31' are considered non-representative and a weight of 0 is applied to them.

In [ ]:
# Weights in ForecasterAutoregMultiSeries
# ======================================================================================

# Weights for each series
series_weights = {'item_1': 3.0} # Series not presented in the dict will have weight 1

# Weights for each index
def custom_weights(index):
    """
    Return 0 if index is between '2013-12-01' and '2014-01-31', 1 otherwise.
    """
    weights = np.where(
                  (index >= '2013-12-01') & (index <= '2014-01-31'),
                   0,
                   1
              )
    
    return weights

forecaster = ForecasterAutoregMultiSeries(
                 regressor          = HistGradientBoostingRegressor(random_state=123),
                 lags               = 14,
                 transformer_series = StandardScaler(),
                 encoding           = 'ordinal',
                 transformer_exog   = None,
                 weight_func        = custom_weights, 
                 series_weights     = series_weights
             )

forecaster.fit(series=data)
forecaster.predict(steps=7).head(3)
/home/ubuntu/anaconda3/envs/skforecast_13_py12/lib/python3.12/site-packages/skforecast/ForecasterAutoregMultiSeries/ForecasterAutoregMultiSeries.py:1015: IgnoredArgumentWarning: {'item_24', 'item_7', 'item_47', 'item_28', 'item_15', 'item_41', 'item_6', 'item_10', 'item_46', 'item_9', 'item_48', 'item_39', 'item_37', 'item_33', 'item_45', 'item_4', 'item_36', 'item_44', 'item_34', 'item_32', 'item_27', 'item_26', 'item_11', 'item_8', 'item_29', 'item_2', 'item_30', 'item_3', 'item_40', 'item_12', 'item_22', 'item_50', 'item_13', 'item_23', 'item_35', 'item_18', 'item_5', 'item_25', 'item_42', 'item_31', 'item_43', 'item_19', 'item_14', 'item_17', 'item_21', 'item_38', 'item_20', 'item_16', 'item_49'} not present in `series_weights`. A weight of 1 is given to all their samples. 
 You can suppress this warning using: warnings.simplefilter('ignore', category=IgnoredArgumentWarning)
  warnings.warn(
Out[ ]:
item_1 item_10 item_11 item_12 item_13 item_14 item_15 item_16 item_17 item_18 ... item_46 item_47 item_48 item_49 item_5 item_50 item_6 item_7 item_8 item_9
2018-01-01 20.583876 61.580282 62.323468 62.581761 79.205140 56.450206 85.846570 22.718565 32.502652 79.377183 ... 53.421299 22.260726 47.144927 28.682841 22.372861 63.199237 53.271716 57.232493 68.655821 41.978625
2018-01-02 22.501538 71.421524 75.349079 75.283254 89.691063 59.121550 90.118244 24.848422 36.265055 87.467801 ... 62.983381 26.943935 51.961949 34.340793 19.502197 63.945234 57.812538 61.734525 82.100161 49.326013
2018-01-03 20.907183 73.185299 76.323151 75.547258 83.861397 63.508950 95.531382 25.331899 33.468072 90.110389 ... 62.890533 25.834892 52.942216 32.730445 18.498621 71.040223 61.123112 63.709698 84.395482 50.310361

3 rows × 50 columns

🖉 Note

A dictionary can be passed to `weight_func` to apply different functions for each series. If a series is not presented in the dictionary, it will have weight 1.

Conclusions

This use case shows that a multi-series model may have advantages over multiple individual models when forecasting time series that follow similar dynamics. Beyond the potential improvements in forecasting, it is also important to take into consideration the benefit of having only one model to maintain.

Session information

In [ ]:
import session_info
session_info.show(html=False)

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source, thank you!

Global Forecasting Models: Modeling Multiple Time Series with Machine Learning by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 at https://www.cienciadedatos.net/documentos/py44-multi-series-forecasting-skforecast.html

How to cite skforecast

If you use skforecast for a scientific publication, we would appreciate it if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.13.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.13.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.13.0}, month = {8}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }


Did you like the article? Your support is important

Website maintenance has high cost, your contribution will help me to continue generating free educational content. Many thanks! 😊


Creative Commons Licence
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Attribution-NonCommercial-ShareAlike 4.0 International.