Forecasting time series with XGBoost

If you like  Skforecast ,  please give us a star on   GitHub! ⭐️

Forecasting time series with XGBoost

Joaquín Amat Rodrigo, Javier Escobar Ortiz
June, 2024 (last update August 2024)

Introduction

Gradient boosting models have gained popularity in the machine learning community due to their ability to achieve excellent results in a wide range of use cases, including both regression and classification. Although these models have traditionally been less common in forecasting, they can be highly effective in this domain. Some of the key benefits of using gradient boosting models for forecasting include:

  • The ease with which exogenous variables can be included in the model, in addition to autoregressive variables.

  • The ability to capture non-linear relationships between variables.

  • High scalability, allowing models to handle large volumes of data.

  • Some implementations allow the inclusion of categorical variables without the need for additional encoding, such as one-hot encoding.

Despite these benefits, the use of machine learning models for forecasting can present several challenges that can make analysts reluctant to use them, the main ones being:

  • Transforming the data so that it can be used as a regression problem.

  • Depending on how many future predictions are needed (prediction horizon), an iterative process may be required where each new prediction is based on previous ones.

  • Model validation requires specific strategies such as backtesting, walk-forward validation or time series cross-validation. Traditional cross-validation cannot be used.

The skforecast library provides automated solutions to these challenges, making it easier to apply and validate machine learning models to forecasting problems. The library supports several advanced gradient boosting models, including XGBoost.

✎ Note

This document is a summary of a more comprehensive guide to using gradient boosting models for time series forecasting. The complete guide is available at Forecasting time series with gradient boosting: Skforecast, XGBoost, LightGBM, Scikit-learn and CatBoost.

Libraries

Libraries used in this document.

In [1]:
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')

# Modelling and Forecasting
# ==============================================================================
import xgboost
import skforecast
import sklearn
from xgboost import XGBRegressor
from sklearn.feature_selection import RFECV
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from skforecast.model_selection import select_features
import shap

# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version xgboost: {xgboost.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
Version skforecast: 0.13.0
Version scikit-learn: 1.5.1
Version xgboost: 2.1.0
Version pandas: 2.2.2
Version numpy: 2.0.1

Data

The data in this document represent the hourly usage of the bike share system in the city of Washington, D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. The original data was obtained from the UCI Machine Learning Repository.

In [2]:
# Downloading data
# ==============================================================================
data = fetch_dataset('bike_sharing_extended_features')
data.head(4)
bike_sharing_extended_features
------------------------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, the dataset
was enriched by introducing supplementary features. Addition includes calendar-
based variables (day of the week, hour of the day, month, etc.), indicators for
sunlight, incorporation of rolling temperature averages, and the creation of
polynomial features generated from variable pairs. All cyclic variables are
encoded using sine and cosine functions to ensure accurate representation.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17352, 90)
Out[2]:
users weather month_sin month_cos week_of_year_sin week_of_year_cos week_day_sin week_day_cos hour_day_sin hour_day_cos ... temp_roll_mean_1_day temp_roll_mean_7_day temp_roll_max_1_day temp_roll_min_1_day temp_roll_max_7_day temp_roll_min_7_day holiday_previous_day holiday_next_day temp holiday
date_time
2011-01-08 00:00:00 25.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.258819 0.965926 ... 8.063334 10.127976 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 01:00:00 16.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.500000 0.866025 ... 8.029166 10.113334 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 02:00:00 16.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.707107 0.707107 ... 7.995000 10.103572 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 03:00:00 7.0 rain 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.866025 0.500000 ... 7.960833 10.093809 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0

4 rows × 90 columns

To facilitate the training of the models, the search for optimal hyperparameters and the evaluation of their predictive accuracy, the data are divided into three separate sets: training, validation and test.

In [3]:
# Split train-validation-test
# ==============================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Dates train      : 2011-01-08 00:00:00 --- 2012-03-31 23:00:00  (n=10776)
Dates validacion : 2012-04-01 00:00:00 --- 2012-08-31 23:00:00  (n=3672)
Dates test       : 2012-09-01 00:00:00 --- 2012-12-30 23:00:00  (n=2904)

Data exploration

Graphical exploration of time series can be an effective way of identifying trends, patterns, and seasonal variations. This, in turn, helps to guide the selection of the most appropriate forecasting model.

Plot time series

Full time series

In [4]:
# Interactive plot of time series
# ==============================================================================
fig = go.Figure()
fig.add_trace(go.Scatter(x=data_train.index, y=data_train['users'], mode='lines', name='Train'))
fig.add_trace(go.Scatter(x=data_val.index, y=data_val['users'], mode='lines', name='Validation'))
fig.add_trace(go.Scatter(x=data_test.index, y=data_test['users'], mode='lines', name='Test'))
fig.update_layout(
    title  = 'Number of users',
    xaxis_title="Time",
    yaxis_title="Users",
    legend_title="Partition:",
    width=800,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1,
        xanchor="left",
        x=0.001
    )
)
#fig.update_xaxes(rangeslider_visible=True)
fig.show()

Seasonality plots

Seasonal plots are a useful tool for identifying seasonal patterns and trends in a time series. They are created by averaging the values of each season over time and then plotting them against time.

In [5]:
# Annual, weekly and daily seasonality
# ==============================================================================
fig, axs = plt.subplots(2, 2, figsize=(8.5, 5.5), sharex=False, sharey=True)
axs = axs.ravel()

# Users distribution by month
data['month'] = data.index.month
data.boxplot(column='users', by='month', ax=axs[0])
data.groupby('month')['users'].median().plot(style='o-', linewidth=0.8, ax=axs[0])
axs[0].set_ylabel('Users')
axs[0].set_title('Users distribution by month', fontsize=10)

# Users distribution by week day
data['week_day'] = data.index.day_of_week + 1
data.boxplot(column='users', by='week_day', ax=axs[1])
data.groupby('week_day')['users'].median().plot(style='o-', linewidth=0.8, ax=axs[1])
axs[1].set_ylabel('Users')
axs[1].set_title('Users distribution by week day', fontsize=10)

# Users distribution by the hour of the day
data['hour_day'] = data.index.hour + 1
data.boxplot(column='users', by='hour_day', ax=axs[2])
data.groupby('hour_day')['users'].median().plot(style='o-', linewidth=0.8, ax=axs[2])
axs[2].set_ylabel('Users')
axs[2].set_title('Users distribution by the hour of the day', fontsize=10)

# Users distribution by week day and hour of the day
mean_day_hour = data.groupby(["week_day", "hour_day"])["users"].mean()
mean_day_hour.plot(ax=axs[3])
axs[3].set(
    title       = "Mean users during week",
    xticks      = [i * 24 for i in range(7)],
    xticklabels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    xlabel      = "Day and hour",
    ylabel      = "Number of users"
)
axs[3].title.set_size(10)

fig.suptitle("Seasonality plots", fontsize=14)
fig.tight_layout()

There is a clear difference between weekdays and weekends. There is also a clear intra-day pattern, with a different influx of users depending on the time of day.

Autocorrelation plots

Auto-correlation plots are a useful tool for identifying the order of an autoregressive model. The autocorrelation function (ACF) is a measure of the correlation between the time series and a lagged version of itself. The partial autocorrelation function (PACF) is a measure of the correlation between the time series and a lagged version of itself, controlling for the values of the time series at all shorter lags. These plots are useful for identifying the lags to be included in the autoregressive model.

In [6]:
# Autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2))
plot_acf(data['users'], ax=ax, lags=72)
plt.show()
In [7]:
# Partial autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2))
plot_pacf(data['users'], ax=ax, lags=72, method='ywm')
plt.show()

The results of the autocorrelation study show that there is a significant correlation between the number of users in previous hours, as well as the days before, and the number of users in the future. This means that knowing the number of users during certain periods in the past could be valuable in predicting the number of users in the future.

Recursive multi-step forecasting with XGBoost

First, an ForecasterAutoreg model is trained using past values (lags) of the response variable as predictors. Later, exogenous variables are added to the model and the improvement in its performance is assessed. Since Gradient Boosting models have a large number of hyperparameters, a Bayesian Search is performed using the bayesian_search_forecaster() function to find the best combination of hyperparameters and lags. Finally, the predictive ability of the model is evaluated using a backtesting process.

Forecaster

In [8]:
# Create forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
                 regressor = XGBRegressor(random_state=15926, enable_categorical=True),
                 lags      = 24
             )

# Train forecaster
# ==============================================================================
forecaster.fit(y=data.loc[:end_validation, 'users'])
forecaster
Out[8]:
================= 
ForecasterAutoreg 
================= 
Regressor: XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=None, device=None, early_stopping_rounds=None,
             enable_categorical=True, eval_metric=None, feature_types=None,
             gamma=None, grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=None, max_bin=None,
             max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=None, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             multi_strategy=None, n_estimators=None, n_jobs=None,
             num_parallel_tree=None, random_state=15926, ...) 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24] 
Transformer for y: None 
Transformer for exog: None 
Window size: 24 
Weight function included: False 
Differentiation order: None 
Exogenous included: False 
Exogenous variables names: None 
Training range: [Timestamp('2011-01-08 00:00:00'), Timestamp('2012-08-31 23:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: h 
Regressor parameters: {'objective': 'reg:squarederror', 'base_score': None, 'booster': None, 'callbacks': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': None, 'device': None, 'early_stopping_rounds': None, 'enable_categorical': True, 'eval_metric': None, 'feature_types': None, 'gamma': None, 'grow_policy': None, 'importance_type': None, 'interaction_constraints': None, 'learning_rate': None, 'max_bin': None, 'max_cat_threshold': None, 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': None, 'max_leaves': None, 'min_child_weight': None, 'missing': nan, 'monotone_constraints': None, 'multi_strategy': None, 'n_estimators': None, 'n_jobs': None, 'num_parallel_tree': None, 'random_state': 15926, 'reg_alpha': None, 'reg_lambda': None, 'sampling_method': None, 'scale_pos_weight': None, 'subsample': None, 'tree_method': None, 'validate_parameters': None, 'verbosity': None} 
fit_kwargs: {} 
Creation date: 2024-08-10 18:43:10 
Last fit date: 2024-08-10 18:43:16 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 
In [9]:
# Predict
# ==============================================================================
forecaster.predict(steps=10)
Out[9]:
2012-09-01 00:00:00    123.423668
2012-09-01 01:00:00     75.687805
2012-09-01 02:00:00     43.349522
2012-09-01 03:00:00     15.084871
2012-09-01 04:00:00      3.778617
2012-09-01 05:00:00     13.915284
2012-09-01 06:00:00     44.951176
2012-09-01 07:00:00    132.655685
2012-09-01 08:00:00    298.680359
2012-09-01 09:00:00    435.629669
Freq: h, Name: pred, dtype: float64

Backtesting

In order to have a robust estimate of the predictive ability of the model, a backtesting process is carried out. The backtesting process consists of generating a forecast for each observation in the test set, following the same procedure as it would be done in production, and then comparing the predicted value with the actual value.

In [10]:
# Backtest model on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
                          forecaster         = forecaster,
                          y                  = data['users'],
                          steps              = 36,
                          metric             = 'mean_absolute_error',
                          initial_train_size = len(data[:end_validation]),
                          refit              = False,
                          n_jobs             = 'auto',
                          verbose            = False, # Change to False to see less information
                          show_progress      = True
                      )
predictions.head()
Out[10]:
pred
2012-09-01 00:00:00 123.423668
2012-09-01 01:00:00 75.687805
2012-09-01 02:00:00 43.349522
2012-09-01 03:00:00 15.084871
2012-09-01 04:00:00 3.778617
In [11]:
# Backtesting error
# ==============================================================================
metric
Out[11]:
mean_absolute_error
0 73.466462

Exogenous Variables

So far, only lagged values of the time series have been used as predictors. However, it is possible to include other variables as predictors. These variables are known as exogenous variables (features) and their use can improve the predictive capacity of the model. A very important point to keep in mind is that the values of the exogenous variables must be known at the time of prediction.

Common examples of exogenous variables are those derived from the calendar, such as the day of the week, month, year, or holidays. Weather variables such as temperature, humidity, and wind also fall into this category, as do economic variables such as inflation and interest rates.

Warning

Exogenous variables must be known at the time of the forecast. For example, if temperature is used as an exogenous variable, the temperature value for the next hour must be known at the time of the forecast. If the temperature value is not known, the forecast will not be possible.

Weather variables should be used with caution. When the model is deployed into production, future weather conditions are not known, but are predictions made by meteorological services. Because they are predictions, they introduce errors into the forecasting model. As a result, the model's predictions are likely to get worse. One way to anticipate this problem, and to know (not avoid) the expected performance of the model, is to use the weather forecasts available at the time the model is trained, rather than the recorded conditions.

✎ Note

For a more detailed explanation of how to use exogenous variables, such as categorical features, visit:
In [12]:
# Variables included as exogenous features
# ==============================================================================
exog_features = [
    'month_sin', 
    'month_cos',
    'week_of_year_sin',
    'week_of_year_cos',
    'week_day_sin',
    'week_day_cos',
    'hour_day_sin',
    'hour_day_cos',
    'sunrise_hour_sin',
    'sunrise_hour_cos',
    'sunset_hour_sin',
    'sunset_hour_cos',
    'holiday_previous_day',
    'holiday_next_day',
    'temp_roll_mean_1_day',
    'temp_roll_mean_7_day',
    'temp_roll_max_1_day',
    'temp_roll_min_1_day',
    'temp_roll_max_7_day',
    'temp_roll_min_7_day',
    'temp',
    'holiday'
]
In [13]:
# Backtesting model with exogenous variables on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = data['users'],
    exog               = data[exog_features],
    steps              = 36,
    metric             = 'mean_absolute_error',
    initial_train_size = len(data[:end_validation]),
    refit              = False,
    n_jobs             = 'auto',
    verbose            = False,
    show_progress      = True
)
metric
Out[13]:
mean_absolute_error
0 62.119768

Adding exogenous variables to the model improves its predictive capacity.

Hyperparameter tuning

Hyperparameter tuning is a critical step in developing effective machine learning models. The ForecasterAutoreg used in the previous sections included a XGBoost model with the default hyperparameters. However, there is no reason why these values are the most appropriate. To find the best hyperparameters, a Bayesian Search is performed using the bayesian_search_forecaster() function. The search is carried out using the same backtesting process as before, but each time, the model is trained with different combinations of hyperparameters and lags. It is important to note that the hyperparameter search must be done using the validation set, so the test data is never used.

The search is performed by testing each combination of hyperparameters and lags as follows:

  1. Train the model using only the training set.

  2. The model is evaluated using the validation set via backtesting.

  3. Select the combination of hyperparameters and lags that gives the lowest error.

  4. Train the model again using the best combination found, this time using both the training and validation data.

By following these steps, one can obtain a model with optimized hyperparameters and avoid overfitting.

✎ Note

Starting at version 0.12.0, lags are included in the search space (search_space).
In [14]:
# Hyperparameters search
# ==============================================================================
# Create forecaster
forecaster = ForecasterAutoreg(
    regressor = XGBRegressor(random_state=15926, enable_categorical=True),
    lags      = 72
)

# Lags grid
lags_grid = [48, 72, [1, 2, 3, 23, 24, 25, 167, 168, 169]]

# Regressor hyperparameters search space
def search_space(trial):
    search_space  = {
        'n_estimators'    : trial.suggest_int('n_estimators', 400, 1200, step=100),
        'max_depth'       : trial.suggest_int('max_depth', 3, 10, step=1),
        'learning_rate'   : trial.suggest_float('learning_rate', 0.01, 1),
        'subsample'       : trial.suggest_float('subsample', 0.1, 1),
        'colsample_bytree': trial.suggest_float('colsample_bytree', 0.1, 1),
        'gamma'           : trial.suggest_float('gamma', 0, 1),
        'reg_alpha'       : trial.suggest_float('reg_alpha', 0, 1),
        'reg_lambda'      : trial.suggest_float('reg_lambda', 0, 1),
        'lags'            : trial.suggest_categorical('lags', lags_grid)
    } 
    return search_space

results_search, frozen_trial = bayesian_search_forecaster(
    forecaster         = forecaster,
    y                  = data.loc[:end_validation, 'users'],
    exog               = data.loc[:end_validation, exog_features],
    search_space       = search_space,
    steps              = 36,
    refit              = False,
    metric             = 'mean_absolute_error',
    initial_train_size = len(data_train),
    fixed_train_size   = False,
    n_trials           = 20,
    random_state       = 123,
    return_best        = True,
    n_jobs             = 'auto',
    verbose            = False,
    show_progress      = True
)
best_params = results_search['params'].iat[0]
best_lags = results_search['lags'].iat[0]
`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [  1   2   3  23  24  25 167 168 169] 
  Parameters: {'n_estimators': 400, 'max_depth': 10, 'learning_rate': 0.1987029596852645, 'subsample': 0.8452581056448839, 'colsample_bytree': 0.6490537930286489, 'gamma': 0.42743952450393, 'reg_alpha': 0.7787952214581544, 'reg_lambda': 0.17127802671184533}
  Backtesting metric: 64.3772965454393

In [15]:
# Search results
# ==============================================================================
results_search.head(5)
Out[15]:
lags params mean_absolute_error n_estimators max_depth learning_rate subsample colsample_bytree gamma reg_alpha reg_lambda
17 [1, 2, 3, 23, 24, 25, 167, 168, 169] {'n_estimators': 400, 'max_depth': 10, 'learni... 64.377297 400.0 10.0 0.198703 0.845258 0.649054 0.427440 0.778795 0.171278
19 [1, 2, 3, 23, 24, 25, 167, 168, 169] {'n_estimators': 500, 'max_depth': 9, 'learnin... 67.255268 500.0 9.0 0.158141 0.874604 0.644145 0.456653 0.748756 0.200001
15 [1, 2, 3, 23, 24, 25, 167, 168, 169] {'n_estimators': 400, 'max_depth': 9, 'learnin... 68.130566 400.0 9.0 0.016108 0.724424 0.663652 0.133031 0.808592 0.151677
16 [1, 2, 3, 23, 24, 25, 167, 168, 169] {'n_estimators': 400, 'max_depth': 10, 'learni... 68.204207 400.0 10.0 0.163782 0.764628 0.634598 0.402922 0.788758 0.014016
13 [1, 2, 3, 23, 24, 25, 167, 168, 169] {'n_estimators': 900, 'max_depth': 8, 'learnin... 69.459926 900.0 8.0 0.012168 0.748844 0.775209 0.085690 0.899431 0.012284

Since return_best has been set to True, the forecaster object is automatically updated with the best configuration found and trained on the entire dataset. This final model can then be used for future predictions on new data.

In [16]:
# Best model
# ==============================================================================
forecaster
Out[16]:
================= 
ForecasterAutoreg 
================= 
Regressor: XGBRegressor(base_score=None, booster=None, callbacks=None,
             colsample_bylevel=None, colsample_bynode=None,
             colsample_bytree=0.6490537930286489, device=None,
             early_stopping_rounds=None, enable_categorical=True,
             eval_metric=None, feature_types=None, gamma=0.42743952450393,
             grow_policy=None, importance_type=None,
             interaction_constraints=None, learning_rate=0.1987029596852645,
             max_bin=None, max_cat_threshold=None, max_cat_to_onehot=None,
             max_delta_step=None, max_depth=10, max_leaves=None,
             min_child_weight=None, missing=nan, monotone_constraints=None,
             multi_strategy=None, n_estimators=400, n_jobs=None,
             num_parallel_tree=None, random_state=15926, ...) 
Lags: [  1   2   3  23  24  25 167 168 169] 
Transformer for y: None 
Transformer for exog: None 
Window size: 169 
Weight function included: False 
Differentiation order: None 
Exogenous included: True 
Exogenous variables names: ['month_sin', 'month_cos', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'sunrise_hour_sin', 'sunrise_hour_cos', 'sunset_hour_sin', 'sunset_hour_cos', 'holiday_previous_day', 'holiday_next_day', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'temp', 'holiday'] 
Training range: [Timestamp('2011-01-08 00:00:00'), Timestamp('2012-08-31 23:00:00')] 
Training index type: DatetimeIndex 
Training index frequency: h 
Regressor parameters: {'objective': 'reg:squarederror', 'base_score': None, 'booster': None, 'callbacks': None, 'colsample_bylevel': None, 'colsample_bynode': None, 'colsample_bytree': 0.6490537930286489, 'device': None, 'early_stopping_rounds': None, 'enable_categorical': True, 'eval_metric': None, 'feature_types': None, 'gamma': 0.42743952450393, 'grow_policy': None, 'importance_type': None, 'interaction_constraints': None, 'learning_rate': 0.1987029596852645, 'max_bin': None, 'max_cat_threshold': None, 'max_cat_to_onehot': None, 'max_delta_step': None, 'max_depth': 10, 'max_leaves': None, 'min_child_weight': None, 'missing': nan, 'monotone_constraints': None, 'multi_strategy': None, 'n_estimators': 400, 'n_jobs': None, 'num_parallel_tree': None, 'random_state': 15926, 'reg_alpha': 0.7787952214581544, 'reg_lambda': 0.17127802671184533, 'sampling_method': None, 'scale_pos_weight': None, 'subsample': 0.8452581056448839, 'tree_method': None, 'validate_parameters': None, 'verbosity': None} 
fit_kwargs: {} 
Creation date: 2024-08-10 18:43:23 
Last fit date: 2024-08-10 18:45:17 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 

Once the best combination of hyperparameters has been identified using the validation data, the predictive capacity of the model is evaluated when applied to the test set. It is highly recommended to review the documentation for the backtesting_forecaster() function to gain a better understanding of its capabilities. This will help to utilize its full potential to analyze the predictive ability of the model.

In [17]:
# Backtesting model with exogenous variables on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
    forecaster         = forecaster,
    y                  = data['users'],
    exog               = data[exog_features],
    steps              = 36,
    metric             = 'mean_absolute_error',
    initial_train_size = len(data[:end_validation]),
    refit              = False,
    n_jobs             = 'auto',
    verbose            = False,
    show_progress      = True
)
metric
Out[17]:
mean_absolute_error
0 62.620052
In [18]:
# Plot predictions vs real value
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['pred'], name="prediction", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Real value vs predicted in test data",
    xaxis_title="Date time",
    yaxis_title="Users",
    width=800,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.1,
        xanchor="left",
        x=0.001
    )
)
fig.show()