Forecasting time series with LightGBM

Introduction

Gradient boosting models have gained popularity in the machine learning community due to their ability to achieve excellent results in a wide range of use cases, including both regression and classification. Although these models have traditionally been less common in forecasting, they can be highly effective in this domain. Some of the key benefits of using gradient boosting models for forecasting include:

The ease with which exogenous variables can be included in the model, in addition to autoregressive variables.
The ability to capture non-linear relationships between variables.
High scalability, allowing models to handle large volumes of data.
Some implementations allow the inclusion of categorical variables without the need for additional encoding, such as one-hot encoding.

Despite these benefits, the use of machine learning models for forecasting can present several challenges that can make analysts reluctant to use them, the main ones being:

Transforming the data so that it can be used as a regression problem.
Depending on how many future predictions are needed (prediction horizon), an iterative process may be required where each new prediction is based on previous ones.
Model validation requires specific strategies such as backtesting, walk-forward validation or time series cross-validation. Traditional cross-validation cannot be used.

The skforecast library provides automated solutions to these challenges, making it easier to apply and validate machine learning models to forecasting problems. The library supports several advanced gradient boosting models, including LightGBM.

✎ Note

This document is a summary of a more comprehensive guide to using gradient boosting models for time series forecasting. The complete guide is available at Forecasting time series with gradient boosting: Skforecast, XGBoost, LightGBM, Scikit-learn and CatBoost.

Libraries

Libraries used in this document.

# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf
from statsmodels.graphics.tsaplots import plot_pacf
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')

# Modelling and Forecasting
# ==============================================================================
import lightgbm
import skforecast
import sklearn
from lightgbm import LGBMRegressor
from sklearn.feature_selection import RFECV
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import bayesian_search_forecaster
from skforecast.model_selection import backtesting_forecaster
from skforecast.feature_selection import select_features
from skforecast.preprocessing import RollingFeatures
import shap

# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version lightgbm: {lightgbm.__version__}")

Version skforecast: 0.16.0
Version scikit-learn: 1.6.1
Version lightgbm: 4.6.0

Data

The data in this document represent the hourly usage of the bike share system in the city of Washington, D.C. during the years 2011 and 2012. In addition to the number of users per hour, information about weather conditions and holidays is available. The original data was obtained from the UCI Machine Learning Repository.

# Downloading data
# ==============================================================================
data = fetch_dataset('bike_sharing_extended_features')
data.head(4)

bike_sharing_extended_features
------------------------------
Hourly usage of the bike share system in the city of Washington D.C. during the
years 2011 and 2012. In addition to the number of users per hour, the dataset
was enriched by introducing supplementary features. Addition includes calendar-
based variables (day of the week, hour of the day, month, etc.), indicators for
sunlight, incorporation of rolling temperature averages, and the creation of
polynomial features generated from variable pairs. All cyclic variables are
encoded using sine and cosine functions to ensure accurate representation.
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.
https://doi.org/10.24432/C5W894.
Shape of the dataset: (17352, 90)

	users	weather	month_sin	month_cos	week_of_year_sin	week_of_year_cos	week_day_sin	week_day_cos	hour_day_sin	hour_day_cos	...	temp_roll_mean_1_day	temp_roll_mean_7_day	temp_roll_max_1_day	temp_roll_min_1_day	temp_roll_max_7_day	temp_roll_min_7_day	holiday_previous_day	holiday_next_day	temp	holiday
date_time
2011-01-08 00:00:00	25.0	mist	0.5	0.866025	0.120537	0.992709	-0.781832	0.62349	0.258819	0.965926	...	8.063334	10.127976	9.02	6.56	18.86	4.92	0.0	0.0	7.38	0.0
2011-01-08 01:00:00	16.0	mist	0.5	0.866025	0.120537	0.992709	-0.781832	0.62349	0.500000	0.866025	...	8.029166	10.113334	9.02	6.56	18.86	4.92	0.0	0.0	7.38	0.0
2011-01-08 02:00:00	16.0	mist	0.5	0.866025	0.120537	0.992709	-0.781832	0.62349	0.707107	0.707107	...	7.995000	10.103572	9.02	6.56	18.86	4.92	0.0	0.0	7.38	0.0
2011-01-08 03:00:00	7.0	rain	0.5	0.866025	0.120537	0.992709	-0.781832	0.62349	0.866025	0.500000	...	7.960833	10.093809	9.02	6.56	18.86	4.92	0.0	0.0	7.38	0.0

4 rows × 90 columns

To facilitate the training of the models, the search for optimal hyperparameters and the evaluation of their predictive accuracy, the data are divided into three separate sets: training, validation and test.

# Split train-validation-test
# ==============================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")

Dates train      : 2011-01-08 00:00:00 --- 2012-03-31 23:00:00  (n=10776)
Dates validacion : 2012-04-01 00:00:00 --- 2012-08-31 23:00:00  (n=3672)
Dates test       : 2012-09-01 00:00:00 --- 2012-12-30 23:00:00  (n=2904)

Data exploration

Graphical exploration of time series can be an effective way of identifying trends, patterns, and seasonal variations. This, in turn, helps to guide the selection of the most appropriate forecasting model.

Plot time series

Full time series

# Interactive plot of time series
# ==============================================================================
fig = go.Figure()
fig.add_trace(go.Scatter(x=data_train.index, y=data_train['users'], mode='lines', name='Train'))
fig.add_trace(go.Scatter(x=data_val.index, y=data_val['users'], mode='lines', name='Validation'))
fig.add_trace(go.Scatter(x=data_test.index, y=data_test['users'], mode='lines', name='Test'))
fig.update_layout(
    title  = 'Number of users',
    xaxis_title="Time",
    yaxis_title="Users",
    legend_title="Partition:",
    width=800,
    height=400,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(orientation="h", yanchor="top", y=1, xanchor="left", x=0.001)
)
#fig.update_xaxes(rangeslider_visible=True)
fig.show()

Seasonality plots

Seasonal plots are a useful tool for identifying seasonal patterns and trends in a time series. They are created by averaging the values of each season over time and then plotting them against time.

# Annual, weekly and daily seasonality
# ==============================================================================
fig, axs = plt.subplots(2, 2, figsize=(8.5, 5.5), sharex=False, sharey=True)
axs = axs.ravel()

# Users distribution by month
data['month'] = data.index.month
data.boxplot(column='users', by='month', ax=axs[0])
data.groupby('month')['users'].median().plot(style='o-', linewidth=0.8, ax=axs[0])
axs[0].set_ylabel('Users')
axs[0].set_title('Users distribution by month', fontsize=10)

# Users distribution by week day
data['week_day'] = data.index.day_of_week + 1
data.boxplot(column='users', by='week_day', ax=axs[1])
data.groupby('week_day')['users'].median().plot(style='o-', linewidth=0.8, ax=axs[1])
axs[1].set_ylabel('Users')
axs[1].set_title('Users distribution by week day', fontsize=10)

# Users distribution by the hour of the day
data['hour_day'] = data.index.hour + 1
data.boxplot(column='users', by='hour_day', ax=axs[2])
data.groupby('hour_day')['users'].median().plot(style='o-', linewidth=0.8, ax=axs[2])
axs[2].set_ylabel('Users')
axs[2].set_title('Users distribution by the hour of the day', fontsize=10)

# Users distribution by week day and hour of the day
mean_day_hour = data.groupby(["week_day", "hour_day"])["users"].mean()
mean_day_hour.plot(ax=axs[3])
axs[3].set(
    title       = "Mean users during week",
    xticks      = [i * 24 for i in range(7)],
    xticklabels = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    xlabel      = "Day and hour",
    ylabel      = "Number of users"
)
axs[3].title.set_size(10)

fig.suptitle("Seasonality plots", fontsize=12)
fig.tight_layout()

There is a clear difference between weekdays and weekends. There is also a clear intra-day pattern, with a different influx of users depending on the time of day.

Autocorrelation plots

Auto-correlation plots are a useful tool for identifying the order of an autoregressive model. The autocorrelation function (ACF) is a measure of the correlation between the time series and a lagged version of itself. The partial autocorrelation function (PACF) is a measure of the correlation between the time series and a lagged version of itself, controlling for the values of the time series at all shorter lags. These plots are useful for identifying the lags to be included in the autoregressive model.

# Autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2))
plot_acf(data['users'], ax=ax, lags=72)
plt.show()

# Partial autocorrelation plot
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2))
plot_pacf(data['users'], ax=ax, lags=72, method='ywm')
plt.show()

The results of the autocorrelation study show that there is a significant correlation between the number of users in previous hours, as well as the days before, and the number of users in the future. This means that knowing the number of users during certain periods in the past could be valuable in predicting the number of users in the future.

Recursive multi-step forecasting with LightGBM

First, an ForecasterRecursive model is trained using past values (lags) of the response variable as predictors. Later, exogenous variables are added to the model and the improvement in its performance is assessed. Since Gradient Boosting models have a large number of hyperparameters, a Bayesian Search is performed using the bayesian_search_forecaster() function to find the best combination of hyperparameters and lags. Finally, the predictive ability of the model is evaluated using a backtesting process.

Forecaster

# Create forecaster
# ==============================================================================
window_features = RollingFeatures(stats=["mean"], window_sizes=24 * 3)
forecaster = ForecasterRecursive(
                regressor       = LGBMRegressor(random_state=15926, verbose=-1),
                lags            = 24,
                window_features = window_features
             )

# Train forecaster
# ==============================================================================
forecaster.fit(y=data.loc[:end_validation, 'users'])
forecaster

ForecasterRecursive

General Information

Regressor: LGBMRegressor
Lags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24]
Window features: ['roll_mean_72']
Window size: 72
Series name: users
Exogenous included: False
Weight function included: False
Differentiation order: None
Creation date: 2025-05-14 22:35:28
Last fit date: 2025-05-14 22:35:28
Skforecast version: 0.16.0
Python version: 3.12.9
Forecaster id: None

Exogenous Variables

None

Data Transformations

Transformer for y: None
Transformer for exog: None

Training Information

Training range: [Timestamp('2011-01-08 00:00:00'), Timestamp('2012-08-31 23:00:00')]
Training index type: DatetimeIndex
Training index frequency: h

Regressor Parameters

{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.1, 'max_depth': -1, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 100, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 15926, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1}

Fit Kwargs

{}

🛈 API Reference 🗎 User Guide

# Predict
# ==============================================================================
forecaster.predict(steps=10)

2012-09-01 00:00:00    112.784440
2012-09-01 01:00:00     76.323329
2012-09-01 02:00:00     42.893397
2012-09-01 03:00:00     19.939556
2012-09-01 04:00:00      8.926545
2012-09-01 05:00:00     15.190465
2012-09-01 06:00:00     46.221618
2012-09-01 07:00:00    123.042022
2012-09-01 08:00:00    281.181111
2012-09-01 09:00:00    421.506136
Freq: h, Name: pred, dtype: float64

Backtesting

In order to have a robust estimate of the predictive ability of the model, a backtesting process is carried out. The backtesting process consists of generating a forecast for each observation in the test set, following the same procedure as it would be done in production, and then comparing the predicted value with the actual value.

# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
        steps              = 36,
        initial_train_size = len(data[:end_validation]),
        refit              = False,
)
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            cv         = cv,
                            metric     = 'mean_absolute_error'
                       )
predictions.head()

  0%|          | 0/81 [00:00<?, ?it/s]

	pred
2012-09-01 00:00:00	112.784440
2012-09-01 01:00:00	76.323329
2012-09-01 02:00:00	42.893397
2012-09-01 03:00:00	19.939556
2012-09-01 04:00:00	8.926545

# Backtesting error
# ==============================================================================
metric

	mean_absolute_error
0	76.362396

Exogenous Variables

So far, only lagged values of the time series have been used as predictors. However, it is possible to include other variables as predictors. These variables are known as exogenous variables (features) and their use can improve the predictive capacity of the model. A very important point to keep in mind is that the values of the exogenous variables must be known at the time of prediction.

Common examples of exogenous variables are those derived from the calendar, such as the day of the week, month, year, or holidays. Weather variables such as temperature, humidity, and wind also fall into this category, as do economic variables such as inflation and interest rates.

⚠ Warning

Exogenous variables must be known at the time of the forecast. For example, if temperature is used as an exogenous variable, the temperature value for the next hour must be known at the time of the forecast. If the temperature value is not known, the forecast will not be possible.

Weather variables should be used with caution. When the model is deployed into production, future weather conditions are not known, but are predictions made by meteorological services. Because they are predictions, they introduce errors into the forecasting model. As a result, the model's predictions are likely to get worse. One way to anticipate this problem, and to know (not avoid) the expected performance of the model, is to use the weather forecasts available at the time the model is trained, rather than the recorded conditions.

✎ Note

For a more detailed explanation of how to use exogenous variables, such as categorical features, visit:

# Variables included as exogenous features
# ==============================================================================
exog_features = [
    'month_sin', 
    'month_cos',
    'week_of_year_sin',
    'week_of_year_cos',
    'week_day_sin',
    'week_day_cos',
    'hour_day_sin',
    'hour_day_cos',
    'sunrise_hour_sin',
    'sunrise_hour_cos',
    'sunset_hour_sin',
    'sunset_hour_cos',
    'holiday_previous_day',
    'holiday_next_day',
    'temp_roll_mean_1_day',
    'temp_roll_mean_7_day',
    'temp_roll_max_1_day',
    'temp_roll_min_1_day',
    'temp_roll_max_7_day',
    'temp_roll_min_7_day',
    'temp',
    'holiday'
]

# Backtesting model with exogenous variables on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            exog       = data[exog_features],
                            cv         = cv,
                            metric     = 'mean_absolute_error'
                      )
metric

  0%|          | 0/81 [00:00<?, ?it/s]

	mean_absolute_error
0	58.892837

Adding exogenous variables to the model improves its predictive capacity.

Hyperparameter tuning

Hyperparameter tuning is a critical step in developing effective machine learning models. The ForecasterRecursive used in the previous sections included a LGMBRegressor model with the default hyperparameters. However, there is no reason why these values are the most appropriate. To find the best hyperparameters, a Bayesian Search is performed using the bayesian_search_forecaster() function. The search is carried out using the same backtesting process as before, but each time, the model is trained with different combinations of hyperparameters and lags. It is important to note that the hyperparameter search must be done using the validation set, so the test data is never used.

The search is performed by testing each combination of hyperparameters and lags as follows:

Train the model using only the training set.
The model is evaluated using the validation set via backtesting.
Select the combination of hyperparameters and lags that gives the lowest error.
Train the model again using the best combination found, this time using both the training and validation data.

By following these steps, one can obtain a model with optimized hyperparameters and avoid overfitting.

# Hyperparameters search
# ==============================================================================
# Create forecaster
forecaster = ForecasterRecursive(
                regressor       = LGBMRegressor(random_state=15926, verbose=-1),
                lags            = 24,
                window_features = window_features
            )


# Lags grid
lags_grid = [48, 72, [1, 2, 3, 23, 24, 25, 167, 168, 169]]

# Regressor hyperparameters search space
def search_space(trial):
    search_space  = {
        'n_estimators'    : trial.suggest_int('n_estimators', 400, 1200, step=100),
        'max_depth'       : trial.suggest_int('max_depth', 3, 10, step=1),
        'min_data_in_leaf': trial.suggest_int('min_data_in_leaf', 25, 500),
        'learning_rate'   : trial.suggest_float('learning_rate', 0.01, 0.5),
        'feature_fraction': trial.suggest_float('feature_fraction', 0.5, 1, step=0.1),
        'max_bin'         : trial.suggest_int('max_bin', 50, 250, step=25),
        'reg_alpha'       : trial.suggest_float('reg_alpha', 0, 1, step=0.1),
        'reg_lambda'      : trial.suggest_float('reg_lambda', 0, 1, step=0.1),
        'lags'            : trial.suggest_categorical('lags', lags_grid)
    } 
    return search_space

# Folds
cv_searh = TimeSeriesFold(
                steps              = 36,
                initial_train_size = len(data_train),
                refit              = False,
            )

results_search, frozen_trial = bayesian_search_forecaster(
                                    forecaster    = forecaster,
                                    y             = data.loc[:end_validation, 'users'],
                                    exog          = data.loc[:end_validation, exog_features],
                                    search_space  = search_space,
                                    cv            = cv_searh,
                                    metric        = 'mean_absolute_error',
                                    n_trials      = 20,
                                )
best_params = results_search['params'].iat[0]
best_lags = results_search['lags'].iat[0]

  0%|          | 0/20 [00:00<?, ?it/s]

`Forecaster` refitted using the best-found lags and parameters, and the whole data set: 
  Lags: [  1   2   3  23  24  25 167 168 169] 
  Parameters: {'n_estimators': 900, 'max_depth': 9, 'min_data_in_leaf': 26, 'learning_rate': 0.18505894103437742, 'feature_fraction': 0.9, 'max_bin': 100, 'reg_alpha': 0.0, 'reg_lambda': 0.0}
  Backtesting metric: 69.71461721247047

# Search results
# ==============================================================================
results_search.head(3)

	lags	params	mean_absolute_error	n_estimators	max_depth	min_data_in_leaf	learning_rate	feature_fraction	max_bin	reg_alpha	reg_lambda
0	[1, 2, 3, 23, 24, 25, 167, 168, 169]	{'n_estimators': 900, 'max_depth': 9, 'min_dat...	69.714617	900.0	9.0	26.0	0.185059	0.9	100.0	0.0	0.0
1	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 400, 'max_depth': 9, 'min_dat...	70.098073	400.0	9.0	404.0	0.188182	0.6	150.0	0.2	0.0
2	[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14...	{'n_estimators': 1000, 'max_depth': 7, 'min_da...	71.390424	1000.0	7.0	368.0	0.168250	0.7	100.0	0.3	0.6

Since return_best has been set to True, the forecaster object is automatically updated with the best configuration found and trained on the entire dataset. This final model can then be used for future predictions on new data.

# Best model
# ==============================================================================
forecaster

ForecasterRecursive

General Information

Regressor: LGBMRegressor
Lags: [ 1 2 3 23 24 25 167 168 169]
Window features: ['roll_mean_72']
Window size: 169
Series name: users
Exogenous included: True
Weight function included: False
Differentiation order: None
Creation date: 2025-05-14 22:35:34
Last fit date: 2025-05-14 22:37:22
Skforecast version: 0.16.0
Python version: 3.12.9
Forecaster id: None

Exogenous Variables

month_sin, month_cos, week_of_year_sin, week_of_year_cos, week_day_sin, week_day_cos, hour_day_sin, hour_day_cos, sunrise_hour_sin, sunrise_hour_cos, sunset_hour_sin, sunset_hour_cos, holiday_previous_day, holiday_next_day, temp_roll_mean_1_day, temp_roll_mean_7_day, temp_roll_max_1_day, temp_roll_min_1_day, temp_roll_max_7_day, temp_roll_min_7_day, temp, holiday

Data Transformations

Transformer for y: None
Transformer for exog: None

Training Information

Training range: [Timestamp('2011-01-08 00:00:00'), Timestamp('2012-08-31 23:00:00')]
Training index type: DatetimeIndex
Training index frequency: h

Regressor Parameters

{'boosting_type': 'gbdt', 'class_weight': None, 'colsample_bytree': 1.0, 'importance_type': 'split', 'learning_rate': 0.18505894103437742, 'max_depth': 9, 'min_child_samples': 20, 'min_child_weight': 0.001, 'min_split_gain': 0.0, 'n_estimators': 900, 'n_jobs': None, 'num_leaves': 31, 'objective': None, 'random_state': 15926, 'reg_alpha': 0.0, 'reg_lambda': 0.0, 'subsample': 1.0, 'subsample_for_bin': 200000, 'subsample_freq': 0, 'verbose': -1, 'min_data_in_leaf': 26, 'feature_fraction': 0.9, 'max_bin': 100}

Fit Kwargs

{}

🛈 API Reference 🗎 User Guide

Once the best combination of hyperparameters has been identified using the validation data, the predictive capacity of the model is evaluated when applied to the test set. It is highly recommended to review the documentation for the backtesting_forecaster() function to gain a better understanding of its capabilities. This will help to utilize its full potential to analyze the predictive ability of the model.

# Backtesting model with exogenous variables on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            exog       = data[exog_features],
                            cv         = cv,
                            metric     = 'mean_absolute_error'
                       )
display(metric)
predictions.head()

  0%|          | 0/81 [00:00<?, ?it/s]

	mean_absolute_error
0	57.180266

	pred
2012-09-01 00:00:00	148.990798
2012-09-01 01:00:00	119.511529
2012-09-01 02:00:00	96.210663
2012-09-01 03:00:00	40.839404
2012-09-01 04:00:00	21.862018

# Plot predictions vs real value
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['pred'], name="prediction", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Real value vs predicted in test data",
    xaxis_title="Date time",
    yaxis_title="Users",
    width=800,
    height=400,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(orientation="h", yanchor="top", y=1.1, xanchor="left", x=0.001)
)
fig.show()

After optimizing lags and hyperparameters, the prediction error of the autoregressive model is reduced.

Feature selection

Feature selection is the process of selecting a subset of relevant features for use in model construction. It is an important step in the machine learning process, as it can help to reduce overfitting, improve model accuracy, and reduce training time. Since the underlying regressors of skforecast follow the scikit-learn API, it is possible to apply the feature selection methods available in scikit-learn with the function select_features(). Two of the most popular methods are Recursive Feature Elimination and Sequential Feature Selection.

💡 Tip

Feature selection is a powerful tool for improving the performance of machine learning models. However, it is computationally expensive and can be time-consuming. Since the goal is to find the best subset of features, not the best model, it is not necessary to use the entire data set or a highly complex model. Instead, it is recommended to use a small subset of the data and a simple model. Once the best subset of features has been identified, the model can then be trained using the entire dataset and a more complex configuration.

# Create forecaster
# ==============================================================================
regressor = LGBMRegressor(
    n_estimators       = 100,
    max_depth          = 5,
    random_state       = 15926,
    verbose            = -1
)
forecaster = ForecasterRecursive(
    regressor       = regressor,
    lags            = [1, 2, 3, 23, 24, 25, 167, 168, 169],
    window_features = window_features
)

# Recursive feature elimination with cross-validation
# ==============================================================================
warnings.filterwarnings("ignore", message="X does not have valid feature names.*")
selector = RFECV(
    estimator = regressor,
    step      = 1,
    cv        = 3,
)
lags_select, window_features_select, exog_select = select_features(
    forecaster      = forecaster,
    selector        = selector,
    y               = data_train['users'],  
    exog            = data_train[exog_features],
    select_only     = None,
    force_inclusion = None,
    subsample       = 0.5,
    random_state    = 123,
    verbose         = True,
)

Recursive feature elimination (RFECV)
-------------------------------------
Total number of records available: 10607
Total number of records used for feature selection: 5303
Number of features available: 32
    Lags            (n=9)
    Window features (n=1)
    Exog            (n=22)
Number of features selected: 25
    Lags            (n=9) : [1, 2, 3, 23, 24, 25, 167, 168, 169]
    Window features (n=1) : ['roll_mean_72']
    Exog            (n=15) : ['month_sin', 'month_cos', 'week_of_year_sin', 'week_of_year_cos', 'week_day_sin', 'week_day_cos', 'hour_day_sin', 'hour_day_cos', 'temp_roll_mean_1_day', 'temp_roll_mean_7_day', 'temp_roll_max_1_day', 'temp_roll_min_1_day', 'temp_roll_max_7_day', 'temp_roll_min_7_day', 'temp']

Scikit-learn's RFECV starts by training a model on the initial set of features, and obtaining the importance of each feature (through attributes such as coef_ or feature_importances_). Then, in each round, the least important features are iteratively removed, followed by cross-validation to calculate the performance of the model with the remaining features. This process continues until further feature removal doesn't improve or starts to degrade the performance of the model (based on a chosen metric), or the min_features_to_select is reached.

The final result is an optimal subset of features that ideally balances model simplicity and predictive power, as determined by the cross-validation process.

The forecaster is trained and re-evaluated using the best subset of features.

# Create a forecaster with the selected features
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor       = LGBMRegressor(**best_params, random_state=15926, verbose=-1),
    lags            = lags_select,
    window_features = window_features
)

# Backtesting model with exogenous variables on test data
# ==============================================================================
metric, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y          = data['users'],
    exog       = data[exog_select],
    cv         = cv,
    metric     = 'mean_absolute_error'
)
metric

  0%|          | 0/81 [00:00<?, ?it/s]

	mean_absolute_error
0	57.660742

The performance of the model remains similar to that of the model trained with all the features. However, the model is now simpler, which will make it faster to train and less prone to overfitting.

# Actualizar las variables exógenas utilizadas
# ==============================================================================
exog_features = exog_select

Model explanaibility

Due to the complex nature of many modern machine learning models, such as ensemble methods, they often function as black boxes, making it difficult to understand why a particular prediction was made. Explanability techniques aim to demystify these models, providing insight into their inner workings and helping to build trust, improve transparency, and meet regulatory requirements in various domains. Enhancing model explainability not only helps to understand model behavior, but also helps to identify biases, improve model performance, and enable stakeholders to make more informed decisions based on machine learning insights.

Skforecast is compatible with some of the most popular model explainability methods: model-specific feature importances, SHAP values, and partial dependence plots.

# Create and train forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
    regressor       = LGBMRegressor(**best_params, random_state=15926, verbose=-1),
    lags            = lags_select,
    window_features = window_features
)
forecaster.fit(
    y    = data.loc[:end_validation, 'users'],
    exog = data.loc[:end_validation, exog_features]
)

Model-specific feature importance

# Extract feature importance
# ==============================================================================
importance = forecaster.get_feature_importances()
importance.head(10)

	feature	importance
0	lag_1	2204
6	lag_167	1910
2	lag_3	1811
7	lag_168	1798
4	lag_24	1740
1	lag_2	1721
8	lag_169	1680
3	lag_23	1667
5	lag_25	1629
9	roll_mean_72	1545

⚠ Warning

The get_feature_importances() method will only return values if the forecaster's regressor has either the coef_ or feature_importances_ attribute, which is the default in scikit-learn.

Shap values

SHAP (SHapley Additive exPlanations) values are a popular method for explaining machine learning models, as they help to understand how variables and values influence predictions visually and quantitatively.

It is possible to generate SHAP-values explanations from skforecast models with just two essential elements:

The internal regressor of the forecaster.
The training matrices created from the time series and used to fit the forecaster.

By leveraging these two components, users can create insightful and interpretable explanations for their skforecast models. These explanations can be used to verify the reliability of the model, identify the most significant factors that contribute to model predictions, and gain a deeper understanding of the underlying relationship between the input variables and the target variable.

# Training matrices used by the forecaster to fit the internal regressor
# ==============================================================================
X_train, y_train = forecaster.create_train_X_y(
                       y    = data.loc[:end_validation, 'users'],
                       exog = data.loc[:end_validation, exog_features]
                   )
display(X_train.head(3))
display(y_train.head(3))

	lag_1	lag_2	lag_3	lag_23	lag_24	lag_25	lag_167	lag_168	lag_169	roll_mean_72	...	week_day_cos	hour_day_sin	hour_day_cos	temp_roll_mean_1_day	temp_roll_mean_7_day	temp_roll_max_1_day	temp_roll_min_1_day	temp_roll_max_7_day	temp_roll_min_7_day	temp
date_time
2011-01-15 01:00:00	28.0	27.0	36.0	1.0	5.0	14.0	16.0	16.0	25.0	55.736111	...	0.62349	0.500000	0.866025	6.594167	6.535595	9.84	4.1	9.84	3.28	6.56
2011-01-15 02:00:00	20.0	28.0	27.0	1.0	1.0	5.0	7.0	16.0	16.0	55.930556	...	0.62349	0.707107	0.707107	6.696667	6.530715	9.84	4.1	9.84	3.28	6.56
2011-01-15 03:00:00	12.0	20.0	28.0	1.0	1.0	1.0	1.0	7.0	16.0	56.083333	...	0.62349	0.866025	0.500000	6.799167	6.525833	9.84	4.1	9.84	3.28	6.56

3 rows × 25 columns

date_time
2011-01-15 01:00:00    20.0
2011-01-15 02:00:00    12.0
2011-01-15 03:00:00     8.0
Freq: h, Name: y, dtype: float64

# Create SHAP explainer (for three base models)
# ==============================================================================
explainer = shap.TreeExplainer(forecaster.regressor)

# Sample 50% of the data to speed up the calculation
rng = np.random.default_rng(seed=785412)
sample = rng.choice(X_train.index, size=int(len(X_train)*0.5), replace=False)
X_train_sample = X_train.loc[sample, :]
shap_values = explainer.shap_values(X_train_sample)

✎ Note

Shap library has several explainers, each designed for a different type of model. The shap.TreeExplainer explainer is used for tree-based models, such as the LGBMRegressor used in this example. For more information, see the SHAP documentation.

# Shap summary plot (top 10)
# ==============================================================================
shap.initjs()
shap.summary_plot(shap_values, X_train_sample, max_display=10, show=False)
fig, ax = plt.gcf(), plt.gca()
ax.set_title("SHAP Summary plot")
ax.tick_params(labelsize=8)
fig.set_size_inches(7, 3)

SHAP values not only allow for interpreting the general behavior of the model but also serve as a powerful tool for analyzing individual predictions. This is especially useful when trying to understand how a specific prediction was made and which variables contributed to it.

To carry out this analysis, it is necessary to access the predictor values — in this case, the lags — at the time of the prediction. This can be achieved by using the create_predict_X method or by enabling the return_predictors=True argument in the backtesting_forecaster function.

Suppose one wants to understand the prediction obtained during the backtesting for the date 2012-10-06 12:00:00.

# Backtesting including the predictors used for each prediction
# ==============================================================================
cv = TimeSeriesFold(steps = 36, initial_train_size = len(data.loc[:end_validation]))
metric, predictions = backtesting_forecaster(
                        forecaster              = forecaster,
                        y                       = data['users'],
                        exog                    = data[exog_features],
                        cv                      = cv,
                        metric                  = 'mean_absolute_error',
                        return_predictors       = True,
                     )

  0%|          | 0/81 [00:00<?, ?it/s]

By setting return_predictors=True, a DataFrame is obtained with the predicted value ('pred'), the partition it belongs to ('fold'), and the value of the lags and exogenous variables used to make each prediction.

predictions.head(3)

	pred	lag_1	lag_2	lag_3	lag_23	lag_24	lag_25	lag_167	lag_168	...	week_day_cos	hour_day_sin	hour_day_cos	temp_roll_mean_1_day	temp_roll_mean_7_day	temp_roll_max_1_day	temp_roll_min_1_day	temp_roll_max_7_day	temp_roll_min_7_day	temp
2012-09-01 00:00:00	165.620993	174.000000	277.000000	303.0	32.0	82.0	152.0	115.0	135.0	...	0.62349	0.258819	0.965926	31.330833	28.714643	36.9	26.24	36.9	24.6	30.34
2012-09-01 01:00:00	142.616863	165.620993	174.000000	277.0	20.0	32.0	82.0	79.0	115.0	...	0.62349	0.500000	0.866025	31.433332	28.724405	36.9	26.24	36.9	24.6	29.52
2012-09-01 02:00:00	130.551772	142.616863	165.620993	174.0	5.0	20.0	32.0	38.0	79.0	...	0.62349	0.707107	0.707107	31.501667	28.734167	36.9	26.24	36.9	24.6	28.70

3 rows × 27 columns

# Waterfall for a specific prediction
# ==============================================================================
predictions = predictions.astype(data[exog_features].dtypes) # Ensure that the types are the same
iloc_predicted_date = predictions.index.get_loc('2012-10-06 12:00:00')
shap_values_single = explainer(predictions.iloc[:, 2:])
shap.plots.waterfall(shap_values_single[iloc_predicted_date], show=False)
fig, ax = plt.gcf(), plt.gca()
fig.set_size_inches(8, 3.5)
ax_list = fig.axes
ax = ax_list[0]
ax.tick_params(labelsize=8)
ax.set
plt.show()

# Forceplot for a specific prediction
# ==============================================================================
shap.force_plot(
    base_value  = shap_values_single.base_values[iloc_predicted_date],
    shap_values = shap_values_single.values[iloc_predicted_date],
    features    = predictions.iloc[iloc_predicted_date, 2:],
)

Session information

import session_info
session_info.show(html=False)

-----
lightgbm            4.6.0
matplotlib          3.10.1
numpy               2.2.5
optuna              3.6.2
pandas              2.2.3
plotly              6.0.1
session_info        v1.0.1
shap                0.47.2
skforecast          0.16.0
sklearn             1.6.1
statsmodels         0.14.4
-----
IPython             9.1.0
jupyter_client      8.6.3
jupyter_core        5.7.2
notebook            6.5.7
-----
Python 3.12.9 | packaged by Anaconda, Inc. | (main, Feb  6 2025, 18:56:27) [GCC 11.2.0]
Linux-6.11.0-25-generic-x86_64-with-glibc2.39
-----
Session information updated at 2025-05-14 22:38

Bibliography

Hyndman, R.J., & Athanasopoulos, G. (2021) Forecasting: principles and practice, 3rd edition, OTexts: Melbourne, Australia.

Time Series Analysis and Forecasting with ADAM Ivan Svetunkov.

Joseph, M. (2022). Modern time series forecasting with Python: Explore industry-ready time series forecasting using modern machine learning and Deep Learning. Packt Publishing.

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source, thank you!

Forecasting time series with gradient LightGBM by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a Attribution-NonCommercial-ShareAlike 4.0 International at https://www.cienciadedatos.net/documentos/py58-forecasting-time-series-with-lightgbm.html

How to cite skforecast

If you use skforecast for a publication, we would appreciate it if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2025). skforecast (v0.16.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2025). skforecast (Version 0.16.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.16.0}, month = {05}, year = {2025}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }

Did you like the article? Your support is important

Your contribution will help me to continue generating free educational content. Many thanks! 😊

This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Attribution-NonCommercial-ShareAlike 4.0 International.

Allowed:

Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.

Under the following terms:

Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.

Forecasting time series with LightGBM

Joaquín Amat Rodrigo, Javier Escobar Ortiz

November 2024 (last update May 2025)

Introduction

Libraries

Data

Data exploration

Plot time series

Seasonality plots

Autocorrelation plots

Recursive multi-step forecasting with LightGBM

Forecaster

ForecasterRecursive

Backtesting

Exogenous Variables

Hyperparameter tuning

ForecasterRecursive

Feature selection

Model explanaibility

Model-specific feature importance

Shap values

Session information

Bibliography

Citation