Introduction

scikit-learn-intelex is a library developed by Intel that optimizes the performance of scikit-learn models. The acceleration is achieved with vector instructions, AI hardware-specific memory optimizations, threading, and optimizations tailored to Intel architectures. The list of scikit-learn algorithms currently supported by scikit-learn-intelex can be found in its official documentation.

One of the most notable improvements is observed when using random forest models, without the need for GPU acceleration.
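
The optimized implementations can be used in two ways: by patching scikit-learn globally with patch_sklearn(), or by importing the accelerated estimators directly from the sklearnex namespace, which is the approach followed later in this document. A minimal sketch of the global patching alternative is shown below; note that the patch must be applied before the scikit-learn estimators are imported.

# Global patching with scikit-learn-intelex (alternative to direct sklearnex imports)
# ==============================================================================
from sklearnex import patch_sklearn
patch_sklearn()

# After patching, the usual scikit-learn imports resolve to the Intel-optimized
# implementations whenever an optimized version of the estimator exists.
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge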

Libraries and data

# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset

# Plots
# ==============================================================================
from skforecast.plot import set_dark_theme
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')

# Modelling and Forecasting
# ==============================================================================
import skforecast
import sklearn
import sklearnex
import time
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearnex.ensemble import RandomForestRegressor as RandomForestRegressorIntel
from sklearnex.linear_model import Ridge as RidgeIntel
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster

# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast  : {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version sklearnex   : {sklearnex.__version__}")
Version skforecast  : 0.19.0
Version scikit-learn: 1.7.2
Version sklearnex   : 2199.9.9

✏️ Note

The primary goal of this notebook is to demonstrate how to accelerate time series forecasting using scikit-learn-intelex. The focus is on comparing execution times between standard scikit-learn and its Intel-optimized counterpart, rather than achieving the best possible predictive accuracy. Users should perform their own hyperparameter tuning to obtain optimal results for their specific use cases.

# Downloading data
# ==============================================================================
data = fetch_dataset('bike_sharing_extended_features')
data.head(4)
╭──────────────────────── bike_sharing_extended_features ─────────────────────────╮
│ Description:                                                                    │
│ Hourly usage of the bike share system in the city of Washington D.C. during the │
│ years 2011 and 2012. In addition to the number of users per hour, the dataset   │
│ was enriched by introducing supplementary features. Addition includes calendar- │
│ based variables (day of the week, hour of the day, month, etc.), indicators for │
│ sunlight, incorporation of rolling temperature averages, and the creation of    │
│ polynomial features generated from variable pairs. All cyclic variables are     │
│ encoded using sine and cosine functions to ensure accurate representation.      │
│                                                                                 │
│ Source:                                                                         │
│ Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository.   │
│ https://doi.org/10.24432/C5W894.                                                │
│                                                                                 │
│ URL:                                                                            │
│ https://raw.githubusercontent.com/skforecast/skforecast-                        │
│ datasets/main/data/bike_sharing_extended_features.csv                           │
│                                                                                 │
│ Shape: 17352 rows x 90 columns                                                  │
╰─────────────────────────────────────────────────────────────────────────────────╯
users weather month_sin month_cos week_of_year_sin week_of_year_cos week_day_sin week_day_cos hour_day_sin hour_day_cos ... temp_roll_mean_1_day temp_roll_mean_7_day temp_roll_max_1_day temp_roll_min_1_day temp_roll_max_7_day temp_roll_min_7_day holiday_previous_day holiday_next_day temp holiday
date_time
2011-01-08 00:00:00 25.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.258819 0.965926 ... 8.063334 10.127976 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 01:00:00 16.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.500000 0.866025 ... 8.029166 10.113334 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 02:00:00 16.0 mist 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.707107 0.707107 ... 7.995000 10.103572 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0
2011-01-08 03:00:00 7.0 rain 0.5 0.866025 0.120537 0.992709 -0.781832 0.62349 0.866025 0.500000 ... 7.960833 10.093809 9.02 6.56 18.86 4.92 0.0 0.0 7.38 0.0

4 rows × 90 columns

# Split train-validation-test
# ==============================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validacion : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")


exog_features = [
    'month_sin', 
    'month_cos',
    'week_of_year_sin',
    'week_of_year_cos',
    'week_day_sin',
    'week_day_cos',
    'hour_day_sin',
    'hour_day_cos',
    'sunrise_hour_sin',
    'sunrise_hour_cos',
    'sunset_hour_sin',
    'sunset_hour_cos',
    'holiday_previous_day',
    'holiday_next_day',
    'temp_roll_mean_1_day',
    'temp_roll_mean_7_day',
    'temp_roll_max_1_day',
    'temp_roll_min_1_day',
    'temp_roll_max_7_day',
    'temp_roll_min_7_day',
    'temp',
    'holiday'
]
Dates train      : 2011-01-08 00:00:00 --- 2012-03-31 23:00:00  (n=10776)
Dates validation : 2012-04-01 00:00:00 --- 2012-08-31 23:00:00  (n=3672)
Dates test       : 2012-09-01 00:00:00 --- 2012-12-30 23:00:00  (n=2904)
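
The plotting libraries imported at the beginning are not strictly needed for the benchmark; as an illustration, the three partitions can be visualised with a minimal matplotlib sketch such as the one below.

# Plot train, validation and test partitions (illustrative)
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(8, 3))
data_train['users'].plot(ax=ax, label='train')
data_val['users'].plot(ax=ax, label='validation')
data_test['users'].plot(ax=ax, label='test')
ax.set_title('Hourly users of the bike share system')
ax.legend()
plt.show()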

Random Forest

scikit-learn

# Create forecaster
# ==============================================================================
params = {
    'n_estimators': 100,
    'max_depth': 10,
    'min_samples_split': 10,
    'min_samples_leaf': 1,
    'random_state': 15926
}
forecaster = ForecasterRecursive(
                estimator = RandomForestRegressor(**params),
                lags      = 24,
             )

# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")


# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")

# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
        steps              = 36,
        initial_train_size = len(data[:end_validation]),
        refit              = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            exog       = data[exog_features],
                            cv         = cv,
                            metric     = ['mean_absolute_error', 'mean_squared_error']
                       )
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 14.7957 seconds
Elapsed time predict: 0.2807 seconds
Elapsed time backtesting: 21.3066 seconds
mean_absolute_error mean_squared_error
0 63.933384 10527.883828
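
The backtesting predictions can also be compared visually against the observed values in the test period. The sketch below uses matplotlib and assumes that the predictions DataFrame returned by backtesting_forecaster stores the forecasts in a column named 'pred', as in recent skforecast releases.

# Plot backtesting predictions vs observed values (illustrative)
# ==============================================================================
# Assumption: the forecasts are stored in the 'pred' column of the predictions DataFrame.
fig, ax = plt.subplots(figsize=(8, 3))
data.loc[end_validation:, 'users'].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax, label='predictions')
ax.set_title('Backtesting predictions vs observed values (random forest, scikit-learn)')
ax.legend()
plt.show()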

scikit-learn-intelex

# Create forecaster
# ==============================================================================
params = {
    'n_estimators': 100,
    'max_depth': 10,
    'min_samples_split': 10,
    'min_samples_leaf': 1,
    'random_state': 15926
}
forecaster = ForecasterRecursive(
                estimator = RandomForestRegressorIntel(**params),
                lags      = 24,
             )

# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")


# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")


# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
        steps              = 36,
        initial_train_size = len(data[:end_validation]),
        refit              = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            exog       = data[exog_features],
                            cv         = cv,
                            metric     = ['mean_absolute_error', 'mean_squared_error']
                       )
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 0.2599 seconds
Elapsed time predict: 0.0476 seconds
Elapsed time backtesting: 1.8438 seconds
mean_absolute_error mean_squared_error
0 64.353349 10713.786049
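
Both implementations reach very similar error metrics, while the Intel-optimized version trains and backtests more than an order of magnitude faster. As stated in the note at the beginning, none of the models is tuned. For readers who want to tune them, the sketch below outlines a grid search with skforecast; it is not executed here, the parameter grid is purely illustrative, and it assumes that grid_search_forecaster from skforecast.model_selection accepts a TimeSeriesFold object through its cv argument, as in recent skforecast releases.

# Hyperparameter tuning sketch (not executed in this benchmark)
# ==============================================================================
# Assumptions: grid_search_forecaster accepts a TimeSeriesFold through cv and a
# dictionary of candidate values through param_grid; grid values are illustrative.
from skforecast.model_selection import grid_search_forecaster

forecaster = ForecasterRecursive(
                estimator = RandomForestRegressor(random_state=15926),
                lags      = 24,
             )
cv = TimeSeriesFold(
        steps              = 36,
        initial_train_size = len(data.loc[:end_train]),
        refit              = False,
)
param_grid = {'n_estimators': [100, 200], 'max_depth': [5, 10]}

results_grid = grid_search_forecaster(
                   forecaster  = forecaster,
                   y           = data.loc[:end_validation, 'users'],
                   exog        = data.loc[:end_validation, exog_features],
                   cv          = cv,
                   param_grid  = param_grid,
                   metric      = 'mean_absolute_error',
                   return_best = True
               )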

Ridge Regression

scikit-learn

# Create forecaster
# ==============================================================================
params = {
    'alpha': 1.0,
    'solver': 'auto',
    'random_state': 15926
}
forecaster = ForecasterRecursive(
                estimator = Ridge(**params),
                lags      = 24,
             )

# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")


# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")


# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
        steps              = 36,
        initial_train_size = len(data[:end_validation]),
        refit              = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            exog       = data[exog_features],
                            cv         = cv,
                            metric     = ['mean_absolute_error', 'mean_squared_error']
                       )
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 0.0472 seconds
Elapsed time predict: 0.0068 seconds
Elapsed time backtesting: 0.2048 seconds
mean_absolute_error mean_squared_error
0 91.835777 18555.714571

scikit-learn-intelex

# Create forecaster
# ==============================================================================
params = {
    'alpha': 1.0,
    'solver': 'auto',
    'random_state': 15926
}
forecaster = ForecasterRecursive(
                estimator = RidgeIntel(**params),
                lags      = 24,
             )

# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")


# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")


# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
        steps              = 36,
        initial_train_size = len(data[:end_validation]),
        refit              = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
                            forecaster = forecaster,
                            y          = data['users'],
                            exog       = data[exog_features],
                            cv         = cv,
                            metric     = ['mean_absolute_error', 'mean_squared_error']
                       )
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 0.0211 seconds
Elapsed time predict: 0.0233 seconds
Elapsed time backtesting: 0.6921 seconds
mean_absolute_error mean_squared_error
0 91.835777 18555.714571
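
Using the elapsed times reported above, which correspond to a single run on this particular machine, the relative speed-up of the Intel-optimized estimators can be summarised with a short snippet. For the random forest the gain is large, while for Ridge, which is already very fast, the optimized version fits slightly faster but is slower in prediction and backtesting in this run.

# Speed-up summary built from the elapsed times reported above (seconds, single run)
# ==============================================================================
times = pd.DataFrame({
    'model'    : ['RandomForest'] * 3 + ['Ridge'] * 3,
    'task'     : ['fit', 'predict', 'backtesting'] * 2,
    'sklearn'  : [14.7957, 0.2807, 21.3066, 0.0472, 0.0068, 0.2048],
    'sklearnex': [0.2599, 0.0476, 1.8438, 0.0211, 0.0233, 0.6921],
})
times['speedup'] = (times['sklearn'] / times['sklearnex']).round(1)
print(times)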

Session information

import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.3.4
pandas              2.3.3
plotly              6.4.0
session_info        v1.0.1
skforecast          0.19.0
sklearn             1.7.2
sklearnex           2199.9.9
-----
IPython             9.7.0
jupyter_client      8.6.3
jupyter_core        5.9.1
-----
Python 3.13.9 | packaged by conda-forge | (main, Oct 22 2025, 23:12:41) [MSC v.1944 64 bit (AMD64)]
Windows-11-10.0.26100-SP0
-----
Session information updated at 2025-11-28 23:24

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source. Thank you!

Accelerate forecasting with scikit-learn-intelex by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 at https://www.cienciadedatos.net/documentos/py75-accelerate-forecasting-with-scikit-learn-intelex.html

How to cite skforecast

If you use skforecast for a publication, we would appreciate it if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2025). skforecast (v0.19.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2025). skforecast (Version 0.19.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.19.0},
  month   = {11},
  year    = {2025},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}


Did you like the article? Your support is important

Your contribution will help me to continue generating free educational content. Many thanks! 😊

Become a GitHub Sponsor

Creative Commons Licence

This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.

Allowed:

  • Share: copy and redistribute the material in any medium or format.

  • Adapt: remix, transform, and build upon the material.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • NonCommercial: You may not use the material for commercial purposes.

  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.