Introduction
scikit-learn-intelex is a library developed by Intel that accelerates scikit-learn models. The acceleration comes from vector instructions, AI hardware-specific memory optimizations, threading, and optimizations for upcoming Intel platforms. The list of scikit-learn algorithms currently supported by scikit-learn-intelex can be found in the scikit-learn-intelex documentation.
Random forest models show one of the most notable improvements, and the gain is obtained on the CPU, without the need for GPU acceleration.
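scikit-learn-intelex can be used in two ways. The first is to call `patch_sklearn()` from `sklearnex` before importing any scikit-learn estimators, which swaps in the accelerated versions globally. The second, used in this notebook, is to import the accelerated classes directly from `sklearnex` submodules. The sketch below shows the explicit-import approach with a fallback to plain scikit-learn. It assumes you want code that still runs where scikit-learn-intelex is not installed or does not provide a given estimator. The helper `load_estimator` is hypothetical and is not part of either library.

```python
import importlib
import importlib.util


def load_estimator(name, submodule, prefer_intel=True):
    """
    Hypothetical helper: return the class `name` from `sklearnex.<submodule>`
    when scikit-learn-intelex is installed and provides it, otherwise fall
    back to `sklearn.<submodule>`.
    """
    packages = ["sklearnex", "sklearn"] if prefer_intel else ["sklearn"]
    for package in packages:
        # find_spec returns None when the top-level package is not installed
        if importlib.util.find_spec(package) is None:
            continue
        try:
            module = importlib.import_module(f"{package}.{submodule}")
        except ImportError:
            # The package exists but does not ship this submodule
            continue
        if hasattr(module, name):
            return getattr(module, name)
    raise ImportError(f"{name} not found in sklearnex or sklearn.{submodule}")


# Usage (requires scikit-learn; the Intel class is used only if available):
# RandomForest = load_estimator("RandomForestRegressor", "ensemble")
# RidgeModel = load_estimator("Ridge", "linear_model")
```

Importing explicitly, as the notebook does, keeps both implementations available in the same session. That is what makes the side-by-side timing comparison below possible.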
Libraries and data
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
from skforecast.datasets import fetch_dataset
# Plots
# ==============================================================================
from skforecast.plot import set_dark_theme
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
plt.style.use('seaborn-v0_8-darkgrid')
# Modelling and Forecasting
# ==============================================================================
import skforecast
import sklearn
import sklearnex
import time
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearnex.ensemble import RandomForestRegressor as RandomForestRegressorIntel
from sklearnex.linear_model import Ridge as RidgeIntel
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster
# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast : {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version sklearnex : {sklearnex.__version__}")
Version skforecast : 0.19.0
Version scikit-learn: 1.7.2
Version sklearnex : 2199.9.9
✏️ Note
The primary goal of this notebook is to demonstrate how to accelerate time series forecasting using scikit-learn-intelex. The focus is on comparing execution times between standard scikit-learn and its Intel-optimized counterpart, rather than achieving the best possible predictive accuracy. Users should perform their own hyperparameter tuning to obtain optimal results for their specific use cases.
# Downloading data
# ==============================================================================
data = fetch_dataset('bike_sharing_extended_features')
data.head(4)
bike_sharing_extended_features
Description:
Hourly usage of the bike share system in the city of Washington D.C. during the years 2011 and 2012. In addition to the number of users per hour, the dataset was enriched by introducing supplementary features. Addition includes calendar-based variables (day of the week, hour of the day, month, etc.), indicators for sunlight, incorporation of rolling temperature averages, and the creation of polynomial features generated from variable pairs. All cyclic variables are encoded using sine and cosine functions to ensure accurate representation.

Source:
Fanaee-T,Hadi. (2013). Bike Sharing Dataset. UCI Machine Learning Repository. https://doi.org/10.24432/C5W894.

URL:
https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/bike_sharing_extended_features.csv

Shape: 17352 rows x 90 columns
| date_time | users | weather | month_sin | month_cos | week_of_year_sin | week_of_year_cos | week_day_sin | week_day_cos | hour_day_sin | hour_day_cos | ... | temp_roll_mean_1_day | temp_roll_mean_7_day | temp_roll_max_1_day | temp_roll_min_1_day | temp_roll_max_7_day | temp_roll_min_7_day | holiday_previous_day | holiday_next_day | temp | holiday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2011-01-08 00:00:00 | 25.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.258819 | 0.965926 | ... | 8.063334 | 10.127976 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
| 2011-01-08 01:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.500000 | 0.866025 | ... | 8.029166 | 10.113334 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
| 2011-01-08 02:00:00 | 16.0 | mist | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.707107 | 0.707107 | ... | 7.995000 | 10.103572 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
| 2011-01-08 03:00:00 | 7.0 | rain | 0.5 | 0.866025 | 0.120537 | 0.992709 | -0.781832 | 0.62349 | 0.866025 | 0.500000 | ... | 7.960833 | 10.093809 | 9.02 | 6.56 | 18.86 | 4.92 | 0.0 | 0.0 | 7.38 | 0.0 |
4 rows × 90 columns
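The dataset description states that all cyclic variables are encoded with sine and cosine functions. The sketch below assumes the common formulation sin(2πx/P) and cos(2πx/P) for a value x with period P. `cyclic_encode` is an illustrative helper, not a skforecast function, and the exact convention used to build this dataset (for example, whether hours start at 0 or 1) is not stated. With x = 1 and P = 12, the formula gives 0.5 and 0.866025, which match the `month_sin` and `month_cos` values shown in the table for the January rows.

```python
import numpy as np


def cyclic_encode(values, period):
    """Map a cyclic variable onto the unit circle, so that the end of one
    cycle lies next to the start of the next (e.g. hour 23 next to hour 0)."""
    angle = 2 * np.pi * np.asarray(values, dtype=float) / period
    return np.sin(angle), np.cos(angle)


# Months 1, 4, 7 and 10 with a 12-month period
month_sin, month_cos = cyclic_encode([1, 4, 7, 10], period=12)
print(np.round(month_sin, 6))
print(np.round(month_cos, 6))
```

Two columns are needed because sine alone cannot tell apart two points on the cycle that share a sine value, for example months 1 and 5.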
# Split train-validation-test
# ==============================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validation : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
exog_features = [
    'month_sin',
    'month_cos',
    'week_of_year_sin',
    'week_of_year_cos',
    'week_day_sin',
    'week_day_cos',
    'hour_day_sin',
    'hour_day_cos',
    'sunrise_hour_sin',
    'sunrise_hour_cos',
    'sunset_hour_sin',
    'sunset_hour_cos',
    'holiday_previous_day',
    'holiday_next_day',
    'temp_roll_mean_1_day',
    'temp_roll_mean_7_day',
    'temp_roll_max_1_day',
    'temp_roll_min_1_day',
    'temp_roll_max_7_day',
    'temp_roll_min_7_day',
    'temp',
    'holiday'
]
Dates train : 2011-01-08 00:00:00 --- 2012-03-31 23:00:00 (n=10776)
Dates validation : 2012-04-01 00:00:00 --- 2012-08-31 23:00:00 (n=3672)
Dates test : 2012-09-01 00:00:00 --- 2012-12-30 23:00:00 (n=2904)
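The split boundaries end in `23:59:00`, a timestamp that does not exist in the hourly index. Label slicing with `.loc` on a sorted `DatetimeIndex` includes both endpoints. A boundary that falls between two observations therefore keeps consecutive partitions from sharing a row. The small check below uses synthetic timestamps, not the bike-sharing data, and contrasts this with a boundary equal to an existing timestamp.

```python
import pandas as pd

# Four hourly observations around the train/validation boundary
index = pd.to_datetime([
    "2012-03-31 22:00:00", "2012-03-31 23:00:00",
    "2012-04-01 00:00:00", "2012-04-01 01:00:00",
])
series = pd.Series([10, 11, 12, 13], index=index)

# Boundary between two observations: the two slices do not overlap
end_train = "2012-03-31 23:59:00"
train, rest = series.loc[:end_train], series.loc[end_train:]
print(len(train), len(rest))  # 2 2

# Boundary equal to an existing timestamp: that row lands in both slices
boundary = "2012-03-31 23:00:00"
print(len(series.loc[:boundary]), len(series.loc[boundary:]))  # 2 3
```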
RandomForest
scikit-learn
# Create forecaster
# ==============================================================================
params = {
    'n_estimators': 100,
    'max_depth': 10,
    'min_samples_split': 10,
    'min_samples_leaf': 1,
    'random_state': 15926
}
forecaster = ForecasterRecursive(
    estimator = RandomForestRegressor(**params),
    lags = 24,
)
# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")
# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")
# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
    steps = 36,
    initial_train_size = len(data.loc[:end_validation]),
    refit = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y = data['users'],
    exog = data[exog_features],
    cv = cv,
    metric = ['mean_absolute_error', 'mean_squared_error']
)
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 14.7957 seconds
Elapsed time predict: 0.2807 seconds
Elapsed time backtesting: 21.3066 seconds
| | mean_absolute_error | mean_squared_error |
|---|---|---|
| 0 | 63.933384 | 10527.883828 |
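`ForecasterRecursive` produces multi-step forecasts recursively: each predicted value is fed back as a lag to predict the next step. The sketch below is a simplified version of that loop, not skforecast's implementation. The function `recursive_forecast` is hypothetical, and two details are assumptions made for illustration: lags are ordered with the most recent value first, and exogenous features are placed after the lags. In this sketch the estimator's `predict` is called once per step, on a single row, so prediction cost grows with the number of steps (36 per fold in the backtesting above).

```python
import numpy as np


def recursive_forecast(model, last_window, steps, exog=None):
    """
    Forecast `steps` values ahead with a one-step-ahead regressor.

    model       : object with a scikit-learn-style `predict(X)` method, trained
                  on rows [lag_1, ..., lag_n, exog_1, ...] (assumed order).
    last_window : the n most recent observations, oldest first.
    exog        : optional array-like of shape (steps, n_exog).
    """
    window = list(last_window)
    n_lags = len(window)
    predictions = np.empty(steps)
    for step in range(steps):
        # Lags ordered from most recent (lag_1) to oldest (lag_n)
        lags = np.asarray(window[-n_lags:][::-1], dtype=float)
        if exog is not None:
            row = np.concatenate([lags, np.asarray(exog[step], dtype=float)])
        else:
            row = lags
        # One call to the estimator per step, on a single row
        predictions[step] = model.predict(row.reshape(1, -1))[0]
        # Feed the prediction back so it becomes lag_1 for the next step
        window.append(predictions[step])
    return predictions


class LastValuePlusOne:
    """Toy model: predicts lag_1 + 1 and counts how often it is called."""

    def __init__(self):
        self.n_calls = 0

    def predict(self, X):
        self.n_calls += 1
        return X[:, 0] + 1


model = LastValuePlusOne()
print(recursive_forecast(model, last_window=[1, 2, 3], steps=3))  # [4. 5. 6.]
print(model.n_calls)  # 3
```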
scikit-learn-intelex
# Create forecaster
# ==============================================================================
params = {
    'n_estimators': 100,
    'max_depth': 10,
    'min_samples_split': 10,
    'min_samples_leaf': 1,
    'random_state': 15926
}
forecaster = ForecasterRecursive(
    estimator = RandomForestRegressorIntel(**params),
    lags = 24,
)
# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")
# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")
# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
    steps = 36,
    initial_train_size = len(data.loc[:end_validation]),
    refit = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y = data['users'],
    exog = data[exog_features],
    cv = cv,
    metric = ['mean_absolute_error', 'mean_squared_error']
)
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 0.2599 seconds
Elapsed time predict: 0.0476 seconds
Elapsed time backtesting: 1.8438 seconds
| | mean_absolute_error | mean_squared_error |
|---|---|---|
| 0 | 64.353349 | 10713.786049 |
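The speed-up can be quantified from the elapsed times printed for the two random forest runs. The numbers below are copied from those outputs. They come from a single run on the machine described in the session information and will differ on other hardware.

```python
# Elapsed times (seconds) reported in the outputs of the two random forest runs
sklearn_times = {"fit": 14.7957, "predict": 0.2807, "backtesting": 21.3066}
intelex_times = {"fit": 0.2599, "predict": 0.0476, "backtesting": 1.8438}

# Speed-up = baseline time / accelerated time (> 1 means sklearnex is faster)
speedup = {step: sklearn_times[step] / intelex_times[step] for step in sklearn_times}
for step, ratio in speedup.items():
    print(f"{step:<12} x{ratio:.1f}")

# Accuracy cost: relative change in MAE between the two implementations
mae_sklearn, mae_intelex = 63.933384, 64.353349
print(f"MAE change: {100 * (mae_intelex - mae_sklearn) / mae_sklearn:+.2f} %")
```

On these figures, fitting is roughly 57 times faster, prediction about 6 times faster and backtesting about 11.6 times faster. The mean absolute error is about 0.7% higher with the Intel-optimized model.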
Ridge Regression
scikit-learn
# Create forecaster
# ==============================================================================
params = {
    'alpha': 1.0,
    'solver': 'auto',
    'random_state': 15926
}
forecaster = ForecasterRecursive(
    estimator = Ridge(**params),
    lags = 24,
)
# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")
# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")
# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
    steps = 36,
    initial_train_size = len(data.loc[:end_validation]),
    refit = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y = data['users'],
    exog = data[exog_features],
    cv = cv,
    metric = ['mean_absolute_error', 'mean_squared_error']
)
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 0.0472 seconds
Elapsed time predict: 0.0068 seconds
Elapsed time backtesting: 0.2048 seconds
| | mean_absolute_error | mean_squared_error |
|---|---|---|
| 0 | 91.835777 | 18555.714571 |
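The Ridge timings are in the range of hundredths of a second. At that scale, a single `time.perf_counter()` measurement is easily affected by background activity and one-off costs. A more robust pattern, and the one the documentation of Python's `timeit` module recommends, is to repeat the measurement and keep the minimum. The sketch below assumes the callable can safely be run several times (refitting a forecaster is fine). The helper name `best_of` is illustrative.

```python
import time


def best_of(func, *args, repeat=5, **kwargs):
    """Run func(*args, **kwargs) `repeat` times and return the last result
    together with the minimum elapsed wall-clock time in seconds."""
    if repeat < 1:
        raise ValueError("repeat must be >= 1")
    timings = []
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return result, min(timings)


# Usage with the notebook's objects (not executed here):
# _, elapsed = best_of(
#     forecaster.fit,
#     y=data.loc[:end_validation, 'users'],
#     exog=data.loc[:end_validation, exog_features],
# )
# print(f"Best elapsed time fit: {elapsed:.4f} seconds")
```

The minimum is a lower bound on how fast the code can run on the machine. Slower repetitions usually reflect interference from other processes rather than the code itself.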
scikit-learn-intelex
# Create forecaster
# ==============================================================================
params = {
    'alpha': 1.0,
    'solver': 'auto',
    'random_state': 15926
}
forecaster = ForecasterRecursive(
    estimator = RidgeIntel(**params),
    lags = 24,
)
# Train forecaster
# ==============================================================================
start = time.perf_counter()
forecaster.fit(y=data.loc[:end_validation, 'users'], exog=data.loc[:end_validation, exog_features])
end = time.perf_counter()
print(f"Elapsed time fit: {end - start:.4f} seconds")
# Prediction
# ==============================================================================
start = time.perf_counter()
predictions = forecaster.predict(steps=100, exog=data.loc[end_validation:, exog_features])
end = time.perf_counter()
print(f"Elapsed time predict: {end - start:.4f} seconds")
# Backtest model on test data
# ==============================================================================
cv = TimeSeriesFold(
    steps = 36,
    initial_train_size = len(data.loc[:end_validation]),
    refit = False,
)
start = time.perf_counter()
metric, predictions = backtesting_forecaster(
    forecaster = forecaster,
    y = data['users'],
    exog = data[exog_features],
    cv = cv,
    metric = ['mean_absolute_error', 'mean_squared_error']
)
end = time.perf_counter()
print(f"Elapsed time backtesting: {end - start:.4f} seconds")
display(metric)
Elapsed time fit: 0.0211 seconds
Elapsed time predict: 0.0233 seconds
Elapsed time backtesting: 0.6921 seconds
| | mean_absolute_error | mean_squared_error |
|---|---|---|
| 0 | 91.835777 | 18555.714571 |
Session information
import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
numpy               2.3.4
pandas              2.3.3
plotly              6.4.0
session_info        v1.0.1
skforecast          0.19.0
sklearn             1.7.2
sklearnex           2199.9.9
-----
IPython             9.7.0
jupyter_client      8.6.3
jupyter_core        5.9.1
-----
Python 3.13.9 | packaged by conda-forge | (main, Oct 22 2025, 23:12:41) [MSC v.1944 64 bit (AMD64)]
Windows-11-10.0.26100-SP0
-----
Session information updated at 2025-11-28 23:24
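If `session_info` is not installed, a similar record can be produced with the standard library. The sketch below uses `importlib.metadata.version`, which looks packages up by their distribution name. That means `scikit-learn` rather than `sklearn`, and `scikit-learn-intelex` rather than `sklearnex`. Packages that are not installed are reported as `None`. The helper name `package_versions` is illustrative.

```python
import platform
from importlib.metadata import PackageNotFoundError, version


def package_versions(distributions):
    """Return {distribution name: installed version or None}."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = None
    return versions


info = package_versions([
    "skforecast", "scikit-learn", "scikit-learn-intelex",
    "numpy", "pandas", "matplotlib", "plotly",
])
for name, ver in info.items():
    print(f"{name:<22}{ver}")
print(f"Python {platform.python_version()} on {platform.platform()}")
```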
Citation
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
Accelerate forecasting with scikit-learn-intelex by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 license at https://www.cienciadedatos.net/documentos/py75-accelerate-forecasting-with-scikit-learn-intelex.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2025). skforecast (v0.19.0). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2025). skforecast (Version 0.19.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast,
  author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title = {skforecast},
  version = {0.19.0},
  month = {11},
  year = {2025},
  license = {BSD-3-Clause},
  url = {https://skforecast.org/},
  doi = {10.5281/zenodo.8382788}
}
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International license.
Allowed:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.
Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
