If you like skforecast, help us by giving it a star on GitHub! ⭐️
Intermittent demand forecasting is a statistical method used to predict the demand for products that have sporadic or irregular sales patterns. These types of products are characterized by periods of high demand followed by periods of little or no demand. Intermittent demand forecasting is used in the manufacturing, retail, and healthcare industries to manage inventory levels, optimize production schedules, and reduce out-of-stocks and excess inventory costs.
Regular intermittent demand refers to predictable demand patterns with known intervals between demand periods. This type of intermittent demand occurs with a degree of regularity, such as a product that has seasonal demand or a product that is ordered every month on a certain date. Forecasting of regular intermittent demand can be done using statistical methods, such as Croston's method, which is a common technique used to forecast demand in these situations. However, these methods often do not take into account important exogenous variables that can provide valuable information about demand intervals. As a result, machine learning-based forecasting can be an appealing alternative as it can incorporate these variables and improve accuracy.
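Croston's method itself is simple to sketch: the non-zero demand sizes and the intervals between consecutive demands are smoothed separately with exponential smoothing, and the one-step forecast is the ratio of the two smoothed values. The following is a minimal illustration of the classic variant, assuming only NumPy; the function name and the smoothing parameter `alpha` are our own choices, not part of any library.

```python
import numpy as np

def croston_forecast(demand, alpha=0.1):
    """
    One-step-ahead Croston forecast: smoothed non-zero demand size
    divided by the smoothed interval between non-zero demands.
    """
    demand = np.asarray(demand, dtype=float)
    nonzero = np.flatnonzero(demand)
    if nonzero.size == 0:
        return 0.0
    # Initialize with the first non-zero demand and its position
    z = demand[nonzero[0]]   # smoothed demand size
    p = nonzero[0] + 1       # smoothed inter-demand interval
    periods_since = 0
    for value in demand[nonzero[0] + 1:]:
        periods_since += 1
        if value > 0:
            z = alpha * value + (1 - alpha) * z
            p = alpha * periods_since + (1 - alpha) * p
            periods_since = 0
    return z / p

demand = [0, 3, 0, 0, 4, 0, 2, 0, 0, 0, 5]
print(round(croston_forecast(demand, alpha=0.2), 3))  # -> 1.332
```

Note that the forecast is a constant demand *rate* per period, which is precisely why Croston-style methods cannot exploit exogenous information about when demand will occur.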
On the other hand, irregular intermittent demand refers to unpredictable demand patterns, with no known intervals between demand periods. This type of intermittent demand is random and unpredictable, such as a product that is ordered only a few times a year and in varying quantities. Forecasting irregular intermittent demand is challenging because there is no predictable pattern to the demand. Traditional forecasting methods may not work effectively in these situations and alternative forecasting methods, such as bootstrapping or simulation, may be required.
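As an illustration of the bootstrapping idea, one can resample the historically observed demand sizes and the gaps between demands to simulate many possible future horizons, then summarise the simulated totals with quantiles. This is only a sketch of the approach under those assumptions, not a skforecast feature.

```python
import numpy as np

rng = np.random.default_rng(123)
history = np.array([0, 0, 7, 0, 0, 0, 2, 0, 5, 0, 0, 3, 0, 0, 0, 0, 9, 0])
sizes = history[history > 0]                 # observed non-zero demand sizes
gaps = np.diff(np.flatnonzero(history > 0))  # periods between consecutive demands

horizon, n_sims = 12, 1000
totals = np.empty(n_sims)
for i in range(n_sims):
    t, total = 0, 0.0
    while True:
        t += rng.choice(gaps)                # wait a resampled inter-demand gap
        if t >= horizon:
            break
        total += rng.choice(sizes)           # add a resampled demand size
    totals[i] = total

# Median and 90th percentile of total demand over the horizon
print(np.quantile(totals, [0.5, 0.9]))
```

Rather than a single point forecast, this yields a distribution of plausible demand totals, which is usually more actionable for safety-stock decisions under irregular demand.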
In summary, regular intermittent demand has a predictable pattern whereas irregular intermittent demand does not. Forecasting regular intermittent demand is easier than forecasting irregular intermittent demand due to the predictability of the demand pattern.
This document demonstrates how the Python library skforecast can be used to forecast regular intermittent demand scenarios. By using this library, the machine learning model can focus on learning to predict demand during periods of interest, while avoiding the influence of periods when there is no demand.
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
# Modeling
# ==============================================================================
import skforecast
import sklearn
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from skforecast.recursive import ForecasterRecursive
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster
# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
The data used in this example represents the number of users who visited a store during its opening hours from Monday to Friday, between 7:00 and 20:00. Therefore, any predictions outside this period are not useful and can either be ignored or set to 0.
# Downloading data
# ======================================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine'
'-learning-python/master/data/intermittent_demand.csv')
data = pd.read_csv(url, sep=',')
data['date_time'] = pd.to_datetime(data['date_time'], format='%Y-%m-%d %H:%M:%S')
data = data.set_index('date_time')
data = data.asfreq('h')
data = data.sort_index()
data.head(3)
# Split train-val-test
# ======================================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
print(f"Dates train : {data_train.index.min()} --- {data_train.index.max()} (n={len(data_train)})")
print(f"Dates validation : {data_val.index.min()} --- {data_val.index.max()} (n={len(data_val)})")
print(f"Dates test : {data_test.index.min()} --- {data_test.index.max()} (n={len(data_test)})")
# Plot time series
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_train.index, y=data_train['users'], name="train", mode="lines")
trace2 = go.Scatter(x=data_val.index, y=data_val['users'], name="validation", mode="lines")
trace3 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.add_trace(trace3)
fig.update_layout(
title="Time series of users",
xaxis_title="Date time",
yaxis_title="Users",
width = 750,
height = 350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1,
xanchor="left",
x=0.001
)
)
fig.show()
# Boxplot for weekly seasonality
# ==============================================================================
data['week_day'] = data.index.day_of_week + 1
fig = px.box(
data,
x="week_day",
y="users",
title = 'Distribution of users per day of week',
width=600,
height=300
)
median_values = data.groupby('week_day')['users'].median()
fig.add_trace(
go.Scatter(
x=median_values.index,
y=median_values.values,
mode='lines+markers',
line=dict(color='blue', dash='dash'),
showlegend=False
)
)
fig.update_layout(margin=dict(l=20, r=20, t=35, b=20))
fig.show()
# Boxplot for daily seasonality
# ==============================================================================
data['hour_day'] = data.index.hour + 1
fig = px.box(
data,
x="hour_day",
y="users",
title = 'Distribution of users per hour of day',
width=600,
height=300
)
median_values = data.groupby('hour_day')['users'].median()
fig.add_trace(
go.Scatter(
x=median_values.index,
y=median_values.values,
mode='lines+markers',
line=dict(color='blue', dash='dash'),
showlegend=False
)
)
fig.update_layout(margin=dict(l=20, r=20, t=35, b=20))
fig.show()
To accurately evaluate the performance of the model, it is crucial to define a metric that closely reflects the business scenario in which the model will be used. Specifically, in this case, the model's performance should be optimized during weekdays from 9:00 to 20:00.
Fortunately, skforecast provides the flexibility to use custom functions as metrics for backtesting and model tuning.
def custom_metric(y_true, y_pred):
    """
    Calculate the mean absolute error using only the predicted values for
    weekdays from 9:00 to 20:00.
    """
    day_of_week = y_true.index.day_of_week
    hour_of_day = y_true.index.hour
    mask = day_of_week.isin([0, 1, 2, 3, 4]) & (hour_of_day >= 9) & (hour_of_day <= 20)
    metric = mean_absolute_error(y_true[mask], y_pred[mask])
    return metric
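A quick sanity check helps confirm that a metric of this kind really ignores errors made outside the period of interest. The toy example below re-states the idea in a self-contained function (mirroring the behaviour described in the docstring above) so it can be run on its own; the function name and the toy values are ours.

```python
import pandas as pd
from sklearn.metrics import mean_absolute_error

def business_hours_mae(y_true, y_pred):
    # Keep only weekday observations between 9:00 and 20:00
    mask = (
        y_true.index.day_of_week.isin([0, 1, 2, 3, 4])
        & (y_true.index.hour >= 9)
        & (y_true.index.hour <= 20)
    )
    return mean_absolute_error(y_true[mask], y_pred[mask])

idx = pd.date_range('2012-01-02 08:00', periods=4, freq='h')  # a Monday
y_true = pd.Series([5.0, 10.0, 10.0, 10.0], index=idx)
y_pred = pd.Series([0.0, 10.0, 12.0, 10.0], index=idx)  # large miss at 08:00
print(business_hours_mae(y_true, y_pred))  # the 08:00 miss is ignored -> 2/3
```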
# Create forecaster
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = LGBMRegressor(
learning_rate = 0.1,
max_depth = 5,
n_estimators = 500,
random_state = 123,
verbose = -1
),
lags = 24
)
forecaster
# Backtesting test period
# ==============================================================================
cv = TimeSeriesFold(
steps = 36,
initial_train_size = len(data.loc[:end_validation]),
refit = False
)
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
cv = cv,
metric = custom_metric,
verbose = False,
show_progress = True
)
metric
# Set to zero predictions for closed hours
# ==============================================================================
hour = data_test.index.hour
day_of_week = data_test.index.day_of_week
closed_hours = (hour < 7) | (hour > 20)
closed_days = day_of_week.isin([5, 6])
is_closed = (closed_hours) | (closed_days)
predictions[is_closed] = 0
# Plot predictions vs real value
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['pred'], name="prediction", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
title="Real value vs predicted in test data",
xaxis_title="Date time",
yaxis_title="Users",
width = 750,
height = 350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1.1,
xanchor="left",
x=0.001
)
)
fig.show()
Analysing the predictions shows that the model struggles to accurately capture the pattern of store opening and closing times. In addition, the influence of the lags leads to an underestimation of the first few hours of the day and of the days following closures.
# Distribution of residuals by day of week and hour of day
# ==============================================================================
residuals = (predictions['pred'] - data_test['users']).to_frame('residuals')
residuals['week_day'] = residuals.index.day_of_week + 1
residuals['hour_day'] = residuals.index.hour + 1
fig = px.box(
residuals,
x="week_day",
y="residuals",
title = 'Distribution of residuals per day of week',
width=600,
height=300
)
fig.update_layout(margin=dict(l=20, r=20, t=35, b=20))
fig.show()
fig = px.box(
residuals,
x="hour_day",
y="residuals",
title = 'Distribution of residuals per hour of day',
width=600,
height=300
)
fig.update_layout(margin=dict(l=20, r=20, t=35, b=20))
fig.show()
Exogenous variables can be used in a forecasting model to provide additional information and improve its ability to detect patterns. This approach has the advantage of incorporating external factors that influence demand, leading to a more reliable and accurate model. The exog argument makes it straightforward to include exogenous variables in the forecaster.
# Create exogenous variable
# ==============================================================================
hour = data.index.hour
day_of_week = data.index.day_of_week
closed_hours = (hour < 7) | (hour > 20)
closed_days = day_of_week.isin([5, 6])
is_closed = (closed_hours) | (closed_days)
data['is_closed'] = is_closed.astype(int)
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val = data.loc[end_train:end_validation, :]
data_test = data.loc[end_validation:, :]
# Backtesting test period
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
exog = data['is_closed'],
cv = cv,
metric = custom_metric,
verbose = False,
show_progress = True
)
metric
# Set to zero predictions for closed hours
# ==============================================================================
hour = data_test.index.hour
day_of_week = data_test.index.day_of_week
closed_hours = (hour < 7) | (hour > 20)
closed_days = day_of_week.isin([5, 6])
is_closed = (closed_hours) | (closed_days)
predictions[is_closed] = 0
# Plot predictions vs real value
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['pred'], name="prediction", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
title="Real value vs predicted in test data",
xaxis_title="Date time",
yaxis_title="Users",
width = 750,
height = 350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1.1,
xanchor="left",
x=0.001
)
)
fig.show()
Incorporating an exogenous variable into the model led to a significant improvement in forecasting accuracy, with the error being halved. This highlights the importance of using external factors to enhance the performance of the forecasting model.
Hyperparameter tuning is a critical aspect of developing accurate and effective machine learning models. Hyperparameters are values that cannot be learned from data and must be set by the user before the model is trained. These hyperparameters can have a significant impact on the performance of the model, and careful tuning can improve its accuracy and generalisation to new data. In the case of forecasting models, the lags included in the model can be considered as an additional hyperparameter.
Hyperparameter tuning involves systematically testing different values or combinations of hyperparameters (including lags) to find the optimal configuration that gives the best results. The skforecast library offers several hyperparameter tuning strategies, including grid search, random search, and Bayesian search, to identify the optimal combination of lags and hyperparameters that achieve the best prediction performance.
# Grid search hyperparameters and lags
# ==============================================================================
forecaster = ForecasterRecursive(
regressor = LGBMRegressor(
learning_rate = 0.1,
max_depth = 5,
n_estimators = 500,
random_state = 123,
verbose = -1
),
lags = 24
)
# Lags
lags_grid = [24, 48, 72]
# Hyperparameters
param_grid = {
"n_estimators": [50, 100, 500],
"max_depth": [5, 10, 15],
"learning_rate": [0.01, 0.1],
}
# Folds
cv_search = TimeSeriesFold(
steps = 36,
initial_train_size = len(data.loc[:end_train]),
refit = False
)
results_grid = grid_search_forecaster(
forecaster = forecaster,
y = data.loc[:end_validation, 'users'],
exog = data.loc[:end_validation, 'is_closed'],
cv = cv_search,
param_grid = param_grid,
lags_grid = lags_grid,
metric = custom_metric,
return_best = True,
verbose = False
)
# Backtesting test period
# ==============================================================================
metric, predictions = backtesting_forecaster(
forecaster = forecaster,
y = data['users'],
exog = data['is_closed'],
cv = cv,
metric = custom_metric,
verbose = False,
show_progress = True
)
metric
# Plot predictions vs real value
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['pred'], name="prediction", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
title="Real value vs predicted in test data",
xaxis_title="Date time",
yaxis_title="Users",
width = 750,
height = 350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1.1,
xanchor="left",
x=0.001
)
)
fig.show()
An improved set of hyperparameters has been discovered through grid search, resulting in a further reduction of forecast errors.
import session_info
session_info.show(html=False)
How to cite this document
If you use this document or any part of it, please acknowledge the source. Thank you!
Intermittent demand forecasting with skforecast by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://www.cienciadedatos.net/documentos/py48-intermittent-demand-forecasting.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.14.0). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.14.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.14.0}, month = {11}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under a Attribution-NonCommercial-ShareAlike 4.0 International.