Intermittent demand forecasting with skforecast

If you like  Skforecast ,  help us giving a star on   GitHub! ⭐️

Intermittent demand forecasting with skforecast

Joaquín Amat Rodrigo, Javier Escobar Ortiz
April, 2023 (last update August 2024)

Introduction

Intermittent demand forecasting is a statistical method used to predict the demand for products that have sporadic or irregular sales patterns. These types of products are characterized by periods of high demand followed by periods of little or no demand. Intermittent demand forecasting is used in the manufacturing, retail, and healthcare industries to manage inventory levels, optimize production schedules, and reduce out-of-stocks and excess inventory costs.

Regular intermittent demand refers to predictable demand patterns with known intervals between demand periods. This type of intermittent demand occurs with a degree of regularity, such as a product that has seasonal demand or a product that is ordered every month on a certain date. Forecasting of regular intermittent demand can be done using statistical methods, such as Croston's method, which is a common technique used to forecast demand in these situations. However, these methods often do not take into account important exogenous variables that can provide valuable information about demand intervals. As a result, machine learning-based forecasting can be an appealing alternative as it can incorporate these variables and improve accuracy.

On the other hand, irregular intermittent demand refers to unpredictable demand patterns, with no known intervals between demand periods. This type of intermittent demand is random and unpredictable, such as a product that is ordered only a few times a year and in varying quantities. Forecasting irregular intermittent demand is challenging because there is no predictable pattern to the demand. Traditional forecasting methods may not work effectively in these situations and alternative forecasting methods, such as bootstrapping or simulation, may be required.

In summary, regular intermittent demand has a predictable pattern whereas irregular intermittent demand does not. Forecasting regular intermittent demand is easier than forecasting irregular intermittent demand due to the predictability of the demand pattern.

This document demonstrates how the Python library skforecast can be used to forecast regular intermittent demand scenarios. By using this library, the machine learning model can focus on learning to predict demand during periods of interest, while avoiding the influence of periods when there is no demand.

Libraries

In [1]:
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)

# Modeling
# ==============================================================================
import skforecast
import sklearn
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.metrics import mean_absolute_error
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster

# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m' 
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
Version skforecast: 0.13.0
Version scikit-learn: 1.5.1
Version pandas: 2.2.2
Version numpy: 2.0.1

Data

The data used in this example represents the number of users who visited a store during its opening hours from Monday to Friday, between 7:00 and 20:00. Therefore, any predictions outside this period are not useful and can either be ignored or set to 0.

In [2]:
# Downloading data
# ======================================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine'
       '-learning-python/master/data/intermittent_demand.csv')
data = pd.read_csv(url, sep=',')
data['date_time'] = pd.to_datetime(data['date_time'], format='%Y-%m-%d %H:%M:%S')
data = data.set_index('date_time')
data = data.asfreq('H')
data = data.sort_index()
data.head(3)
/tmp/ipykernel_25818/2933876167.py:8: FutureWarning:

'H' is deprecated and will be removed in a future version, please use 'h' instead.

Out[2]:
users
date_time
2011-01-01 00:00:00 0.0
2011-01-01 01:00:00 0.0
2011-01-01 02:00:00 0.0
In [3]:
# Split train-val-test
# ======================================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validation : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Dates train      : 2011-01-01 00:00:00 --- 2012-03-31 23:00:00  (n=10944)
Dates validation : 2012-04-01 00:00:00 --- 2012-08-31 23:00:00  (n=3672)
Dates test       : 2012-09-01 00:00:00 --- 2012-12-31 23:00:00  (n=2928)
In [4]:
# Plot time series
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_train.index, y=data_train['users'], name="train", mode="lines")
trace2 = go.Scatter(x=data_val.index, y=data_val['users'], name="validation", mode="lines")
trace3 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.add_trace(trace3)
fig.update_layout(
    title="Time series of users",
    xaxis_title="Date time",
    yaxis_title="Users",
    width  = 800,
    height = 400,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1,
        xanchor="left",
        x=0.001
    )
)
fig.show()
In [5]:
# Boxplot for weekly seasonality
# ==============================================================================
data['week_day'] = data.index.day_of_week + 1
fig = px.box(
        data,
        x="week_day",
        y="users",
        title = 'Distribution of users per day of week',
        width=600,
        height=300
     )
median_values = data.groupby('week_day')['users'].median()
fig.add_trace(
    go.Scatter(
        x=median_values.index,
        y=median_values.values,
        mode='lines+markers',
        line=dict(color='blue', dash='dash'),
        showlegend=False
    )
)
fig.update_layout(margin=dict(l=20, r=20, t=35, b=20))
fig.show()
In [6]:
# Boxplot for daily seasonality
# ==============================================================================
data['hour_day'] = data.index.hour + 1
fig = px.box(
        data,
        x="hour_day",
        y="users",
        title = 'Distribution of users per hour of day',
        width=600,
        height=300
     )
median_values = data.groupby('hour_day')['users'].median()
fig.add_trace(
    go.Scatter(
        x=median_values.index,
        y=median_values.values,
        mode='lines+markers',
        line=dict(color='blue', dash='dash'),
        showlegend=False
    )
)
fig.update_layout(margin=dict(l=20, r=20, t=35, b=20))
fig.show()