Intermittent demand forecasting with skforecast

Joaquín Amat Rodrigo, Javier Escobar Ortiz
April, 2023

Introduction


Intermittent demand forecasting is the prediction of demand for products with sporadic or irregular sales patterns. These products are characterized by periods of high demand followed by periods of little or no demand. Intermittent demand forecasting is used in industries such as manufacturing, retail, and healthcare to manage inventory levels, optimize production schedules, and reduce both stockouts and the cost of excess inventory.

Regular intermittent demand refers to predictable demand patterns with known intervals between demand periods. This type of intermittent demand occurs with some regularity, for example a product with seasonal demand or one that is ordered every month on a certain date. Regular intermittent demand can be forecast with statistical methods such as Croston's method, a common technique in these situations. However, these methods usually do not take into account exogenous variables that can provide valuable information about demand intervals. Machine learning-based forecasting is therefore an appealing alternative, since it can incorporate these variables and potentially improve accuracy.
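Croston's method is only named here, so the following is a minimal sketch of its classic form, not code from skforecast. It applies simple exponential smoothing separately to the non-zero demand sizes (z) and to the number of periods between them (p), and forecasts a flat demand rate per period equal to z / p. The smoothing constant `alpha` shared by both components and the initialisation of p with the position of the first demand are assumptions; other initialisations are also in use.

```python
import numpy as np


def croston(y, alpha=0.1):
    """Classic Croston forecast: smoothed demand size / smoothed interval.

    Returns a single per-period demand rate used for every future step.
    """
    y = np.asarray(y, dtype=float)
    nonzero = np.flatnonzero(y > 0)
    if nonzero.size == 0:
        return 0.0  # no demand observed at all

    # Initialise with the first demand and the periods elapsed until it (assumption)
    first = nonzero[0]
    z = y[first]        # smoothed demand size
    p = first + 1.0     # smoothed inter-demand interval
    q = 1               # periods since the last demand

    for value in y[first + 1:]:
        if value > 0:
            # Update both components only when demand occurs
            z += alpha * (value - z)
            p += alpha * (q - p)
            q = 1
        else:
            q += 1
    return z / p


# Demand of 3 units every third period -> rate of 1 unit per period
print(croston([0, 0, 3, 0, 0, 3, 0, 0, 3]))  # → 1.0
```

Because both components are updated only when demand occurs, long runs of zeros stretch the interval estimate instead of dragging the size estimate towards zero, which is the point of separating them.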

On the other hand, irregular intermittent demand refers to unpredictable demand patterns with no known intervals between demand periods. This type of demand is essentially random, for example a product that is ordered only a few times a year and in varying quantities. Forecasting irregular intermittent demand is challenging because there is no pattern to learn. Traditional forecasting methods may not work well in these situations, and alternative approaches, such as bootstrapping or simulation, may be required.
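As an illustration of the bootstrapping approach mentioned above, the sketch below simulates total demand over a future horizon by resampling past periods with replacement. It is a plain i.i.d. bootstrap, an assumption chosen for brevity: it keeps the observed share of zero-demand periods but ignores any serial dependence, which more elaborate bootstrap methods for intermittent demand try to capture. The function name `bootstrap_horizon_demand` and the history values are hypothetical.

```python
import numpy as np


def bootstrap_horizon_demand(history, horizon, n_samples=1000, seed=0):
    """Simulate total demand over `horizon` periods by resampling history.

    Each future period is drawn independently, with replacement, from the
    observed periods (zeros included).
    """
    history = np.asarray(history, dtype=float)
    rng = np.random.default_rng(seed)
    # One row per simulated future path, one column per future period
    draws = rng.choice(history, size=(n_samples, horizon), replace=True)
    return draws.sum(axis=1)


history = [0, 0, 5, 0, 1, 0, 0, 0, 3, 0]  # hypothetical past demand
totals = bootstrap_horizon_demand(history, horizon=7)
print(np.percentile(totals, [50, 90]))
```

The result is a distribution rather than a point forecast; for example, the 90th percentile of the simulated totals could serve as a stock level that covers demand in roughly 90% of the simulated scenarios.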

In summary, regular intermittent demand has a predictable pattern whereas irregular intermittent demand does not. Forecasting regular intermittent demand is easier than forecasting irregular intermittent demand due to the predictability of the demand pattern.

This document demonstrates how the Python library skforecast can be used to forecast regular intermittent demand. By using this library, the machine learning model can focus on learning to predict demand during the periods of interest while avoiding the influence of periods with no demand.

Libraries

In [1]:
# Data processing
# ==============================================================================
import numpy as np
import pandas as pd

# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import plotly.graph_objects as go
import plotly.express as px
import plotly.io as pio
pio.templates.default = "seaborn"

# Modeling
# ==============================================================================
from lightgbm import LGBMRegressor
from sklearn.preprocessing import FunctionTransformer
from sklearn.metrics import mean_absolute_error
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.ForecasterAutoregCustom import ForecasterAutoregCustom
from skforecast.model_selection import grid_search_forecaster
from skforecast.model_selection import backtesting_forecaster

Data


The data used in this example represents the number of users who visited a store during its opening hours from Monday to Friday, between 7:00 and 20:00. Therefore, any predictions outside this period are not useful and can either be ignored or set to 0.
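A minimal sketch of both options, using only pandas and NumPy: setting out-of-hours predictions to 0, and scoring only the opening hours. The helper names `is_open`, `zero_outside_opening_hours` and `mae_opening_hours` are hypothetical, and treating both the 7:00 and 20:00 hourly slots as open is an assumption about how the timestamps are labelled.

```python
import numpy as np
import pandas as pd

# Assumption: open Monday-Friday, hourly slots 07:00 through 20:00 inclusive
OPEN_HOUR = 7
CLOSE_HOUR = 20


def is_open(index: pd.DatetimeIndex) -> np.ndarray:
    """Boolean mask, True for timestamps inside the assumed opening hours."""
    weekday = index.dayofweek < 5  # Monday=0 ... Friday=4
    in_hours = (index.hour >= OPEN_HOUR) & (index.hour <= CLOSE_HOUR)
    return np.asarray(weekday & in_hours)


def zero_outside_opening_hours(pred: pd.Series) -> pd.Series:
    """Replace predictions that fall outside opening hours with 0."""
    return pred.where(is_open(pred.index), 0.0)


def mae_opening_hours(y_true: pd.Series, y_pred: pd.Series) -> float:
    """Mean absolute error over opening hours only (series share one index)."""
    mask = is_open(y_true.index)
    return float(np.mean(np.abs(y_true[mask] - y_pred[mask])))


idx = pd.date_range('2012-09-01', periods=7 * 24, freq='h')  # one full week
y_true = pd.Series(2.0, index=idx)  # hypothetical observations
y_pred = zero_outside_opening_hours(pd.Series(1.5, index=idx))  # hypothetical predictions
print(mae_opening_hours(y_true, y_pred))  # → 0.5
```

Scoring only the opening hours keeps the many trivially correct zeros outside business hours from making the error look smaller than it is during the periods that matter.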

In [2]:
# Downloading data
# ======================================================================================
url = ('https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine'
       '-learning-python/master/data/intermittent_demand.csv')
data = pd.read_csv(url, sep=',')
data['date_time'] = pd.to_datetime(data['date_time'], format='%Y-%m-%d %H:%M:%S')
data = data.set_index('date_time')
data = data.sort_index()  # sort before regularising the frequency
data = data.asfreq('h')   # hourly frequency; hours missing from the CSV become NaN
data.head(3)
Out[2]:
                     users
date_time
2011-01-01 00:00:00    0.0
2011-01-01 01:00:00    0.0
2011-01-01 02:00:00    0.0
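After `asfreq`, any hour absent from the CSV appears as a row of NaN, so it is worth counting them before modelling. The sketch below uses a small hypothetical frame (not the store data) with rows out of order and one missing hour to show what sorting plus `asfreq` produces.

```python
import pandas as pd

# Hypothetical hourly data: rows out of order and 02:00 missing
raw = pd.DataFrame(
    {'users': [0.0, 3.0, 1.0]},
    index=pd.to_datetime(['2011-01-01 03:00', '2011-01-01 00:00', '2011-01-01 01:00']),
)

# Sort first, then regularise to an hourly frequency; gaps become NaN rows
regular = raw.sort_index().asfreq('h')

n_missing = int(regular['users'].isna().sum())
print(regular)
print(f"Missing hours: {n_missing}")  # → Missing hours: 1
```

Applied to the store data, `data['users'].isna().sum()` gives the number of hours missing from the original file; how to fill them (with 0, by interpolation, or not at all) is a modelling decision.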
In [3]:
# Split train-val-test
# ======================================================================================
end_train = '2012-03-31 23:59:00'
end_validation = '2012-08-31 23:59:00'
data_train = data.loc[: end_train, :]
data_val   = data.loc[end_train:end_validation, :]
data_test  = data.loc[end_validation:, :]

print(f"Dates train      : {data_train.index.min()} --- {data_train.index.max()}  (n={len(data_train)})")
print(f"Dates validation : {data_val.index.min()} --- {data_val.index.max()}  (n={len(data_val)})")
print(f"Dates test       : {data_test.index.min()} --- {data_test.index.max()}  (n={len(data_test)})")
Dates train      : 2011-01-01 00:00:00 --- 2012-03-31 23:00:00  (n=10944)
Dates validation : 2012-04-01 00:00:00 --- 2012-08-31 23:00:00  (n=3672)
Dates test       : 2012-09-01 00:00:00 --- 2012-12-31 23:00:00  (n=2928)
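The split boundaries end at `23:59:00` for a reason: label-based `.loc` slicing on a `DatetimeIndex` includes both endpoints, so cutting at an existing timestamp would place that row in two splits. The sketch below reproduces the behaviour on a hypothetical four-row hourly frame around the train/validation boundary.

```python
import pandas as pd

# Hypothetical hourly frame spanning the train/validation boundary
idx = pd.date_range('2012-03-31 22:00', periods=4, freq='h')
s = pd.DataFrame({'users': [1.0, 2.0, 3.0, 4.0]}, index=idx)

# 23:59 matches no hourly timestamp, so each row lands in exactly one split
end_train = '2012-03-31 23:59:00'
train = s.loc[:end_train, :]
val = s.loc[end_train:, :]
print(train.index.max(), val.index.min())

# Cutting at an existing timestamp (23:00) puts that row in both splits
cut = '2012-03-31 23:00:00'
overlap = s.loc[:cut].index.intersection(s.loc[cut:].index)
print(len(overlap))  # → 1
```

This matches the printed ranges above, where training ends at 23:00 and validation starts at 00:00 of the next day with no shared rows.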
In [4]:
# Plot time series
# ======================================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_train.index, y=data_train['users'], name="train", mode="lines")
trace2 = go.Scatter(x=data_val.index, y=data_val['users'], name="validation", mode="lines")
trace3 = go.Scatter(x=data_test.index, y=data_test['users'], name="test", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.add_trace(trace3)
fig.update_layout(
    title="Time series of users",
    xaxis_title="Date time",
    yaxis_title="Users",
    width  = 800,
    height = 400,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1,
        xanchor="left",
        x=0.001
    )
)
fig.show()