Tree-based models, including decision trees, random forests and gradient boosting machines (GBMs), are known for their effectiveness and widespread use in various machine learning applications. However, they have limitations when it comes to extrapolation, i.e., making predictions or estimates beyond the range of observed data. This limitation becomes particularly critical when forecasting time series data with a trend. Because these models cannot predict values outside the range observed during training, their predictions deviate from the underlying trend.
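
As a rough illustration of this limitation, the sketch below fits a scikit-learn `DecisionTreeRegressor` to a toy linear series and asks for predictions far outside the training range. The data and variable names are made up for this example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Toy data (an assumption for illustration): a perfectly linear trend y = 2x
X_train = np.arange(10).reshape(-1, 1)
y_train = 2.0 * X_train.ravel()

tree = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)

# Inputs well beyond the training range (x > 9)
X_new = np.array([[15], [50], [1000]])
tree_predictions = tree.predict(X_new)

# Each prediction is an average of training targets stored in a leaf,
# so it can never exceed the largest target seen during training.
print(tree_predictions)
print(y_train.max())
```

However large the input, the tree returns the value of the leaf that contains the largest training inputs, so the trend is lost.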

Several strategies have been proposed to address this challenge, one of the most frequently used being differentiation (also called differencing). This process involves calculating the differences between successive observations in the time series. Rather than modeling the absolute values, the focus shifts to modeling the change from one observation to the next. After estimating the predictions, the transformation can be reversed to recover the values in their original scale.
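
The round trip can be sketched with plain NumPy. The series below is hypothetical and only first-order differencing is shown.

```python
import numpy as np

# Hypothetical series with an upward trend
series = np.array([100.0, 104.0, 109.0, 115.0, 122.0])

# First-order differencing: change between consecutive observations
series_diff = np.diff(series)

# Reverse the transformation: start from the first observed value and
# accumulate the differences
series_restored = np.concatenate(
    ([series[0]], series[0] + np.cumsum(series_diff))
)

print(series_diff)
print(series_restored)
```

The differenced values stay within a narrow range (4 to 7) while the original series keeps growing, which is why they are easier for a tree-based model to learn.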

The **skforecast** library, version 0.10.0 or higher, introduces a `differentiation` parameter in its forecaster classes to indicate that a differentiation process must be applied before training the model. This is achieved by making internal use of a new transformer named `skforecast.preprocessing.TimeSeriesDifferentiator`. The differentiation process is fully automated and its effects are reversed during the prediction phase, ensuring that the forecast values are on the same scale as the original time series data.

This document shows how differentiation can be used to model time series with a positive trend using tree-based models (a random forest and an XGBoost gradient boosting model).

```
# Data manipulation
# ==============================================================================
import numpy as np
import pandas as pd
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8-darkgrid')
# Modelling and Forecasting
# ==============================================================================
from xgboost import XGBRegressor
from sklearn.ensemble import RandomForestRegressor
from skforecast.ForecasterAutoreg import ForecasterAutoreg
from skforecast.model_selection import backtesting_forecaster
from skforecast.preprocessing import TimeSeriesDifferentiator
from sklearn.metrics import mean_absolute_error
```

The dataset consists of monthly totals of international air passengers from 1949 to 1960.

```
# Download data
# ==============================================================================
url = (
    'https://raw.githubusercontent.com/JoaquinAmatRodrigo/Estadistica-machine-learning-python/'
    'master/data/AirPassengers.csv'
)
data = pd.read_csv(url, sep=',')
# Data preprocessing
# ==============================================================================
data['Date'] = pd.to_datetime(data['Date'], format='%Y-%m')
data = data.set_index('Date')
data = data.asfreq('MS')
data = data['Passengers']
data = data.sort_index()
data.head(4)
```

The same data is also stored after applying a differentiation of order 1 with the `TimeSeriesDifferentiator`.

```
# Data differentiated
# ==============================================================================
differentiator = TimeSeriesDifferentiator(order=1)
data_diff = differentiator.fit_transform(data)
data_diff = pd.Series(data_diff, index=data.index).dropna()
data_diff.head(4)
```

```
# Data partition train-test
# ==============================================================================
end_train = '1955-12-01 23:59:59'
print(
    f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
    f"(n={len(data.loc[:end_train])})"
)
print(
    f"Test dates : {data.loc[end_train:].index.min()} --- {data.index.max()} "
    f"(n={len(data.loc[end_train:])})"
)
# Plot
# ==============================================================================
fig, axs = plt.subplots(1, 2, figsize=(11, 2.5))
axs = axs.ravel()
data.loc[:end_train].plot(ax=axs[0], label='train')
data.loc[end_train:].plot(ax=axs[0], label='test')
axs[0].legend()
axs[0].set_title('Original data')
data_diff.loc[:end_train].plot(ax=axs[1], label='train')
data_diff.loc[end_train:].plot(ax=axs[1], label='test')
axs[1].legend()
axs[1].set_title('Differentiated data');
```

Two autoregressive forecasters are created, one with a scikit-learn `RandomForestRegressor` and the other with an XGBoost `XGBRegressor`. Both are trained on data from 1949-01-01 to 1955-12-01 and produce forecasts for the next 60 months (5 years).

```
# Forecasting without differentiation
# ==============================================================================
steps = len(data.loc[end_train:])
# Forecasters
forecaster_rf = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=963),
    lags = 12
)
forecaster_gb = ForecasterAutoreg(
    regressor = XGBRegressor(random_state=963),
    lags = 12
)
# Train
forecaster_rf.fit(data.loc[:end_train])
forecaster_gb.fit(data.loc[:end_train])
# Predict
predictions_rf = forecaster_rf.predict(steps=steps)
predictions_gb = forecaster_gb.predict(steps=steps)
# Error
error_rf = mean_absolute_error(data.loc[end_train:], predictions_rf)
error_gb = mean_absolute_error(data.loc[end_train:], predictions_gb)
print(f"Error (MAE) Random Forest: {error_rf:.2f}")
print(f"Error (MAE) Gradient Boosting: {error_gb:.2f}")
# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_rf.plot(ax=ax, label='Random Forest')
predictions_gb.plot(ax=ax, label='Gradient Boosting')
ax.set_title(f'Forecasting without differentiation')
ax.set_xlabel('')
ax.legend();
```

The plot shows that neither model is capable of accurately predicting the trend. After a few steps, the predictions become nearly constant, close to the maximum values observed in the training data.

Next, two new forecasters are trained using the same configuration, but with the argument `differentiation = 1`. This activates the internal process of differencing (order 1) the time series before training the model, and reverses the differentiation (also known as integration) for the predicted values.
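
The reversal step can be illustrated with a small NumPy sketch. It assumes first-order differencing and predictions that start right after the last training observation. The function name and the numbers are hypothetical, and this is not skforecast's internal code.

```python
import numpy as np


def integrate_predictions(last_observed, predicted_diffs):
    """Turn predicted first-order differences into values on the original scale."""
    return last_observed + np.cumsum(predicted_diffs)


# Hypothetical numbers: last training value and three predicted changes
last_observed = 100.0
predicted_diffs = np.array([2.0, 3.0, -1.0])

forecast = integrate_predictions(last_observed, predicted_diffs)
print(forecast)  # [102. 105. 104.]
```

Because each forecast is built on top of the last observed level, the predictions are no longer bounded by the range seen during training.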

```
# Forecasting with differentiation
# ==============================================================================
steps = len(data.loc[end_train:])
# Forecasters
forecaster_rf = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=963),
    lags = 12,
    differentiation = 1
)
forecaster_gb = ForecasterAutoreg(
    regressor = XGBRegressor(random_state=963),
    lags = 12,
    differentiation = 1
)
# Train
forecaster_rf.fit(data.loc[:end_train])
forecaster_gb.fit(data.loc[:end_train])
# Predict
predictions_rf = forecaster_rf.predict(steps=steps)
predictions_gb = forecaster_gb.predict(steps=steps)
# Error
error_rf = mean_absolute_error(data.loc[end_train:], predictions_rf)
error_gb = mean_absolute_error(data.loc[end_train:], predictions_gb)
print(f"Error (MAE) Random Forest: {error_rf:.2f}")
print(f"Error (MAE) Gradient Boosting: {error_gb:.2f}")
# Plot
fig, ax = plt.subplots(figsize=(7, 3), sharex=True, sharey=True)
data.loc[:end_train].plot(ax=ax, label='train')
data.loc[end_train:].plot(ax=ax, label='test')
predictions_rf.plot(ax=ax, label='Random Forest')
predictions_gb.plot(ax=ax, label='Gradient Boosting')
ax.set_title(f'Forecasting with differentiation')
ax.set_xlabel('')
ax.legend();
```

This time, both models are able to follow the trend in their predictions.

The previous example showed how easy it is to introduce differentiation into the forecasting process thanks to the functionalities available in **skforecast**. However, several non-trivial transformations have to be applied in order to achieve a smooth interaction.

The next sections introduce the `TimeSeriesDifferentiator` transformer and cover the following topics:

- Differentiation and integration (reverse differentiation) of any given time series.
- Why managing the differentiation internally has advantages over the traditional approach of pre-transforming the entire time series before training the model.
- How to manage the differentiation when applying the Forecaster to new data that does not immediately follow the training data.

`TimeSeriesDifferentiator` is a custom transformer that follows the scikit-learn preprocessing API. This means it has the methods `fit`, `transform`, `fit_transform` and `inverse_transform`.
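
To make the interface concrete, here is a minimal sketch of a first-order differencer exposing the same four methods. `SimpleDifferencer` is a hypothetical class written for this example and is not skforecast's implementation. Padding the first position with NaN is an assumption made here so that the output keeps the length of the input.

```python
import numpy as np


class SimpleDifferencer:
    """Hypothetical first-order differencer; not skforecast's implementation."""

    def fit(self, y):
        # Store the first value: it is needed to undo the differencing later
        self.initial_value_ = float(np.asarray(y)[0])
        return self

    def transform(self, y):
        # Keep the original length by padding the first position with NaN
        return np.concatenate(([np.nan], np.diff(np.asarray(y, dtype=float))))

    def fit_transform(self, y):
        return self.fit(y).transform(y)

    def inverse_transform(self, y_diff):
        # Replace the NaN padding with the stored value and integrate
        values = np.concatenate(([self.initial_value_], np.asarray(y_diff)[1:]))
        return np.cumsum(values)


y = np.array([5, 8, 12, 10, 14, 17, 21, 19], dtype=float)
differencer = SimpleDifferencer()
y_diff = differencer.fit_transform(y)
y_restored = differencer.inverse_transform(y_diff)
```

The value stored by `fit` is what makes the inverse possible, so in this sketch `inverse_transform` only makes sense for the series the object was fitted on.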

```
# Differentiation with TimeSeriesDifferentiator
# ==============================================================================
y = np.array([5, 8, 12, 10, 14, 17, 21, 19], dtype=float)
differentiator = TimeSeriesDifferentiator()
differentiator.fit(y)
y_diff = differentiator.transform(y)
print(f"Original time series : {y}")
print(f"Differenced time series: {y_diff}")
```

The process of differencing can be reversed (integration) using the `inverse_transform` method.

```
# Inverse transform
# ==============================================================================
differentiator.inverse_transform(y_diff)
```

**⚠ Warning**

`inverse_transform` is applicable only to the same time series that was previously differentiated using the same `TimeSeriesDifferentiator` object. This limitation arises from the need to use the initial values of the time series, as many as the order of differentiation, to reverse the transformation. These values are stored when the `fit` method is executed.

**Note**

An additional method, `inverse_transform_next_window`, is available in the `TimeSeriesDifferentiator`. This method is designed to be used inside the Forecasters to reverse the differentiation of the predicted values. If the Forecaster regressor is trained with a differentiated time series, then the predicted values will be differentiated as well. The `inverse_transform_next_window` method returns the predictions to the original scale, under the assumption that they start immediately after the last values observed (`last_window`).

Forecasters manage the differentiation process internally, so there is no need for additional pre-processing of the time series and post-processing of the predictions. This has several advantages, but before diving in, the results of both approaches are compared.

```
# Time series differentiated by preprocessing before training
# ==============================================================================
differentiator = TimeSeriesDifferentiator(order=1)
data_diff = differentiator.fit_transform(data)
data_diff = pd.Series(data_diff, index=data.index).dropna()
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=963),
    lags = 15
)
forecaster.fit(y=data_diff.loc[:end_train])
predictions_diff = forecaster.predict(steps=steps)
# Revert differentiation to obtain final predictions
last_value_train = data.loc[:end_train].iloc[[-1]]
predictions_1 = pd.concat([last_value_train, predictions_diff]).cumsum()[1:]
predictions_1 = predictions_1.asfreq('MS')
predictions_1.name = 'pred'
predictions_1.head(5)
```

```
# Time series differentiated internally by the forecaster
# ==============================================================================
forecaster = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=963),
    lags = 15,
    differentiation = 1
)
forecaster.fit(y=data.loc[:end_train])
predictions_2 = forecaster.predict(steps=steps)
predictions_2.head(5)
```

```
# Compare both predictions
# ==============================================================================
pd.testing.assert_series_equal(predictions_1, predictions_2)
```

Next, the outcomes of the backtesting process are subjected to a comparative analysis. This comparison is more complex than the previous one, as the process of undoing the differentiation must be performed separately for each backtesting fold.
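
The fold-by-fold reversal can be sketched in NumPy before looking at the full pandas version. The helper `revert_folds` and the toy numbers are hypothetical. The sketch assumes first-order differencing and that each fold starts from real observed data.

```python
import numpy as np


def revert_folds(observed, predicted_diffs, first_test_pos, steps):
    """
    Undo first-order differencing fold by fold.

    observed        : original (undifferenced) series
    predicted_diffs : predicted differences, starting at position first_test_pos
    first_test_pos  : position in `observed` of the first predicted value
    steps           : number of predictions per fold
    """
    reverted = []
    for start in range(0, len(predicted_diffs), steps):
        fold = predicted_diffs[start:start + steps]
        # Each fold is anchored at the last value observed before it begins
        anchor = observed[first_test_pos + start - 1]
        reverted.append(anchor + np.cumsum(fold))
    return np.concatenate(reverted)


observed = np.array([10.0, 12.0, 15.0, 19.0, 24.0, 30.0, 37.0])
predicted_diffs = np.array([3.0, 3.0, 6.0, 6.0])  # two folds of two steps
reverted = revert_folds(observed, predicted_diffs, first_test_pos=3, steps=2)
print(reverted)  # [18. 21. 30. 36.]
```

The second fold restarts from the observed value 24 rather than from the previous fold's last prediction (21), so errors do not accumulate across folds.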

```
# Backtesting with the time series differentiated by preprocessing before training
# ==============================================================================
steps = 5
forecaster_1 = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=963),
    lags = 15
)
_, predictions_1 = backtesting_forecaster(
    forecaster = forecaster_1,
    y = data_diff,
    steps = steps,
    metric = 'mean_squared_error',
    initial_train_size = len(data_diff.loc[:end_train]),
    fixed_train_size = False,
    gap = 0,
    allow_incomplete_fold = True,
    refit = True,
    n_jobs = 'auto',
    verbose = False,
    show_progress = True
)
# Revert differentiation of predictions. Predictions of each fold must be reverted
# individually. An id is added to each prediction to identify the fold to which it belongs.
predictions_1 = predictions_1.rename(columns={'pred': 'pred_diff'})
folds = len(predictions_1) / steps
folds = int(np.ceil(folds))
predictions_1['backtesting_fold_id'] = np.repeat(range(folds), steps)[:len(predictions_1)]
# Add the previously observed value of the time series (only to the first prediction of each fold)
previous_observed_values = data.shift(1).loc[predictions_1.index].iloc[::steps]
previous_observed_values.name = 'previous_observed_value'
predictions_1 = predictions_1.merge(
    previous_observed_values,
    left_index = True,
    right_index = True,
    how = 'left'
)
predictions_1 = predictions_1.fillna(0)
predictions_1['summed_value'] = (
    predictions_1['pred_diff'] + predictions_1['previous_observed_value']
)
# Revert differentiation using the cumulative sum by fold
predictions_1['pred'] = (
    predictions_1
    .groupby('backtesting_fold_id')
    .apply(lambda x: x['summed_value'].cumsum())
    .to_numpy()
)
predictions_1.head(5)
```

```
# Backtesting with the time series differentiated internally
# ==============================================================================
forecaster_2 = ForecasterAutoreg(
    regressor = RandomForestRegressor(random_state=963),
    lags = 15,
    differentiation = 1
)
_, predictions_2 = backtesting_forecaster(
    forecaster = forecaster_2,
    y = data,
    steps = steps,
    metric = 'mean_squared_error',
    initial_train_size = len(data.loc[:end_train]),
    fixed_train_size = False,
    gap = 0,
    allow_incomplete_fold = True,
    refit = True,
    n_jobs = 'auto',
    verbose = False,
    show_progress = True
)
predictions_2.head(5)
```

```
# Compare both predictions
# ==============================================================================
pd.testing.assert_series_equal(predictions_1['pred'], predictions_2['pred'])
```

If, as demonstrated, the results are the same whether the time series is differentiated in a preprocessing step or the Forecaster manages the differentiation internally, why is the second alternative better?

Allowing the forecaster to manage all transformations internally guarantees that the same transformations are applied when the model is run on new data.

When the model is applied to new data that does not follow immediately after the training data (for example, if a model is not retrained for each prediction phase), the forecaster automatically increases the size of the last window needed to generate the predictors, as well as applying the differentiation to the incoming data and undoing it in the final predictions.

These transformations are non-trivial and very error-prone, so **skforecast** tries to avoid overcomplicating the already challenging task of forecasting time series.
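
The enlarged last window mentioned above follows from the arithmetic of differencing, which can be checked with NumPy. The sketch assumes the required window is `lags + order` observations. The data is hypothetical and this is not skforecast's internal code.

```python
import numpy as np

lags = 15   # same number of lags as in the examples above
order = 1   # order of differentiation

# Hypothetical recent observations available at prediction time
recent = np.arange(100, 130, dtype=float)

# Differencing of order `order` consumes `order` observations, so
# lags + order raw values are needed to obtain `lags` differenced values
window_size = lags + order
last_window = recent[-window_size:]
last_window_diff = np.diff(last_window, n=order)

print(window_size)            # 16
print(len(last_window_diff))  # 15
```

Passing only 15 raw observations would leave 14 differenced values, one short of what a model with 15 lags needs.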

```
import session_info
session_info.show(html=False)
```

**How to cite this document?**

Modelling time series trend with tree based models by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 license at https://www.cienciadedatos.net/documentos/py49-modelling-time-series-trend-with-tree-based-models.html

This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.