Deep Learning is a field of artificial intelligence focused on creating models based on neural networks that allow learning non-linear representations. Recurrent neural networks (RNN) are a type of deep learning architecture designed to work with sequential data, where information is propagated through recurrent connections, allowing the network to learn temporal dependencies.
This article describes how to train recurrent neural network models, specifically RNN and LSTM, for time series prediction (forecasting) using Python, Keras, and skforecast.
Keras 3 provides a friendly interface for building and training neural network models. Thanks to its high-level API, developers can easily implement LSTM architectures, taking advantage of the computational efficiency and scalability offered by deep learning frameworks.
Skforecast eases the application of machine learning models, including LSTMs and RNNs, to forecasting problems. Using this package, the user can define the problem and abstract away the architecture details. For advanced users, skforecast also allows running a previously defined deep learning architecture.
✎ Note
To fully understand this article, some knowledge of neural networks and deep learning is assumed. If this is not the case, and while we work on creating new material, here are some reference links to get started:
Recurrent Neural Networks (RNN) are a type of neural network designed to process data that follows a sequential order. In conventional neural networks, such as feedforward networks, information flows in one direction, from input to output through hidden layers, without considering the sequential structure of the data. In contrast, RNNs maintain internal states or memories that allow them to remember past information and use it to predict future data in the sequence.
The basic unit of an RNN is the recurrent cell. This cell takes two inputs: the current input and the previous hidden state. The hidden state can be understood as a "memory" that retains information from previous iterations. The current input and the previous hidden state are combined to calculate the current output and the new hidden state, which is then passed to the next iteration together with the next input in the data sequence.
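To make the idea concrete, below is a minimal NumPy sketch of a single recurrent cell step (the random weights and the tanh activation are illustrative assumptions, not the exact Keras parameterization):

```python
# Minimal sketch of one recurrent cell step (illustrative, not Keras' exact code)
import numpy as np

rng = np.random.default_rng(seed=42)
n_features, n_units = 1, 4                     # one input feature, four hidden units

W_x = rng.normal(size=(n_units, n_features))   # input-to-hidden weights
W_h = rng.normal(size=(n_units, n_units))      # hidden-to-hidden (recurrent) weights
b = np.zeros(n_units)                          # bias

def rnn_cell(x_t, h_prev):
    """One RNN step: combine the current input with the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Unrolling the cell over a sequence: the hidden state acts as the "memory"
h = np.zeros(n_units)
for x_t in [np.array([0.1]), np.array([0.5]), np.array([0.9])]:
    h = rnn_cell(x_t, h)
print(h)  # hidden state after processing the whole sequence
```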
Despite the advances achieved with RNN architectures, they have limitations in capturing long-term patterns. This is why variants such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Unit) have been developed, which address these problems and retain long-term information more effectively.
Long Short-Term Memory (LSTM) neural networks are a specialized type of RNNs designed to overcome the limitations associated with capturing long-term temporal dependencies. Unlike traditional RNNs, LSTMs incorporate a more complex architecture, introducing memory units and gate mechanisms to improve information management over time.
Structure of LSTMs
LSTMs have a modular structure consisting of three fundamental gates: the forget gate, the input gate, and the output gate. These gates work together to regulate the flow of information through the memory unit, allowing for more precise control over what information to retain and what to forget; a minimal code sketch of the three gates follows the list below.
Forget Gate: Regulates how much information should be forgotten and how much should be retained, combining the current input and the previous output through a sigmoid function.
Input Gate: Decides how much new information should be added to long-term memory.
Output Gate: Determines how much information from the current memory will be used for the final output, combining the current input and memory information through a sigmoid function.
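For reference, the sketch below implements one LSTM step with the three gates in plain NumPy (the random weights are purely illustrative; Keras fuses these matrices into a single kernel internally):

```python
# Minimal sketch of one LSTM cell step with forget, input and output gates
import numpy as np

rng = np.random.default_rng(seed=42)
n_features, n_units = 1, 4

def init_weights():
    # input weights, recurrent weights and bias for one gate
    return (rng.normal(size=(n_units, n_features)),
            rng.normal(size=(n_units, n_units)),
            np.zeros(n_units))

Wf_x, Wf_h, bf = init_weights()  # forget gate
Wi_x, Wi_h, bi = init_weights()  # input gate
Wo_x, Wo_h, bo = init_weights()  # output gate
Wc_x, Wc_h, bc = init_weights()  # candidate memory

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev):
    f = sigmoid(Wf_x @ x_t + Wf_h @ h_prev + bf)        # how much memory to keep
    i = sigmoid(Wi_x @ x_t + Wi_h @ h_prev + bi)        # how much new info to add
    o = sigmoid(Wo_x @ x_t + Wo_h @ h_prev + bo)        # how much memory to output
    c_tilde = np.tanh(Wc_x @ x_t + Wc_h @ h_prev + bc)  # candidate memory
    c_t = f * c_prev + i * c_tilde                      # updated long-term memory
    h_t = o * np.tanh(c_t)                              # new hidden state (output)
    return h_t, c_t

h, c = np.zeros(n_units), np.zeros(n_units)
h, c = lstm_cell(np.array([0.5]), h, c)
```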
The complexity of a time series problem is usually defined by three key factors: first, deciding which series or set of series to use to train the model; second, determining which and how many series are to be predicted; and third, defining the number of steps into the future to predict. These three aspects can be a real challenge when addressing time series problems.
Recurrent neural networks, thanks to their wide variety of architectures, allow modeling the following scenarios:
Problems 1:1 - Model a single series and predict that same series (single-series, single-output)
Problems N:1 - Predict a single series using multiple series as predictors (multi-series, single-output)
Problems N:M - Predict multiple series using multiple series as predictors (multi-series, multiple-outputs)
In all these scenarios, the prediction can be single-step (one step into the future) or multi-step (multiple steps into the future). In the first case, the model predicts a single value; in the second, it predicts multiple values into the future.
In some situations, it can be difficult to define and create the appropriate Deep Learning architecture to address a specific problem. The skforecast library provides functionalities that allow determining the appropriate Tensorflow architecture for each problem, simplifying and accelerating the modeling process for a wide variety of problems. Below is an example of how to use skforecast to solve each of the described time series problems using recurrent neural networks.
The data used in this article contains detailed information on air quality in the city of Valencia (Spain). The data collection spans from January 1, 2019 to December 31, 2021, providing hourly measurements of various air pollutants, such as PM2.5 and PM10 particles, carbon monoxide (CO), and nitrogen dioxide (NO2), among others. The data were obtained from the Red de Vigilancia y Control de la Contaminación Atmosférica platform, station 46250054-València - Centre, https://mediambient.gva.es/es/web/calidad-ambiental/datos-historicos.
💡 Tip: Configuring your backend
As of Skforecast version 0.13.0, PyTorch backend support is available. You can configure the backend by exporting the `KERAS_BACKEND` environment variable or by editing your local configuration file at `~/.keras/keras.json`. The available backend options are "tensorflow" and "torch". Example:
```python
import os
os.environ["KERAS_BACKEND"] = "torch"
import keras
```
⚠ Warning
The backend must be configured before importing Keras, and it cannot be changed after the package has been imported.

# Data processing
# ==============================================================================
import os
import pandas as pd
import numpy as np
from skforecast.datasets import fetch_dataset
# Plotting
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.plot import set_dark_theme
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)
# Keras
# ==============================================================================
os.environ["KERAS_BACKEND"] = "tensorflow" # 'tensorflow', 'jax´ or 'torch'
import keras
from keras.optimizers import Adam
from keras.losses import MeanSquaredError
from keras.callbacks import EarlyStopping
if keras.__version__ > "3.0":
if keras.backend.backend() == "tensorflow":
import tensorflow
elif keras.backend.backend() == "torch":
import torch
else:
print("Backend not recognized. Please use 'tensorflow' or 'torch'.")
# Time series modeling
# ==============================================================================
import skforecast
from skforecast.deep_learning import ForecasterRnn
from skforecast.deep_learning.utils import create_and_compile_model
from sklearn.preprocessing import MinMaxScaler
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster_multiseries
# Warning configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version Keras: {keras.__version__}")
print(f"{color}Using backend: {keras.backend.backend()}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
if keras.__version__ > "3.0":
if keras.backend.backend() == "tensorflow":
print(f"{color}Version tensorflow: {tensorflow.__version__}")
elif keras.backend.backend() == "torch":
print(f"{color}Version torch: {torch.__version__}")
else:
print(f"{color}Version torch: {jax.__version__}")
⚠ Warning
At the time of writing this document, `tensorflow` is only compatible with `numpy` versions lower than 2.0. If you have a higher version, you can downgrade it by running the following command: `pip install numpy==1.26.4`
# Downloading the dataset and processing it
# ==============================================================================
air_quality = fetch_dataset(name="air_quality_valencia_no_missing")
air_quality.head()
It is verified that the dataset has an index of type `DatetimeIndex` with hourly frequency. Although skforecast does not require this type of index, using it makes the subsequent handling of predictions more convenient.
# Checking the frequency of the time series
# ==============================================================================
print(f"Index: {air_quality.index.dtype}")
print(f"Frequency: {air_quality.index.freq}")
To facilitate the training of the models, the search for optimal hyperparameters, and the evaluation of their predictive capacity, the data is divided into three separate sets: training, validation, and test.
# Split train-validation-test
# ==============================================================================
air_quality = air_quality.loc[:'2021-12-31 23:00:00', :].copy()
end_train = "2021-03-31 23:59:00"
end_validation = "2021-09-30 23:59:00"
air_quality_train = air_quality.loc[:end_train, :].copy()
air_quality_val = air_quality.loc[end_train:end_validation, :].copy()
air_quality_test = air_quality.loc[end_validation:, :].copy()
print(
f"Dates train : {air_quality_train.index.min()} --- "
f"{air_quality_train.index.max()} (n={len(air_quality_train)})"
)
print(
f"Dates validation : {air_quality_val.index.min()} --- "
f"{air_quality_val.index.max()} (n={len(air_quality_val)})"
)
print(
f"Dates test : {air_quality_test.index.min()} --- "
f"{air_quality_test.index.max()} (n={len(air_quality_test)})"
)
# Plotting pm2.5
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(7, 3))
air_quality_train["pm2.5"].rolling(100).mean().plot(ax=ax, label="train")
air_quality_val["pm2.5"].rolling(100).mean().plot(ax=ax, label="validation")
air_quality_test["pm2.5"].rolling(100).mean().plot(ax=ax, label="test")
ax.set_title("pm2.5")
ax.legend();
Although Keras facilitates the process of creating deep learning architectures, it is not always trivial to determine the dimensions that an LSTM model should have for forecasting, as these depend on how many time series are being modeled, how many are being predicted, and the length of the prediction horizon.
To improve the user experience and speed up prototyping, development, and deployment, skforecast provides the `create_and_compile_model` function, which infers the appropriate architecture and creates the model from just a few arguments:

- `series`: Time series used to train the model.
- `levels`: Time series to be predicted.
- `lags`: Number of past time steps used to predict the next value.
- `steps`: Number of future time steps to be predicted.
- `recurrent_layer`: Type of recurrent layer to use. By default, an LSTM layer is used.
- `recurrent_units`: Number of units in the recurrent layer. By default, 100. If a list is passed, one recurrent layer is created per element.
- `dense_units`: Number of units in the dense layer. By default, 64. If a list is passed, one dense layer is created per element.
- `optimizer`: Optimizer to use. By default, Adam with a learning rate of 0.01.
- `loss`: Loss function to use. By default, Mean Squared Error.

✎ Note
The `create_and_compile_model` function is designed to facilitate the creation of the Keras model; however, more advanced users can create their own architectures as long as the input and output dimensions match the use case to which the model will be applied.
In this first scenario, we want to predict the concentration of $O_3$ over the next 1 and 5 time steps (hours) using only its own historical data. It is therefore a scenario in which a single time series is modeled using only its past values. This problem is also known as autoregressive forecasting.
First, a single-step forecast is made. To do this, a model is created with the `create_and_compile_model` function and passed as an argument to the `ForecasterRnn` class.

This is the simplest example of forecasting with recurrent neural networks. The model only needs one time series to train and predict, so the `series` argument of `create_and_compile_model` contains a single series, the same one to be predicted (`levels`). In addition, since only a single value is to be predicted into the future, the `steps` argument is equal to 1.
# Create model
# ==============================================================================
series = ["o3"] # Series used as predictors
levels = ["o3"] # Target serie to predict
lags = 32 # Past time steps to be used to predict the target
steps = 1 # Future time steps to be predicted
data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()
model = create_and_compile_model(
series=data_train,
levels=levels,
lags=lags,
steps=steps,
recurrent_layer="LSTM",
recurrent_units=4,
dense_units=16,
optimizer=Adam(learning_rate=0.01),
loss=MeanSquaredError()
)
model.summary()
In this case, a simple LSTM network is used, with a single recurrent layer with 4 neurons and a hidden dense layer with 16 neurons. The following table shows a detailed description of each layer:
| Layer | Type | Output Shape | Parameters | Description |
|---|---|---|---|---|
| Input layer | `InputLayer` | `(None, 32, 1)` | 0 | Input layer of the model. It receives sequences of length 32, corresponding to the number of lags, with one feature (series) per time step. |
| LSTM layer | `LSTM` | `(None, 4)` | 96 | Long short-term memory layer that processes the input sequence. It has 4 LSTM units and connects to the next layer. |
| First dense layer | `Dense` | `(None, 16)` | 80 | Fully connected layer with 16 units, using the default activation function (relu) of the generated architecture. |
| Second dense layer | `Dense` | `(None, 1)` | 17 | Fully connected layer with a single output unit. |
| Reshape layer | `Reshape` | `(None, 1, 1)` | 0 | Reshapes the output of the previous dense layer to `(None, 1, 1)`. Not strictly necessary, but included to make the module generalizable to multi-output forecasting problems: the output dimension is `(None, steps, levels)`; here `steps=1` and `levels=["o3"]`, so it is `(None, 1, 1)`. |
| Total | - | - | 193 | Total parameters: 193, trainable: 193, non-trainable: 0 |
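As a sanity check, the parameter counts in the table can be reproduced by hand. The arithmetic below follows the standard Keras formulas; in particular, an LSTM layer has four internal weight sets (forget, input, output, and candidate), each with input weights, recurrent weights, and a bias:

```python
# Reproducing the parameter counts of the model summarized above
lags, n_series, lstm_units, dense_units = 32, 1, 4, 16

# LSTM: 4 gates x (input weights + recurrent weights + bias)
lstm_params = 4 * (lstm_units * (n_series + lstm_units) + lstm_units)

dense1_params = lstm_units * dense_units + dense_units  # weights + biases
dense2_params = dense_units * 1 + 1                     # single output unit

print(lstm_params, dense1_params, dense2_params)        # 96 80 17
print(lstm_params + dense1_params + dense2_params)      # 193
```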
Once the model has been created and compiled, the next step is to create an instance of ForecasterRnn. This class is responsible for adding to the deep learning model all the functionalities necessary to be used in forecasting problems. It is also compatible with the rest of the functionalities offered by skforecast (backtesting, hyperparameter search, ...).
The forecaster is created from the model, and the validation data is passed to it so that the model can be evaluated at each epoch. A `MinMaxScaler` object is also passed to scale the input and output data; this object scales the input series and transforms the predictions back to their original scale.

The `fit_kwargs` are the arguments passed to the `fit` method of the model. In this case, they include the number of epochs, the batch size, the validation data, and a callback that stops training when the validation loss stops decreasing.
# Forecaster Creation
# ==============================================================================
forecaster = ForecasterRnn(
regressor=model,
levels=levels,
transformer_series=MinMaxScaler(),
fit_kwargs={
"epochs": 10, # Number of epochs to train the model.
"batch_size": 32, # Batch size to train the model.
"callbacks": [
EarlyStopping(monitor="val_loss", patience=5)
], # Callback to stop training when it is no longer learning.
"series_val": data_val, # Validation data for model training.
},
)
forecaster
⚠ Warning
The warning indicates that the number of lags has been inferred from the model architecture. In this case, the model's input layer expects sequences of length 32, so the number of lags is 32. If a different number of lags is desired, the `lags` argument can be specified in the `create_and_compile_model` function. To omit the warning, the `lags=lags` and `steps=steps` arguments can be specified when initializing the `ForecasterRnn`.

# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
# Track training and overfitting
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)
In deep learning models, it is very important to control overfitting. To do this, a Keras callback is used to stop training when the value of the cost function, on the validation data, stops decreasing. In this case, the callback does not stop training, as we have only trained for 10 epochs. If the number of epochs is increased, the callback will stop training when the validation loss stops decreasing.
On the other hand, another very useful tool is the plotting of the training and validation loss at each epoch. This allows you to visualize the behavior of the model and detect possible overfitting problems.
In the case of our model, the training loss decreases rapidly in the first epoch, while the validation loss is low from the first epoch. From this it can be deduced that:

- The model is not overfitting, as the validation loss is similar to the training loss.
- The validation loss is computed at the end of each epoch, once the weights have been updated, whereas the training loss is averaged over the epoch. That is why the validation loss of the first epoch is similar to the training loss of the second epoch.
Once the forecaster has been trained, predictions can be obtained. In this case, the prediction is a single value, since only one step into the future (`steps`) has been specified.
# Predictions
# ==============================================================================
predictions = forecaster.predict()
predictions
To obtain a robust estimate of the predictive capacity of the model, a backtesting process is performed. The backtesting process consists of generating a prediction for each observation in the test set, following the same procedure that would be followed if the model were in production, and finally comparing the predicted value with the actual value.
# Backtesting with test data
# ==============================================================================
cv = TimeSeriesFold(
steps=forecaster.max_step,
initial_train_size=len(data.loc[:end_validation, :]), # Training + Validation Data
refit=False,
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster=forecaster,
series=data,
levels=forecaster.levels,
cv=cv,
metric="mean_absolute_error",
verbose=False, # Set to True for detailed information
)
# Backtesting predictions
# ==============================================================================
predictions
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
title="Prediction vs real values in the test set",
xaxis_title="Date time",
yaxis_title="O3",
width=750,
height=350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1.05,
xanchor="left",
x=0
)
)
fig.show()
# Backtesting metrics
# ==============================================================================
metrics
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (MAE): {rel_mae:0.2f} %")
The next case is to predict the next 5 values of O3 using only its historical data. It is therefore a scenario in which multiple future steps of a single time series are modeled using only its past values.
A similar architecture to the previous one will be used, but with a greater number of neurons in the LSTM layer and in the first dense layer. This will allow the model to have greater flexibility to model the time series.
# Model creation
# ==============================================================================
series = ["o3"] # Series used as predictors
levels = ["o3"] # Target serie to predict
lags = 32 # Past time steps to be used to predict the target
steps = 5 # Future time steps to be predicted
model = create_and_compile_model(
series=data_train,
levels=levels,
lags=lags,
steps=steps,
recurrent_layer="LSTM",
recurrent_units=50,
dense_units=32,
optimizer=Adam(learning_rate=0.01),
loss=MeanSquaredError()
)
model.summary()
# Forecaster Creation
# ==============================================================================
forecaster = ForecasterRnn(
regressor=model,
levels=levels,
transformer_series=MinMaxScaler(),
fit_kwargs={
"epochs": 10, # Number of epochs to train the model.
"batch_size": 32, # Batch size to train the model.
"callbacks": [
EarlyStopping(monitor="val_loss", patience=5)
], # Callback to stop training when it is no longer learning.
"series_val": data_val, # Validation data for model training.
},
)
forecaster
✎ Note
The `fit_kwargs` parameter is very useful, as it allows setting any configuration of the underlying model, in this case Keras. In the previous code, 10 training epochs are defined with a batch size of 32. An `EarlyStopping` callback is configured to stop training when the validation loss stops decreasing for 5 epochs (`patience=5`). Other callbacks can also be configured, such as `ModelCheckpoint` to save the model at each epoch, or even TensorBoard to visualize the training and validation losses in real time.

# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
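As mentioned in the note above, additional callbacks can be configured through `fit_kwargs`. Below is a minimal sketch (the checkpoint file name and log directory are illustrative assumptions; `data_val` is the validation set defined earlier):

```python
# Sketch of fit_kwargs extended with ModelCheckpoint and TensorBoard callbacks
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard

fit_kwargs = {
    "epochs": 10,
    "batch_size": 32,
    "callbacks": [
        EarlyStopping(monitor="val_loss", patience=5),             # stop when val loss stalls
        ModelCheckpoint("best_model.keras", save_best_only=True),  # keep the best weights
        TensorBoard(log_dir="./logs"),                             # live loss visualization
    ],
    "series_val": data_val,
}
```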
# Train and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)
The prediction is expected to be of lower quality than in the previous case, as the loss observed across the epochs is higher. The explanation is simple: the model now has to predict 5 values instead of 1, so the validation loss aggregates the error of 5 predictions instead of 1.
The prediction is made. In this case, there are 5 values, since 5 steps into the future (`steps`) have been specified.
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
Specific `steps` can be predicted, as long as they are within the prediction horizon defined in the model.
# Specific step predictions
# ==============================================================================
predictions = forecaster.predict(steps=[1, 3])
predictions
# Backtesting
# ==============================================================================
cv = TimeSeriesFold(
steps=forecaster.max_step,
initial_train_size=len(data.loc[:end_validation, :]),
refit=False,
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster=forecaster,
series=data,
levels=forecaster.levels,
cv=cv,
metric="mean_absolute_error",
verbose=False,
)
# Backtesting predictions
# ==============================================================================
predictions
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
title="Prediction vs real values in the test set",
xaxis_title="Date time",
yaxis_title="O3",
width=750,
height=350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1.05,
xanchor="left",
x=0
)
)
fig.show()
# Backtesting metrics
# ==============================================================================
metrics
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (MAE): {rel_mae:0.2f} %")
In this case, the prediction is worse than in the previous case. This is to be expected since the model has to predict 5 values instead of 1.
In this case, the same series will be predicted, but using multiple time series as predictors. It is therefore a scenario in which past values of multiple time series are used to predict a single time series.
These approaches are very useful when multiple related time series are available. For example, for temperature prediction, additional series such as humidity, atmospheric pressure, and wind speed can be used.
In this type of problem, the architecture of the neural network is more complex: an additional recurrent layer is needed to process the multiple input series, and another hidden dense layer is added to process the output of the recurrent layers. As can be seen, creating the model with skforecast remains simple: just pass a list of integers to the `recurrent_units` and `dense_units` arguments to create multiple recurrent and dense layers.
# Model creation
# ==============================================================================
# Time series used in the training. Now, it is multiseries
series = ['pm2.5', 'co', 'no', 'no2', 'pm10', 'nox', 'o3', 'veloc.', 'direc.','so2']
levels = ["o3"]
lags = 32
steps = 5
data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()
model = create_and_compile_model(
series=data_train,
levels=levels,
lags=lags,
steps=steps,
recurrent_layer="LSTM",
recurrent_units=[100, 50],
dense_units=[64, 32],
optimizer=Adam(learning_rate=0.01),
loss=MeanSquaredError()
)
model.summary()
As in the previous cases, once the model has been created and compiled, an instance of ForecasterRnn is created from it.
# Forecaster creation
# ==============================================================================
forecaster = ForecasterRnn(
regressor=model,
levels=levels,
steps=steps,
lags=lags,
transformer_series=MinMaxScaler(),
fit_kwargs={
"epochs": 4,
"batch_size": 128,
"series_val": data_val,
},
)
forecaster
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
# Training and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
# Backtesting with test data
# ==============================================================================
cv = TimeSeriesFold(
steps=forecaster.max_step,
initial_train_size=len(data.loc[:end_validation, :]),
refit=False,
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster=forecaster,
series=data,
levels=forecaster.levels,
cv=cv,
metric="mean_absolute_error",
verbose=False,
)
# Backtesting metrics
# ==============================================================================
metrics
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (MAE): {rel_mae:0.2f} %")
# Backtesting predictions
# ==============================================================================
predictions
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
title="Prediction vs real values in the test set",
xaxis_title="Date time",
yaxis_title="O3",
width=750,
height=350,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1.05,
xanchor="left",
x=0
)
)
fig.show()
When using multiple time series as predictors, one would expect the model to predict the target series better. However, in this case, the predictions are worse than in the previous case, where only one time series was used as a predictor. This may be because the series used as predictors are not related to the target series, so the model cannot learn any useful relationship between them.
In the next and last scenario, multiple time series are predicted using multiple time series as predictors; that is, multiple series are modeled simultaneously with a single model. This has special application in many real scenarios, such as predicting the stock values of several companies based on their price history and the price of energy and commodities, or forecasting multiple products in an online store based on the sales of other products, product prices, etc.
# Model creation
# ==============================================================================
# Now, we have multiple series and multiple targets
series = ['pm2.5', 'co', 'no', 'no2', 'pm10', 'nox', 'o3', 'veloc.', 'direc.', 'so2']
levels = ['pm2.5', 'co', 'no', "o3"] # Series to predict. It can be all the series or a subset
lags = 32
steps = 5
data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()
model = create_and_compile_model(
series=data_train,
levels=levels,
lags=lags,
steps=steps,
recurrent_layer="LSTM",
recurrent_units=[100, 50],
dense_units=[64, 32],
optimizer=Adam(learning_rate=0.01),
loss=MeanSquaredError()
)
model.summary()
# Forecaster creation
# ==============================================================================
forecaster = ForecasterRnn(
regressor=model,
levels=levels,
steps=steps,
lags=lags,
transformer_series=MinMaxScaler(),
fit_kwargs={
"epochs": 100,
"batch_size": 128,
"callbacks": [
EarlyStopping(monitor="val_loss", patience=5)
],
"series_val": data_val,
},
)
forecaster
The model is trained for 100 epochs with an `EarlyStopping` callback that stops training when the validation loss stops decreasing for 5 epochs (`patience=5`).
⚠ Warning
Training the model takes approximately 3 minutes on a computer with 8 cores, and the `EarlyStopping` callback stops training at epoch 12. These results may vary depending on the hardware used.

# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
# Training and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)
There are slight signs of overfitting from epoch 8. This may be because the model is too complex for the problem being solved. Thanks to the Keras callback, training stops at epoch 12, preventing further overfitting. A good practice would be to modify the architecture of the model to avoid overfitting; for example, the number of neurons in the recurrent and dense layers could be reduced, or a dropout layer could be added to regularize the model, as in the sketch below.
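For example, below is a sketch of such a regularized architecture built directly with Keras (the layer sizes and dropout rate are illustrative assumptions). The only constraint, as noted earlier, is that the input dimension must be `(lags, n_series)` and the output dimension `(steps, n_levels)`:

```python
# Sketch of a custom architecture with Dropout regularization
from keras.layers import LSTM, Dense, Dropout, Input, Reshape
from keras.losses import MeanSquaredError
from keras.models import Sequential
from keras.optimizers import Adam

n_series, n_levels = 10, 4   # 10 predictor series, 4 target series
lags, steps = 32, 5

model_regularized = Sequential([
    Input(shape=(lags, n_series)),
    LSTM(50),                      # fewer recurrent units to reduce complexity
    Dropout(0.2),                  # randomly drops 20% of units during training
    Dense(32, activation="relu"),
    Dense(steps * n_levels),
    Reshape((steps, n_levels)),    # output shape expected by ForecasterRnn
])
model_regularized.compile(optimizer=Adam(learning_rate=0.01), loss=MeanSquaredError())
```

This model could then be passed as `regressor` to `ForecasterRnn`, specifying `lags=32` and `steps=5` at initialization so that they do not have to be inferred from the architecture.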
When the forecaster models multiple time series, predictions are calculated for all of them by default. However, it is possible to restrict the prediction to specific series using the `levels` argument of the `predict` method.
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
The prediction can also be made for specific `steps`, as long as they are within the prediction horizon defined in the model.
# Specific step predictions
# ==============================================================================
forecaster.predict(steps=[1, 5], levels="o3")
# Backtesting with test data
# ==============================================================================
cv = TimeSeriesFold(
steps=forecaster.max_step,
initial_train_size=len(data.loc[:end_validation, :]),
refit=False,
)
metrics, predictions = backtesting_forecaster_multiseries(
forecaster=forecaster,
series=data,
levels=forecaster.levels,
cv=cv,
metric="mean_absolute_error",
verbose=False,
)
# Backtesting metrics
# ==============================================================================
metrics
# Plotting predictions vs real values in the test set
# =============================================================================
fig = px.line(
data_frame = pd.concat([
        predictions.melt(ignore_index=False).assign(group="predictions"),
data_test[predictions.columns].melt(ignore_index=False).assign(group="test")
]).reset_index().rename(columns={"index": "date_time"}),
x="date_time",
y="value",
facet_row="variable",
color="group",
title="Predictions vs real values in the test set",
)
fig.update_layout(
title="Prediction vs real values in the test set",
width=750,
height=850,
margin=dict(l=20, r=20, t=35, b=20),
legend=dict(
orientation="h",
yanchor="top",
y=1,
xanchor="left",
x=0
)
)
fig.update_yaxes(matches=None)
The backtesting results show that the model is able to capture the pattern of the time series pm2.5 and O3, but not that of the series CO and NO. This may be because the former have a greater autoregressive behavior, while the future values of the latter do not seem to depend on past values.
It is observed that the backtesting error for the series O3 is higher than that obtained when only that series was modeled. This may be because the model is trying to model multiple time series at once, making the problem more complex.
Recurrent neural networks allow solving a wide variety of forecasting problems:

- In the 1:1 and N:1 cases, the model is able to learn the patterns of the time series and predict future values with a relative error with respect to the mean close to 6%.
- In the N:M case, the model predicts some of the time series with a higher error than in the previous cases. This may be because some series are harder to predict than others, or simply because the model is not good enough for the problem at hand.
- Deep learning models have high computational requirements.
- Achieving a good deep learning model requires finding the right architecture, which demands knowledge and experience.
- The more series are modeled, the easier it is for the model to learn relationships between the series, but precision in the individual prediction of each one may be lost.
- Using skforecast simplifies the modeling process and speeds up prototyping and development.
import session_info
session_info.show(html=False)
Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2023). Dive into Deep Learning. Cambridge University Press. https://D2L.ai
Codificando Bits (2024). Redes neuronales recurrentes: explicación detallada. https://www.codificandobits.com/blog/redes-neuronales-recurrentes-explicacion-detallada/
Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: Principles and Practice (3rd ed.). OTexts: Melbourne, Australia.
Svetunkov, I. Time Series Analysis and Forecasting with ADAM.
Joseph, M. (2022). Modern Time Series Forecasting with Python: Explore industry-ready time series forecasting using modern machine learning and deep learning. Packt Publishing.
How to cite this document?
If you use this document or any part of it, we appreciate it if you cite it. Thank you very much!
Deep Learning for time series prediction: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) by Fernando Carazo and Joaquín Amat Rodrigo, available under the Attribution-NonCommercial-ShareAlike 4.0 International license at https://www.cienciadedatos.net/documentos/py54-forecasting-con-deep-learning.html
How to cite skforecast?
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2023). skforecast (v0.14.0). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2023). skforecast (Version 0.14.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast, author = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier}, title = {skforecast}, version = {0.14.0}, month = {11}, year = {2024}, license = {BSD-3-Clause}, url = {https://skforecast.org/}, doi = {10.5281/zenodo.8382788} }
This document, created by Fernando Carazo and Joaquín Amat Rodrigo, is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.