Deep Learning for time series prediction: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

Fernando Carazo Melo, Joaquín Amat Rodrigo
February 2024 (last update August 2024)

Introduction

Deep Learning is a field of artificial intelligence focused on creating models based on neural networks that allow learning non-linear representations. Recurrent neural networks (RNN) are a type of deep learning architecture designed to work with sequential data, where information is propagated through recurrent connections, allowing the network to learn temporal dependencies.

This article describes how to train recurrent neural network models, specifically RNN and LSTM, for time series prediction (forecasting) using Python, Keras and Skforecast.

  • Keras 3 provides a friendly interface to build and train neural network models. Thanks to its high-level API, developers can easily implement LSTM architectures, taking advantage of the computational efficiency and scalability offered by deep learning.

  • Skforecast eases the application of machine learning models, including LSTMs and RNNs, to forecasting problems. With this package, the user can define the problem and abstract away from the architecture. For advanced users, skforecast also allows running a previously defined deep learning architecture.

✎ Note

To fully understand this article, some prior knowledge of neural networks and deep learning is assumed. However, if this is not the case, and while we work on creating new material, we provide you with some reference links to get started:

Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNN) are a type of neural network designed to process data that follows a sequential order. In conventional neural networks, such as feedforward networks, information flows in one direction, from input to output through hidden layers, without considering the sequential structure of the data. In contrast, RNNs maintain internal states or memories, which allow them to remember past information and use it to predict future data in the sequence.

The basic unit of an RNN is the recurrent cell. This cell takes two inputs: the current input and the previous hidden state. The hidden state can be understood as a "memory" that retains information from previous iterations. The current input and the previous hidden state are combined to calculate the current output and the new hidden state. This output is used as input for the next iteration, along with the next input in the data sequence.
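The following minimal NumPy sketch (illustrative dimensions and random weights, not skforecast or Keras code) shows how a recurrent cell combines the current input with the previous hidden state to produce the new hidden state:

```python
import numpy as np

rng = np.random.default_rng(seed=42)

n_features, n_hidden = 1, 4                    # illustrative dimensions
W_x = rng.normal(size=(n_hidden, n_features))  # input-to-hidden weights
W_h = rng.normal(size=(n_hidden, n_hidden))    # hidden-to-hidden (recurrent) weights
b   = np.zeros(n_hidden)

def rnn_cell(x_t, h_prev):
    """One step of a vanilla RNN: combine current input and previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Process a short sequence, carrying the hidden state forward at each step
h = np.zeros(n_hidden)
for x_t in np.array([[0.1], [0.5], [0.9]]):
    h = rnn_cell(x_t, h)
print(h)
```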

Despite the advances that have been achieved with RNN architectures, they have limitations to capture long-term patterns. This is why variants such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Units) have been developed, which address these problems and allow long-term information to be retained more effectively.

RNN diagram. Source: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (1st ed.) [PDF]. Springer.

Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) neural networks are a specialized type of RNNs designed to overcome the limitations associated with capturing long-term temporal dependencies. Unlike traditional RNNs, LSTMs incorporate a more complex architecture, introducing memory units and gate mechanisms to improve information management over time.

Structure of LSTMs

LSTMs have a modular structure consisting of three fundamental gates: the forget gate, the input gate, and the output gate. These gates work together to regulate the flow of information through the memory unit, allowing for more precise control over what information to retain and what to forget.

  • Forget Gate: Regulates how much information should be forgotten and how much should be retained, combining the current input and the previous output through a sigmoid function.

  • Input Gate: Decides how much new information should be added to long-term memory.

  • Output Gate: Determines how much information from the current memory will be used for the final output, combining the current input and memory information through a sigmoid function.
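
The following minimal NumPy sketch (illustrative dimensions and random weights, following the standard LSTM formulation rather than the Keras implementation) shows how the three gates regulate the cell state:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(seed=0)
n_in, n_units = 1, 4                                          # illustrative dimensions
W = {g: rng.normal(size=(n_units, n_in + n_units)) for g in "fiog"}
b = {g: np.zeros(n_units) for g in "fiog"}

def lstm_cell(x_t, h_prev, c_prev):
    """One LSTM step: forget, input and output gates regulate the cell state."""
    z = np.concatenate([x_t, h_prev])
    f = sigmoid(W["f"] @ z + b["f"])   # forget gate: what to keep from c_prev
    i = sigmoid(W["i"] @ z + b["i"])   # input gate: how much new information to add
    g = np.tanh(W["g"] @ z + b["g"])   # candidate memory
    o = sigmoid(W["o"] @ z + b["o"])   # output gate: what to expose as h_t
    c_t = f * c_prev + i * g
    h_t = o * np.tanh(c_t)
    return h_t, c_t

# Process a short sequence, carrying hidden state and cell state forward
h, c = np.zeros(n_units), np.zeros(n_units)
for x_t in np.array([[0.2], [0.7], [0.4]]):
    h, c = lstm_cell(x_t, h, c)
print(h)
```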

Diagram of the inputs and outputs of an LSTM. Source: codificandobits https://databasecamp.de/wp-content/uploads/lstm-architecture-1024x709.png.

Types of problems in time series modeling

The complexity of a time series problem is usually defined by three key factors: first, deciding which series or set of series to use to train the model; second, determining which and how many time series are to be predicted; and third, defining the number of steps into the future to predict. These three aspects can be a real challenge when addressing time series problems.

Recurrent neural networks, thanks to their wide variety of architectures, allow modeling the following scenarios:

  • Problems 1:1 - Model a single series and predict that same series (single-series, single-output)

    • Description: This type of problem involves modeling a time series using only its past. It is a typical autoregressive problem.
    • Example: Predicting daily temperature based on the temperature of the last few days.

  • Problems N:1 - Model a single series using multiple series (multi-series, single-output)

    • Description: These are problems in which multiple time series are used to predict a single series. Each series can represent a different entity or variable, but the output variable is only one of the series.
    • Example: Predicting daily temperature based on multiple series such as: temperature, humidity, and atmospheric pressure.

  • Problems N:M - Model multiple series using multiple series (multi-series, multiple-outputs)

    • Description: These problems consist of modeling and predicting future values of several time series at the same time.
    • Example: Forecasting stock values for several stocks based on historical stock data, energy prices, and commodities prices.

In all these scenarios, the prediction can be single-step forecasting (one step into the future) or multi-step forecasting (multiple steps into the future). In the first case, the model predicts a single value, while in the second, it predicts multiple values into the future.

In some situations, it can be difficult to define and create the appropriate Deep Learning architecture to address a specific problem. The skforecast library provides functionalities that allow determining the appropriate Tensorflow architecture for each problem, simplifying and accelerating the modeling process for a wide variety of problems. Below is an example of how to use skforecast to solve each of the described time series problems using recurrent neural networks.
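
As a rough orientation before the examples, the input and output tensors of the recurrent models have the following dimensions in each scenario. This is a sketch with illustrative values (the batch dimension is omitted), not an exact skforecast specification:

```python
# Sketch: (input, output) shapes of the recurrent model in each scenario, where
# input = (lags, number of predictor series) and output = (steps, number of predicted series).
def model_dimensions(lags, steps, n_series, n_levels):
    return (lags, n_series), (steps, n_levels)

print("1:1", model_dimensions(lags=32, steps=1, n_series=1, n_levels=1))
print("N:1", model_dimensions(lags=32, steps=5, n_series=10, n_levels=1))
print("N:M", model_dimensions(lags=32, steps=5, n_series=10, n_levels=3))
```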

Data

The data used in this article contains detailed information on air quality in the city of Valencia (Spain). The data collection spans from January 1, 2019 to December 31, 2021, providing hourly measurements of various air pollutants, such as PM2.5 and PM10 particles, carbon monoxide (CO), nitrogen dioxide (NO2), among others. The data has been obtained from the Red de Vigilancia y Control de la Contaminación Atmosférica, 46250054-València - Centre, https://mediambient.gva.es/es/web/calidad-ambiental/datos-historicos platform.

Libraries

💡 Tip: Configuring your backend

As of Skforecast version 0.13.0, PyTorch backend support is available. You can configure the backend by exporting the KERAS_BACKEND environment variable or by editing your local configuration file at ~/.keras/keras.json. The available backend options are: "tensorflow" and "torch". Example:

```python
import os
os.environ["KERAS_BACKEND"] = "torch"
import keras
```

Warning

The backend must be configured before importing Keras, and the backend cannot be changed after the package has been imported.
In [1]:
# Data processing
# ==============================================================================
import os
import pandas as pd
import numpy as np
from skforecast.datasets import fetch_dataset

# Plotting
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.plot import set_dark_theme
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)

# Keras
# ==============================================================================
os.environ["KERAS_BACKEND"] = "tensorflow" # 'tensorflow', 'jax´ or 'torch'
import keras
from keras.optimizers import Adam
from keras.losses import MeanSquaredError
from keras.callbacks import EarlyStopping

if keras.__version__ > "3.0":
    if keras.backend.backend() == "tensorflow":
        import tensorflow
    elif keras.backend.backend() == "torch":
        import torch
    elif keras.backend.backend() == "jax":
        import jax
    else:
        print("Backend not recognized. Please use 'tensorflow', 'jax' or 'torch'.")

# Time series modeling
# ==============================================================================
import skforecast
from skforecast.ForecasterRnn import ForecasterRnn
from skforecast.ForecasterRnn.utils import create_and_compile_model
from sklearn.preprocessing import MinMaxScaler
from skforecast.model_selection_multiseries import backtesting_forecaster_multiseries

# Warning configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

color = '\033[1m\033[38;5;208m' 
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version Keras: {keras.__version__}")
print(f"{color}Using backend: {keras.backend.backend()}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
if keras.__version__ > "3.0":
    if keras.backend.backend() == "tensorflow":
        print(f"{color}Version tensorflow: {tensorflow.__version__}")
    elif keras.backend.backend() == "torch":
        print(f"{color}Version torch: {torch.__version__}")
    elif keras.backend.backend() == "jax":
        print(f"{color}Version jax: {jax.__version__}")
Version skforecast: 0.13.0
Version Keras: 3.4.1
Using backend: tensorflow
Version pandas: 2.2.2
Version numpy: 1.26.4
Version tensorflow: 2.17.0

Warning

At the time of writing this document, tensorflow is only compatible with numpy versions lower than 2.0. If you have a higher version, you can downgrade it by running the following command: pip install numpy==1.26.4
In [2]:
# Downloading the dataset and processing it
# ==============================================================================
air_quality = fetch_dataset(name="air_quality_valencia")
air_quality_valencia
--------------------
Hourly measures of several air chemical pollutant (pm2.5, co, no, no2, pm10,
nox, o3, so2) at Valencia city.
 Red de Vigilancia y Control de la Contaminación Atmosférica, 46250054-València
- Centre, https://mediambient.gva.es/es/web/calidad-ambiental/datos-historicos.
Shape of the dataset: (26304, 10)
In [3]:
# Missing values imputation
# ==============================================================================
air_quality = air_quality.interpolate(method="linear")
air_quality = air_quality.sort_index()
air_quality.head()
Out[3]:
pm2.5 co no no2 pm10 nox o3 veloc. direc. so2
datetime
2019-01-01 00:00:00 19.0 0.2 3.0 36.0 22.0 40.0 16.0 0.5 262.0 8.0
2019-01-01 01:00:00 26.0 0.1 2.0 40.0 32.0 44.0 6.0 0.6 248.0 8.0
2019-01-01 02:00:00 31.0 0.1 11.0 42.0 36.0 58.0 3.0 0.3 224.0 8.0
2019-01-01 03:00:00 30.0 0.1 15.0 41.0 35.0 63.0 3.0 0.2 220.0 10.0
2019-01-01 04:00:00 30.0 0.1 16.0 39.0 36.0 63.0 3.0 0.4 221.0 11.0

It is verified that the dataset has a DatetimeIndex with hourly frequency. Although this type of index is not required to use skforecast, it makes the subsequent use of the predictions more convenient.
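
If the index did not already have an explicit frequency, it could be set, for example, with `asfreq`. This is a sketch only, since the dataset used here already comes with an hourly frequency; any timestamps created this way would appear as missing values:

```python
# Sketch: enforce an explicit hourly frequency on the DatetimeIndex.
# Rows created for missing timestamps are filled with NaN and would need imputation.
air_quality = air_quality.sort_index()
air_quality = air_quality.asfreq("1h")
```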

In [4]:
# Checking the frequency of the time series
# ==============================================================================
print(f"Index: {air_quality.index.dtype}")
print(f"Frequency: {air_quality.index.freq}")
Index: datetime64[ns]
Frequency: <Hour>

To facilitate the training of the models, the search for optimal hyperparameters, and the evaluation of their predictive capacity, the data is divided into three separate sets: training, validation, and test.

In [5]:
# Split train-validation-test
# ==============================================================================
end_train = "2021-03-31 23:59:00"
end_validation = "2021-09-30 23:59:00"
air_quality_train = air_quality.loc[:end_train, :].copy()
air_quality_val = air_quality.loc[end_train:end_validation, :].copy()
air_quality_test = air_quality.loc[end_validation:, :].copy()

print(
    f"Dates train      : {air_quality_train.index.min()} --- " 
    f"{air_quality_train.index.max()}  (n={len(air_quality_train)})"
)
print(
    f"Dates validation : {air_quality_val.index.min()} --- " 
    f"{air_quality_val.index.max()}  (n={len(air_quality_val)})"
)
print(
    f"Dates test       : {air_quality_test.index.min()} --- " 
    f"{air_quality_test.index.max()}  (n={len(air_quality_test)})"
)
Dates train      : 2019-01-01 00:00:00 --- 2021-03-31 23:00:00  (n=19704)
Dates validation : 2021-04-01 00:00:00 --- 2021-09-30 23:00:00  (n=4392)
Dates test       : 2021-10-01 00:00:00 --- 2021-12-31 23:00:00  (n=2208)
In [6]:
# Plotting pm2.5
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(7, 3))
air_quality_train["pm2.5"].rolling(100).mean().plot(ax=ax, label="train")
air_quality_val["pm2.5"].rolling(100).mean().plot(ax=ax, label="validation")
air_quality_test["pm2.5"].rolling(100).mean().plot(ax=ax, label="test")
ax.set_title("pm2.5")
ax.legend();

LSTM model and ForecasterRnn

Although tensorflow-keras facilitates the process of creating deep learning architectures, it is not always trivial to determine the dimensions that an LSTM model should have for forecasting, as these depend on how many time series are being modeled, how many are being predicted, and the length of the prediction horizon.

To improve the user experience and speed up the prototyping, development, and production process, skforecast has the create_and_compile_model function, with which, by indicating just a few arguments, the architecture is inferred and the model is created.

  • series: Time series to be used to train the model

  • levels: Time series to be predicted.

  • lags: Number of time steps to be used to predict the next value.
  • steps: Number of time steps to be predicted.
  • recurrent_layer: Type of recurrent layer to use. By default, an LSTM layer is used.
  • recurrent_units: Number of units in the recurrent layer. By default, 100 is used. If a list is passed, a recurrent layer will be created for each element in the list.
  • dense_units: Number of units in the dense layer. By default, 64 is used. If a list is passed, a dense layer will be created for each element in the list.
  • optimizer: Optimizer to use. By default, Adam with a learning rate of 0.01 is used.
  • loss: Loss function to use. By default, Mean Squared Error is used.

✎ Note

The `create_and_compile_model` function is designed to facilitate the creation of the model. However, more advanced users can create their own architectures as long as the input and output dimensions match the use case to which the model will be applied.
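
For example, a minimal sketch of a manually defined Keras architecture compatible with the first use case below could look like this (an assumption based on the dimensions used later: 32 lags, 1 input series, 1 step ahead and 1 target level):

```python
import keras
from keras.layers import Input, LSTM, Dense, Reshape

# Sketch of a user-defined model whose dimensions match the use case:
# input shape (lags, number of series) and output shape (steps, number of levels).
custom_model = keras.Sequential([
    Input(shape=(32, 1)),          # 32 lags, 1 predictor series
    LSTM(4, activation="relu"),
    Dense(16, activation="relu"),
    Dense(1 * 1),                  # steps * levels
    Reshape((1, 1)),               # (steps, levels)
])
custom_model.compile(optimizer="adam", loss="mse")
```

A model like this could then be passed as the `regressor` of a `ForecasterRnn`, just like the models built with `create_and_compile_model`.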

Once the model has been created and compiled, the next step is to create an instance of ForecasterRnn. This class is responsible for adding to the deep learning model all the functionalities necessary to be used in forecasting problems. It is also compatible with the rest of the functionalities offered by skforecast (backtesting, hyperparameter search, ...).

1:1 Problem - Model a single series and predict that same series

In this first scenario, we want to predict the concentration of $O_3$ one step ahead and, later, five steps ahead, using only its own historical data. It is therefore a scenario in which a single time series is modeled using only its past values. This problem is also called autoregressive prediction.

One step ahead prediction (single-step forecasting)

First, a single-step forecast is made. To do this, a model is created using the create_and_compile_model function, which is passed as an argument to the ForecasterRnn class.

This is the simplest example of forecasting with recurrent neural networks. The model only needs a time series to train and predict. Therefore, the series argument of the create_and_compile_model function only needs a time series, the same one that is to be predicted (levels). In addition, since only a single value is to be predicted in the future, the steps argument is equal to 1.

In [7]:
# Create model
# ==============================================================================
series = ["o3"] # Series used as predictors
levels = ["o3"] # Target serie to predict
lags = 32 # Past time steps to be used to predict the target
steps = 1 # Future time steps to be predicted

data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=4,
    dense_units=16,
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
keras version: 3.4.1
Using backend: tensorflow
tensorflow version: 2.17.0
Model: "functional"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer (InputLayer)        │ (None, 32, 1)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm (LSTM)                     │ (None, 4)              │            96 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense (Dense)                   │ (None, 16)             │            80 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_1 (Dense)                 │ (None, 1)              │            17 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ reshape (Reshape)               │ (None, 1, 1)           │             0 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 193 (772.00 B)
 Trainable params: 193 (772.00 B)
 Non-trainable params: 0 (0.00 B)

In this case, a simple LSTM network is used, with a single recurrent layer with 4 neurons and a hidden dense layer with 16 neurons. The following table shows a detailed description of each layer:

| Layer | Type | Output Shape | Parameters | Description |
|---|---|---|---|---|
| Input Layer | InputLayer | (None, 32, 1) | 0 | This is the input layer of the model. It receives sequences of length 32, corresponding to the number of lags, with one dimension at each time step. |
| LSTM Layer (Long Short-Term Memory) | LSTM | (None, 4) | 96 | The LSTM layer processes the input sequence. It has 4 LSTM units and connects to the next layer. |
| First Dense Layer | Dense | (None, 16) | 80 | A fully connected layer with 16 units that uses the default activation function (relu) in the provided architecture. |
| Second Dense Layer | Dense | (None, 1) | 17 | Another fully connected dense layer, this time with a single output unit. It also uses the default activation function. |
| Reshape Layer | Reshape | (None, 1, 1) | 0 | This layer reshapes the output of the previous dense layer to shape (None, 1, 1). It is not strictly necessary, but it is included to make the module generalizable to other multi-output forecasting problems. The dimension of this output layer is (None, steps_to_predict_future, series_to_predict). In this case, steps=1 and levels="o3", so the dimension is (None, 1, 1). |
| Total Parameters | - | - | 193 | Total Parameters: 193, Trainable Parameters: 193, Non-Trainable Parameters: 0 |

Once the model has been created and compiled, the next step is to create an instance of ForecasterRnn. This class is responsible for adding to the deep learning model all the functionalities necessary to be used in forecasting problems. It is also compatible with the rest of the functionalities offered by skforecast (backtesting, hyperparameter search, ...).

The forecaster is created from the model, and the validation data is passed to it so that the model can be evaluated at the end of each epoch. In addition, a MinMaxScaler object is passed to scale the input and output data. This object is responsible for scaling the input series and for transforming the predictions back to their original scale.
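
The effect of this scaling can be illustrated with a small standalone sketch (toy values; the forecaster applies the equivalent transformations internally):

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Toy illustration: the series is mapped to [0, 1] for training and the
# predictions are transformed back to the original scale.
scaler = MinMaxScaler()
y = np.array([[10.0], [20.0], [40.0], [80.0]])
y_scaled = scaler.fit_transform(y)              # values between 0 and 1
pred_scaled = np.array([[0.5]])                 # hypothetical model output
pred = scaler.inverse_transform(pred_scaled)    # back to the original units
print(y_scaled.ravel(), pred.ravel())           # [0. 0.143 0.429 1.] and [45.]
```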

The fit_kwargs are the arguments passed to the fit method of the model. In this case, the number of epochs, the batch size, the validation data, and a callback to stop training when the validation loss stops decreasing are passed to it.

In [8]:
# Forecaster Creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 10,  # Number of epochs to train the model.
        "batch_size": 32,  # Batch size to train the model.
        "callbacks": [
            EarlyStopping(monitor="val_loss", patience=5)
        ],  # Callback to stop training when it is no longer learning.
        "series_val": data_val,  # Validation data for model training.
    },
)    

forecaster
/home/ubuntu/anaconda3/envs/skforecast_13_py12/lib/python3.12/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:229: UserWarning:

Setting `lags` = 'auto'. `lags` are inferred from the regressor architecture. Avoid the warning with lags=lags.

/home/ubuntu/anaconda3/envs/skforecast_13_py12/lib/python3.12/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:265: UserWarning:

`steps` default value = 'auto'. `steps` inferred from regressor architecture. Avoid the warning with steps=steps.

Out[8]:
============= 
ForecasterRnn 
============= 
Regressor: <Functional name=functional, built=True> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'functional', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_shape': (None, 32, 1), 'dtype': 'float32', 'sparse': False, 'name': 'input_layer'}, 'registered_name': None, 'name': 'input_layer', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'zero_output_for_mask': False, 'units': 4, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'OrthogonalInitializer', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'seed': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 1)}, 'name': 'lstm', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32, 1), 'dtype': 'float32', 'keras_history': ['input_layer', 0, 0]}},), 'kwargs': {'training': False, 'mask': None}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 16, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 4)}, 'name': 'dense', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 4), 'dtype': 'float32', 'keras_history': ['lstm', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_1', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 1, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 16)}, 'name': 'dense_1', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 16), 'dtype': 'float32', 'keras_history': ['dense', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Reshape', 
'config': {'name': 'reshape', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'target_shape': (1, 1)}, 'registered_name': None, 'build_config': {'input_shape': (None, 1)}, 'name': 'reshape', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 1), 'dtype': 'float32', 'keras_history': ['dense_1', 0, 0]}},), 'kwargs': {}}]}], 'input_layers': [['input_layer', 0, 0]], 'output_layers': [['reshape', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'adam', 'learning_rate': 0.009999999776482582, 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'loss_scale_factor': None, 'gradient_accumulation_steps': None, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'name': 'mean_squared_error', 'reduction': 'sum_over_batch_size'}, 'registered_name': None}, 'loss_weights': None, 'metrics': None, 'weighted_metrics': None, 'run_eagerly': False, 'steps_per_execution': 1, 'jit_compile': False} 
fit_kwargs: {'epochs': 10, 'batch_size': 32, 'callbacks': [<keras.src.callbacks.early_stopping.EarlyStopping object at 0x7f525c7f7170>]} 
Creation date: 2024-08-08 09:00:27 
Last fit date: None 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 

Warning

The warning indicates that the number of lags has been inferred from the model architecture. In this case, the model's input layer expects sequences of length 32, so the number of lags is 32. If a different number of lags is desired, the `lags` argument can be specified in the `create_and_compile_model` function. To omit the warning, the `lags=lags` and `steps=steps` arguments can be specified in the initialization of the `ForecasterRnn`.
In [9]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 9s 11ms/step - loss: 0.0213 - val_loss: 0.0057
Epoch 2/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0055 - val_loss: 0.0057
Epoch 3/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0056 - val_loss: 0.0058
Epoch 4/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 12ms/step - loss: 0.0055 - val_loss: 0.0063
Epoch 5/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0052 - val_loss: 0.0055
Epoch 6/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0055 - val_loss: 0.0058
Epoch 7/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0053 - val_loss: 0.0056
Epoch 8/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0051 - val_loss: 0.0061
Epoch 9/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 11ms/step - loss: 0.0054 - val_loss: 0.0055
Epoch 10/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 7s 11ms/step - loss: 0.0050 - val_loss: 0.0056
In [10]:
# Track training and overfitting
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)

In deep learning models, it is very important to control overfitting. To do this, a Keras callback is used that stops training when the validation loss stops decreasing. In this case, the callback does not stop training because only 10 epochs are run. If the number of epochs were increased, the callback would stop training once the validation loss stopped decreasing.
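
For longer training runs, a stricter configuration can be used. The following is a sketch (assumed settings, not the configuration used above) of an `EarlyStopping` callback that also restores the best weights found on the validation data:

```python
from keras.callbacks import EarlyStopping

# Sketch of a stricter early-stopping configuration for longer trainings (assumed settings).
# It would be passed through fit_kwargs, e.g.
# fit_kwargs={"epochs": 50, "batch_size": 32, "callbacks": [early_stopping], "series_val": data_val}
early_stopping = EarlyStopping(
    monitor="val_loss",          # metric evaluated at the end of each epoch
    patience=5,                  # epochs without improvement before stopping
    restore_best_weights=True,   # roll back to the best weights observed
)
```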

On the other hand, another very useful tool is the plotting of the training and validation loss at each epoch. This allows you to visualize the behavior of the model and detect possible overfitting problems.

In the case of our model, the training loss decreases rapidly during the first epoch, while the validation loss is already low from the first epoch. From this it can be deduced that:

  • The model is not overfitting, as the validation loss is similar to the training loss.

  • The validation loss is computed at the end of each epoch, once the weights have already been updated, so the validation loss of the first epoch is comparable to the training loss of the second epoch.

Graphical explanation of overfitting. Source: https://datahacker.rs/018-pytorch-popular-techniques-to-prevent-the-overfitting-in-a-neural-networks/.

Once the forecaster has been trained, predictions can be obtained. In this case, it is a single value since only one step into the future (step) has been specified.

In [11]:
# Predictions
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[11]:
o3
2021-04-01 44.835587

To obtain a robust estimate of the predictive capacity of the model, a backtesting process is performed. The backtesting process consists of generating a prediction for each observation in the test set, following the same procedure that would be followed if the model were in production, and finally comparing the predicted value with the actual value.

In [12]:
# Backtesting with test data
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]), # Training + Validation Data
    metric="mean_absolute_error",
    verbose=False, # Set to True for detailed information
    refit=False,
)
Epoch 1/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - loss: 0.0053 - val_loss: 0.0059
Epoch 2/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0051 - val_loss: 0.0056
Epoch 3/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0053 - val_loss: 0.0058
Epoch 4/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 7s 9ms/step - loss: 0.0052 - val_loss: 0.0059
Epoch 5/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0054 - val_loss: 0.0053
Epoch 6/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0052 - val_loss: 0.0054
Epoch 7/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0052 - val_loss: 0.0058
Epoch 8/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0051 - val_loss: 0.0054
Epoch 9/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0053 - val_loss: 0.0057
Epoch 10/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 6s 8ms/step - loss: 0.0051 - val_loss: 0.0053
In [13]:
# Backtesting predictions
# ==============================================================================
predictions
Out[13]:
o3
2021-10-01 00:00:00 51.900013
2021-10-01 01:00:00 57.020832
2021-10-01 02:00:00 61.519764
2021-10-01 03:00:00 62.130241
2021-10-01 04:00:00 52.330482
... ...
2021-12-31 19:00:00 17.955385
2021-12-31 20:00:00 19.861944
2021-12-31 21:00:00 20.923643
2021-12-31 22:00:00 20.457081
2021-12-31 23:00:00 21.907673

2208 rows × 1 columns

In [14]:
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Prediction vs real values in the test set",
    xaxis_title="Date time",
    yaxis_title="O3",
    width=750,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.05,
        xanchor="left",
        x=0
    )
)
fig.show()
In [15]:
# Backtesting metrics
# ==============================================================================
metrics
Out[15]:
levels mean_absolute_error
0 o3 6.044899
In [16]:
# % Error vs series mean
# ==============================================================================
rel_error = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (mae): {rel_error:0.2f} %")
Series mean: 54.52
Relative error (mae): 11.09 %

Multi-step forecasting

The next case is to predict the next 5 values of O3 using only its historical data. It is therefore a scenario in which multiple future steps of a single time series are modeled using only its past values.

A similar architecture to the previous one will be used, but with a greater number of neurons in the LSTM layer and in the first dense layer. This will allow the model to have greater flexibility to model the time series.

In [17]:
# Model creation
# ==============================================================================
series = ["o3"] # Series used as predictors
levels = ["o3"] # Target serie to predict
lags = 32 # Past time steps to be used to predict the target
steps = 5 # Future time steps to be predicted

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=50,
    dense_units=32,
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
keras version: 3.4.1
Using backend: tensorflow
tensorflow version: 2.17.0
Model: "functional_1"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_1 (InputLayer)      │ (None, 32, 1)          │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_1 (LSTM)                   │ (None, 50)             │        10,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_2 (Dense)                 │ (None, 32)             │         1,632 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_3 (Dense)                 │ (None, 5)              │           165 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ reshape_1 (Reshape)             │ (None, 5, 1)           │             0 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 12,197 (47.64 KB)
 Trainable params: 12,197 (47.64 KB)
 Non-trainable params: 0 (0.00 B)
In [18]:
# Forecaster Creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 10,  # Number of epochs to train the model.
        "batch_size": 32,  # Batch size to train the model.
        "callbacks": [
            EarlyStopping(monitor="val_loss", patience=5)
        ],  # Callback to stop training when it is no longer learning.
        "series_val": data_val,  # Validation data for model training.
    },
)    

forecaster
/home/ubuntu/anaconda3/envs/skforecast_13_py12/lib/python3.12/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:229: UserWarning:

Setting `lags` = 'auto'. `lags` are inferred from the regressor architecture. Avoid the warning with lags=lags.

/home/ubuntu/anaconda3/envs/skforecast_13_py12/lib/python3.12/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:265: UserWarning:

`steps` default value = 'auto'. `steps` inferred from regressor architecture. Avoid the warning with steps=steps.

Out[18]:
============= 
ForecasterRnn 
============= 
Regressor: <Functional name=functional_1, built=True> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1 2 3 4 5] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'functional_1', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_shape': (None, 32, 1), 'dtype': 'float32', 'sparse': False, 'name': 'input_layer_1'}, 'registered_name': None, 'name': 'input_layer_1', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_1', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'zero_output_for_mask': False, 'units': 50, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'OrthogonalInitializer', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'seed': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 1)}, 'name': 'lstm_1', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32, 1), 'dtype': 'float32', 'keras_history': ['input_layer_1', 0, 0]}},), 'kwargs': {'training': False, 'mask': None}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_2', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 32, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 50)}, 'name': 'dense_2', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 50), 'dtype': 'float32', 'keras_history': ['lstm_1', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_3', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 5, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32)}, 'name': 'dense_3', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32), 'dtype': 'float32', 'keras_history': ['dense_2', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 
'class_name': 'Reshape', 'config': {'name': 'reshape_1', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'target_shape': (5, 1)}, 'registered_name': None, 'build_config': {'input_shape': (None, 5)}, 'name': 'reshape_1', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 5), 'dtype': 'float32', 'keras_history': ['dense_3', 0, 0]}},), 'kwargs': {}}]}], 'input_layers': [['input_layer_1', 0, 0]], 'output_layers': [['reshape_1', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'adam', 'learning_rate': 0.009999999776482582, 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'loss_scale_factor': None, 'gradient_accumulation_steps': None, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'name': 'mean_squared_error', 'reduction': 'sum_over_batch_size'}, 'registered_name': None}, 'loss_weights': None, 'metrics': None, 'weighted_metrics': None, 'run_eagerly': False, 'steps_per_execution': 1, 'jit_compile': False} 
fit_kwargs: {'epochs': 10, 'batch_size': 32, 'callbacks': [<keras.src.callbacks.early_stopping.EarlyStopping object at 0x7f52122384d0>]} 
Creation date: 2024-08-08 09:05:49 
Last fit date: None 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 

✎ Note

The `fit_kwargs` parameter is very useful as it allows you to set any configuration in the model, in this case Keras. In the previous code, the number of training epochs (10) is defined with a batch size of 32. An `EarlyStopping` callback is configured to stop training when the validation loss stops decreasing for 5 epochs (`patience=5`). Other callbacks can also be configured, such as `ModelCheckpoint` to save the model at each epoch, or even TensorBoard to visualize the training and validation loss in real time.
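
As a sketch of the callbacks mentioned in the note (file paths are illustrative assumptions), such a configuration could look like this:

```python
from keras.callbacks import EarlyStopping, ModelCheckpoint, TensorBoard

# Illustrative callback configuration (paths are assumptions):
# - ModelCheckpoint saves the best model seen so far to disk.
# - TensorBoard writes logs that can be inspected with `tensorboard --logdir logs`.
callbacks = [
    EarlyStopping(monitor="val_loss", patience=5),
    ModelCheckpoint(filepath="best_model.keras", monitor="val_loss", save_best_only=True),
    TensorBoard(log_dir="logs"),
]
# They would be passed when creating the forecaster, e.g.
# fit_kwargs={"epochs": 10, "batch_size": 32, "callbacks": callbacks, "series_val": data_val}
```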
In [19]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 13s 17ms/step - loss: 0.0284 - val_loss: 0.0155
Epoch 2/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0134 - val_loss: 0.0123
Epoch 3/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0124 - val_loss: 0.0116
Epoch 4/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0120 - val_loss: 0.0120
Epoch 5/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0116 - val_loss: 0.0116
Epoch 6/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 15ms/step - loss: 0.0118 - val_loss: 0.0119
Epoch 7/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0114 - val_loss: 0.0136
Epoch 8/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0114 - val_loss: 0.0117
Epoch 9/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 17ms/step - loss: 0.0112 - val_loss: 0.0142
Epoch 10/10
615/615 ━━━━━━━━━━━━━━━━━━━━ 10s 16ms/step - loss: 0.0112 - val_loss: 0.0113
In [20]:
# Train and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)

The prediction is expected to be of lower quality than in the previous case, since the loss observed across epochs is higher. The explanation is simple: the model now has to predict 5 values instead of 1, so the loss is computed over 5 predicted values rather than a single one.

The prediction is then obtained. In this case, five values are returned since 5 steps into the future (steps) have been specified.

In [21]:
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[21]:
o3
2021-04-01 00:00:00 43.816605
2021-04-01 01:00:00 43.599815
2021-04-01 02:00:00 43.366970
2021-04-01 03:00:00 40.520279
2021-04-01 04:00:00 34.710251

Specific steps can be predicted, as long as they are within the prediction horizon defined in the model.

In [22]:
# Specific step predictions
# ==============================================================================
predictions = forecaster.predict(steps=[1, 3])
predictions
Out[22]:
o3
2021-04-01 00:00:00 43.816605
2021-04-01 02:00:00 43.366970
In [23]:
# Backtesting 
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]),
    metric="mean_absolute_error",
    verbose=False,
    refit=False,
)
Epoch 1/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 13s 16ms/step - loss: 0.0111 - val_loss: 0.0116
Epoch 2/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 16ms/step - loss: 0.0110 - val_loss: 0.0109
Epoch 3/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 15ms/step - loss: 0.0109 - val_loss: 0.0123
Epoch 4/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 15ms/step - loss: 0.0106 - val_loss: 0.0116
Epoch 5/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 15ms/step - loss: 0.0109 - val_loss: 0.0108
Epoch 6/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 16ms/step - loss: 0.0108 - val_loss: 0.0107
Epoch 7/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 15ms/step - loss: 0.0107 - val_loss: 0.0109
Epoch 8/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 16ms/step - loss: 0.0107 - val_loss: 0.0113
Epoch 9/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 15ms/step - loss: 0.0106 - val_loss: 0.0107
Epoch 10/10
752/752 ━━━━━━━━━━━━━━━━━━━━ 12s 15ms/step - loss: 0.0107 - val_loss: 0.0107
In [24]:
# Backtesting predictions
# ==============================================================================
predictions
Out[24]:
o3
2021-10-01 00:00:00 59.139019
2021-10-01 01:00:00 56.933773
2021-10-01 02:00:00 53.620872
2021-10-01 03:00:00 52.244370
2021-10-01 04:00:00 47.954681
... ...
2021-12-31 19:00:00 24.404228
2021-12-31 20:00:00 23.051666
2021-12-31 21:00:00 13.971795
2021-12-31 22:00:00 17.281343
2021-12-31 23:00:00 19.137854

2208 rows × 1 columns

In [25]:
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Prediction vs real values in the test set",
    xaxis_title="Date time",
    yaxis_title="O3",
    width=750,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.05,
        xanchor="left",
        x=0
    )
)
fig.show()
In [26]:
# Backtesting metrics
# ==============================================================================
metrics
Out[26]:
levels mean_absolute_error
0 o3 9.234384
In [27]:
# % Error vs series mean
# ==============================================================================
rel_error = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (mae): {rel_error:0.2f} %")
Series mean: 54.52
Relative error (mae): 16.94 %

In this case, the prediction is worse than in the previous case. This is to be expected since the model has to predict 5 values instead of 1.

N:1 Problems - Multiple time series with single output

In this case, the same series will be predicted, but using multiple time series as predictors. It is therefore a scenario in which past values of multiple time series are used to predict a single time series.

These types of approaches are very useful when multiple time series related to each other are available. For example, in the case of temperature prediction, multiple time series such as humidity, atmospheric pressure, wind speed, etc. can be used.

In this type of problem, the neural network architecture is more complex: an additional recurrent layer is needed to process the multiple input series, and another hidden dense layer is added to process the output of the recurrent layers. As can be seen, creating the model with skforecast remains very simple: just pass a list of integers to the recurrent_units and dense_units arguments to create multiple recurrent and dense layers.

In [28]:
# Model creation
# ==============================================================================
# Time series used for training. Now it is a multi-series problem
series = ['pm2.5', 'co', 'no', 'no2', 'pm10', 'nox', 'o3', 'veloc.', 'direc.', 'so2'] 
levels = ["o3"]  # Target series to predict
lags = 32        # Past time steps used to predict the target
steps = 5        # Future time steps to be predicted

data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=[100, 50],
    dense_units=[64, 32],
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
keras version: 3.4.1
Using backend: tensorflow
tensorflow version: 2.17.0
Model: "functional_2"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_2 (InputLayer)      │ (None, 32, 10)         │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_2 (LSTM)                   │ (None, 32, 100)        │        44,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_3 (LSTM)                   │ (None, 50)             │        30,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_4 (Dense)                 │ (None, 64)             │         3,264 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_5 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_6 (Dense)                 │ (None, 5)              │           165 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ reshape_2 (Reshape)             │ (None, 5, 1)           │             0 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 80,109 (312.93 KB)
 Trainable params: 80,109 (312.93 KB)
 Non-trainable params: 0 (0.00 B)

Once the model has been created and compiled, the next step is to create an instance of ForecasterRnn. This class is responsible for adding to the deep learning model all the functionalities necessary to be used in forecasting problems. It is also compatible with the rest of the functionalities offered by skforecast (backtesting, hyperparameter search, ...).

In [29]:
# Forecaster creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    steps=steps,
    lags=lags,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 4,  
        "batch_size": 128,  
        "series_val": data_val,
    },
)
forecaster
Out[29]:
============= 
ForecasterRnn 
============= 
Regressor: <Functional name=functional_2, built=True> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1 2 3 4 5] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'functional_2', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_shape': (None, 32, 10), 'dtype': 'float32', 'sparse': False, 'name': 'input_layer_2'}, 'registered_name': None, 'name': 'input_layer_2', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_2', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'return_sequences': True, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'zero_output_for_mask': False, 'units': 100, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'OrthogonalInitializer', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'seed': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 10)}, 'name': 'lstm_2', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32, 10), 'dtype': 'float32', 'keras_history': ['input_layer_2', 0, 0]}},), 'kwargs': {'training': False, 'mask': None}}]}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_3', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'zero_output_for_mask': False, 'units': 50, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'OrthogonalInitializer', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'seed': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 100)}, 'name': 'lstm_3', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32, 100), 'dtype': 'float32', 'keras_history': ['lstm_2', 0, 0]}},), 'kwargs': {'training': False, 'mask': None}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_4', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 64, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': 
None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 50)}, 'name': 'dense_4', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 50), 'dtype': 'float32', 'keras_history': ['lstm_3', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_5', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 32, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 64)}, 'name': 'dense_5', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 64), 'dtype': 'float32', 'keras_history': ['dense_4', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_6', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 5, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32)}, 'name': 'dense_6', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32), 'dtype': 'float32', 'keras_history': ['dense_5', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Reshape', 'config': {'name': 'reshape_2', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'target_shape': (5, 1)}, 'registered_name': None, 'build_config': {'input_shape': (None, 5)}, 'name': 'reshape_2', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 5), 'dtype': 'float32', 'keras_history': ['dense_6', 0, 0]}},), 'kwargs': {}}]}], 'input_layers': [['input_layer_2', 0, 0]], 'output_layers': [['reshape_2', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'adam', 'learning_rate': 0.009999999776482582, 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'loss_scale_factor': None, 'gradient_accumulation_steps': None, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'name': 'mean_squared_error', 'reduction': 'sum_over_batch_size'}, 'registered_name': None}, 'loss_weights': None, 'metrics': None, 'weighted_metrics': None, 'run_eagerly': False, 'steps_per_execution': 1, 'jit_compile': False} 
fit_kwargs: {'epochs': 4, 'batch_size': 128} 
Creation date: 2024-08-08 09:10:05 
Last fit date: None 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 
In [30]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/4
154/154 ━━━━━━━━━━━━━━━━━━━━ 21s 107ms/step - loss: 213.6567 - val_loss: 0.0406
Epoch 2/4
154/154 ━━━━━━━━━━━━━━━━━━━━ 19s 95ms/step - loss: 0.0273 - val_loss: 0.0216
Epoch 3/4
154/154 ━━━━━━━━━━━━━━━━━━━━ 13s 84ms/step - loss: 0.0161 - val_loss: 0.0181
Epoch 4/4
154/154 ━━━━━━━━━━━━━━━━━━━━ 14s 92ms/step - loss: 0.0131 - val_loss: 0.0145
In [31]:
# Training and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)
In [32]:
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[32]:
o3
2021-04-01 00:00:00 52.117687
2021-04-01 01:00:00 45.810757
2021-04-01 02:00:00 44.487831
2021-04-01 03:00:00 38.737705
2021-04-01 04:00:00 32.419250
In [33]:
# Backtesting with test data
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]), # Training + validation data
    metric="mean_absolute_error",
    verbose=False,
    refit=False,
)
Epoch 1/4
188/188 ━━━━━━━━━━━━━━━━━━━━ 21s 98ms/step - loss: 0.0118 - val_loss: 0.0141
Epoch 2/4
188/188 ━━━━━━━━━━━━━━━━━━━━ 21s 99ms/step - loss: 0.0113 - val_loss: 0.0123
Epoch 3/4
188/188 ━━━━━━━━━━━━━━━━━━━━ 18s 95ms/step - loss: 0.0109 - val_loss: 0.0115
Epoch 4/4
188/188 ━━━━━━━━━━━━━━━━━━━━ 18s 97ms/step - loss: 0.0108 - val_loss: 0.0122
In [34]:
# Backtesting metrics
# ==============================================================================
metrics
Out[34]:
levels mean_absolute_error
0 o3 10.378015
In [35]:
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (mae): {rel_mae:0.2f} %")
Series mean: 54.52
Relative error (mae): 19.04 %
In [36]:
# Backtesting predictions
# ==============================================================================
predictions
Out[36]:
o3
2021-10-01 00:00:00 52.209450
2021-10-01 01:00:00 46.392929
2021-10-01 02:00:00 44.477547
2021-10-01 03:00:00 39.754299
2021-10-01 04:00:00 35.939648
... ...
2021-12-31 19:00:00 22.344759
2021-12-31 20:00:00 17.472404
2021-12-31 21:00:00 -0.343583
2021-12-31 22:00:00 0.760350
2021-12-31 23:00:00 1.540229

2208 rows × 1 columns

In [37]:
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Prediction vs real values in the test set",
    xaxis_title="Date time",
    yaxis_title="O3",
    width=750,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.05,
        xanchor="left",
        x=0
    )
)
fig.show()

When using multiple time series as predictors, one would expect the model to predict the target series better. However, in this case the predictions are worse than in the previous scenario, where only one time series was used as a predictor. This may be because the additional series are not related to the target series, so the model cannot learn any useful relationship from them.
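A quick way to check this hypothesis (a minimal sketch, assuming `data` still contains the predictor series together with the target o3 used in this scenario) is to look at the linear correlation of each series with the target:

# Correlation of each series with the target (illustrative check)
# ==============================================================================
correlations = data.corr(method="pearson")["o3"].drop("o3").sort_values(key=abs, ascending=False)
print(correlations)

Low correlations would support the idea that the extra series carry little information about o3, although non-linear dependencies would not be captured by this simple check.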

N:M Problems - Multiple time series with multiple outputs

In the next and last scenario, multiple time series are predicted using multiple time series as predictors, that is, several series are modeled simultaneously with a single model. This is useful in many real-world applications, such as predicting the stock prices of several companies from their price histories together with energy and commodity prices, or forecasting the demand for multiple products in an online store based on the sales and prices of the other products.

In [38]:
# Model creation
# ==============================================================================
# Now, we have multiple series and multiple targets
series = ['pm2.5', 'co', 'no', 'no2', 'pm10', 'nox', 'o3', 'veloc.', 'direc.', 'so2'] 
levels = ['pm2.5', 'co', 'no', "o3"] # Series to predict. It can be all the series or just a subset
lags = 32 
steps = 5 

data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=[100, 50],
    dense_units=[64, 32],
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
keras version: 3.4.1
Using backend: tensorflow
tensorflow version: 2.17.0
Model: "functional_3"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Layer (type)                     Output Shape                  Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ input_layer_3 (InputLayer)      │ (None, 32, 10)         │             0 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_4 (LSTM)                   │ (None, 32, 100)        │        44,400 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ lstm_5 (LSTM)                   │ (None, 50)             │        30,200 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_7 (Dense)                 │ (None, 64)             │         3,264 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_8 (Dense)                 │ (None, 32)             │         2,080 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ dense_9 (Dense)                 │ (None, 20)             │           660 │
├─────────────────────────────────┼────────────────────────┼───────────────┤
│ reshape_3 (Reshape)             │ (None, 5, 4)           │             0 │
└─────────────────────────────────┴────────────────────────┴───────────────┘
 Total params: 80,604 (314.86 KB)
 Trainable params: 80,604 (314.86 KB)
 Non-trainable params: 0 (0.00 B)
In [39]:
# Forecaster creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    steps=steps,
    lags=lags,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 100, 
        "batch_size": 128, 
        "callbacks": [
            EarlyStopping(monitor="val_loss", patience=5)
        ],  
        "series_val": data_val,
    },
)
forecaster
Out[39]:
============= 
ForecasterRnn 
============= 
Regressor: <Functional name=functional_3, built=True> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['pm2.5', 'co', 'no', 'o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1 2 3 4 5] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'functional_3', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_shape': (None, 32, 10), 'dtype': 'float32', 'sparse': False, 'name': 'input_layer_3'}, 'registered_name': None, 'name': 'input_layer_3', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_4', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'return_sequences': True, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'zero_output_for_mask': False, 'units': 100, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'OrthogonalInitializer', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'seed': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 10)}, 'name': 'lstm_4', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32, 10), 'dtype': 'float32', 'keras_history': ['input_layer_3', 0, 0]}},), 'kwargs': {'training': False, 'mask': None}}]}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_5', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'zero_output_for_mask': False, 'units': 50, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'OrthogonalInitializer', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'seed': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 100)}, 'name': 'lstm_5', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32, 100), 'dtype': 'float32', 'keras_history': ['lstm_4', 0, 0]}},), 'kwargs': {'training': False, 'mask': None}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_7', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 64, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': 
None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 50)}, 'name': 'dense_7', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 50), 'dtype': 'float32', 'keras_history': ['lstm_5', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_8', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 32, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 64)}, 'name': 'dense_8', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 64), 'dtype': 'float32', 'keras_history': ['dense_7', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_9', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'units': 20, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32)}, 'name': 'dense_9', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 32), 'dtype': 'float32', 'keras_history': ['dense_8', 0, 0]}},), 'kwargs': {}}]}, {'module': 'keras.layers', 'class_name': 'Reshape', 'config': {'name': 'reshape_3', 'trainable': True, 'dtype': {'module': 'keras', 'class_name': 'DTypePolicy', 'config': {'name': 'float32'}, 'registered_name': None}, 'target_shape': (5, 4)}, 'registered_name': None, 'build_config': {'input_shape': (None, 20)}, 'name': 'reshape_3', 'inbound_nodes': [{'args': ({'class_name': '__keras_tensor__', 'config': {'shape': (None, 20), 'dtype': 'float32', 'keras_history': ['dense_9', 0, 0]}},), 'kwargs': {}}]}], 'input_layers': [['input_layer_3', 0, 0]], 'output_layers': [['reshape_3', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'adam', 'learning_rate': 0.009999999776482582, 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'loss_scale_factor': None, 'gradient_accumulation_steps': None, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'name': 'mean_squared_error', 'reduction': 'sum_over_batch_size'}, 'registered_name': None}, 'loss_weights': None, 'metrics': None, 'weighted_metrics': None, 'run_eagerly': False, 'steps_per_execution': 1, 'jit_compile': False} 
fit_kwargs: {'epochs': 100, 'batch_size': 128, 'callbacks': [<keras.src.callbacks.early_stopping.EarlyStopping object at 0x7f5211aeade0>]} 
Creation date: 2024-08-08 09:13:18 
Last fit date: None 
Skforecast version: 0.13.0 
Python version: 3.12.4 
Forecaster id: None 

The model is trained for a maximum of 100 epochs with an EarlyStopping callback that stops training when the validation loss has not improved for 5 consecutive epochs (patience=5).
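As an optional refinement (a sketch, not part of the configuration used above), the callback could also restore the weights of the best validation epoch once training stops, using the standard `restore_best_weights` argument of Keras:

# EarlyStopping that also restores the best validation weights (illustrative variant)
# ==============================================================================
from keras.callbacks import EarlyStopping

early_stopping = EarlyStopping(
    monitor="val_loss",
    patience=5,
    restore_best_weights=True
)

This callback would simply replace the one defined in `fit_kwargs` above.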

Warning

Training the model takes a few minutes on a computer with 8 cores; in the run shown, `EarlyStopping` stops training at epoch 26. These results may vary depending on the hardware used.
In [40]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 19s 94ms/step - loss: 0.0267 - val_loss: 0.0148
Epoch 2/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 12s 79ms/step - loss: 0.0069 - val_loss: 0.0103
Epoch 3/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 25s 105ms/step - loss: 0.0050 - val_loss: 0.0093
Epoch 4/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 19s 94ms/step - loss: 0.0046 - val_loss: 0.0094
Epoch 5/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 16s 101ms/step - loss: 0.0043 - val_loss: 0.0088
Epoch 6/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 16s 102ms/step - loss: 0.0042 - val_loss: 0.0089
Epoch 7/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 12s 78ms/step - loss: 0.0039 - val_loss: 0.0093
Epoch 8/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 15s 98ms/step - loss: 0.0040 - val_loss: 0.0083
Epoch 9/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 13s 83ms/step - loss: 0.0038 - val_loss: 0.0084
Epoch 10/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 12s 75ms/step - loss: 0.0038 - val_loss: 0.0077
Epoch 11/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 15s 94ms/step - loss: 0.0038 - val_loss: 0.0074
Epoch 12/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 18s 117ms/step - loss: 0.0035 - val_loss: 0.0072
Epoch 13/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 19s 105ms/step - loss: 0.0035 - val_loss: 0.0071
Epoch 14/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 17s 86ms/step - loss: 0.0035 - val_loss: 0.0073
Epoch 15/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 16s 104ms/step - loss: 0.0035 - val_loss: 0.0073
Epoch 16/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 14s 90ms/step - loss: 0.0033 - val_loss: 0.0073
Epoch 17/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 22s 98ms/step - loss: 0.0034 - val_loss: 0.0070
Epoch 18/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 21s 103ms/step - loss: 0.0033 - val_loss: 0.0070
Epoch 19/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 14s 90ms/step - loss: 0.0034 - val_loss: 0.0076
Epoch 20/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 13s 82ms/step - loss: 0.0033 - val_loss: 0.0070
Epoch 21/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 14s 89ms/step - loss: 0.0033 - val_loss: 0.0069
Epoch 22/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 11s 72ms/step - loss: 0.0033 - val_loss: 0.0071
Epoch 23/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 11s 71ms/step - loss: 0.0033 - val_loss: 0.0071
Epoch 24/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 11s 71ms/step - loss: 0.0032 - val_loss: 0.0071
Epoch 25/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 11s 72ms/step - loss: 0.0033 - val_loss: 0.0072
Epoch 26/100
154/154 ━━━━━━━━━━━━━━━━━━━━ 11s 74ms/step - loss: 0.0033 - val_loss: 0.0070
In [41]:
# Training and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)

There are slight signs of overfitting from around epoch 13, where the validation loss plateaus while the training loss keeps decreasing. This may be because the model is too complex for the problem being solved. Thanks to the Keras callback, training stops at epoch 26, preventing the model from overfitting further. A good practice would be to modify the architecture of the model to reduce overfitting: for example, the number of neurons in the recurrent and dense layers could be reduced, or dropout layers could be added to regularize the model, as sketched below.
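As an illustration of that last suggestion, the following sketch builds a smaller network with dropout both inside the LSTM cell and between the dense layers. It is only one possible architecture, not the model trained above, and it simply reuses the dimensions of this example (32 lags, 10 input series, 5 steps and 4 target levels):

# Illustrative regularized architecture (sketch, not the model trained above)
# ==============================================================================
from keras.layers import Input, LSTM, Dropout, Dense, Reshape
from keras.models import Model
from keras.optimizers import Adam
from keras.losses import MeanSquaredError

inputs = Input(shape=(32, 10))                                       # 32 lags, 10 input series
x = LSTM(64, activation="relu", dropout=0.2, recurrent_dropout=0.2)(inputs)
x = Dense(32, activation="relu")(x)
x = Dropout(0.2)(x)                                                  # extra regularization before the output
x = Dense(5 * 4, activation="linear")(x)                             # 5 steps x 4 target series
outputs = Reshape((5, 4))(x)

model_regularized = Model(inputs=inputs, outputs=outputs)
model_regularized.compile(optimizer=Adam(learning_rate=0.01), loss=MeanSquaredError())

Such a model could then be passed to `ForecasterRnn` through the `regressor` argument, exactly as done above.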

When the forecaster models multiple time series, predictions are calculated for all of them by default. However, the series (and steps) for which predictions are to be made can be specified with the `levels` and `steps` arguments of the predict method.
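For example, assuming the forecaster fitted above, the prediction could be restricted to just two of the four target series (a minimal sketch; the cells below show the default behaviour and the `steps` argument):

# Predictions for a subset of the target series (illustrative)
# ==============================================================================
forecaster.predict(levels=["pm2.5", "o3"])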

In [42]:
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[42]:
pm2.5 co no o3
2021-04-01 00:00:00 15.137792 0.119255 0.873360 37.526031
2021-04-01 01:00:00 14.892797 0.117388 1.141526 33.966305
2021-04-01 02:00:00 14.470027 0.115939 1.416235 31.234140
2021-04-01 03:00:00 14.113650 0.118444 1.663687 28.528336
2021-04-01 04:00:00 13.773868 0.116024 2.245604 24.723658

The prediction can also be made for specific steps, as long as they are within the prediction horizon defined in the model.

In [43]:
# Specific step predictions
# ==============================================================================
forecaster.predict(steps=[1, 5], levels="o3")
Out[43]:
o3
2021-04-01 00:00:00 37.526031
2021-04-01 04:00:00 24.723658
In [44]:
# Backtesting with test data
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]),
    metric="mean_absolute_error",
    verbose=False,
    refit=False,
)
Epoch 1/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 15s 68ms/step - loss: 0.0032 - val_loss: 0.0070
Epoch 2/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 12s 64ms/step - loss: 0.0032 - val_loss: 0.0071
Epoch 3/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 12s 66ms/step - loss: 0.0031 - val_loss: 0.0068
Epoch 4/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 13s 70ms/step - loss: 0.0031 - val_loss: 0.0069
Epoch 5/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 14s 74ms/step - loss: 0.0031 - val_loss: 0.0068
Epoch 6/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 22s 82ms/step - loss: 0.0030 - val_loss: 0.0067
Epoch 7/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 15s 79ms/step - loss: 0.0031 - val_loss: 0.0067
Epoch 8/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 20s 74ms/step - loss: 0.0031 - val_loss: 0.0070
Epoch 9/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 16s 86ms/step - loss: 0.0030 - val_loss: 0.0072
Epoch 10/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 15s 77ms/step - loss: 0.0030 - val_loss: 0.0071
Epoch 11/100
188/188 ━━━━━━━━━━━━━━━━━━━━ 14s 75ms/step - loss: 0.0030 - val_loss: 0.0069
In [45]:
# Backtesting metrics
# ==============================================================================
metrics
Out[45]:
levels mean_absolute_error
0 pm2.5 3.535368
1 co 0.026137
2 no 2.763817
3 o3 10.797433
4 average 4.280689
5 weighted_average 4.280689
6 pooling 4.280689
In [46]:
# Plotting predictions vs real values in the test set
# =============================================================================
fig = px.line(
    data_frame = pd.concat([
                    predictions.melt(ignore_index=False).assign(group="predictions"),
                    data_test[predictions.columns].melt(ignore_index=False).assign(group="test")
                ]).reset_index().rename(columns={"index": "date_time"}),
    x="date_time",
    y="value",
    facet_row="variable",
    color="group",
    title="Predictions vs real values in the test set",
)

fig.update_layout(
    title="Prediction vs real values in the test set",
    width=750,
    height=850,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1,
        xanchor="left",
        x=0
    )
)

fig.update_yaxes(matches=None)

The backtesting results show that the model is able to capture the pattern of the pm2.5 and o3 series, but not that of the co and no series. This may be because the former show a stronger autoregressive behavior, while the future values of the latter do not seem to depend much on their past values.

The backtesting error for the o3 series (MAE 10.80) is also slightly higher than the one obtained when o3 was the only modeled target (MAE 10.38). This may be because the model has to fit multiple time series at once, which makes the problem more complex.

Conclusion

  • Recurrent neural networks allow solving a wide variety of forecasting problems.

  • In the 1:1 and N:1 cases, the model is able to learn the patterns of the time series and predict future values, with relative errors with respect to the series mean of roughly 6% and 19%, respectively.

  • In the N:M case, the model predicts some of the time series with a higher error than in the previous cases. This may be because some time series are harder to predict than others, or simply because the model is not well suited to the problem at hand.

  • Deep learning models have high computational requirements.

  • To achieve a good deep learning model, it is necessary to find the right architecture, which requires knowledge and experience.

  • The more series that are modeled, the easier it is for the model to learn relationships between them, but precision in the individual prediction of each series may be lost.

  • Skforecast simplifies the modeling process and speeds up prototyping and development.

Session Info

In [47]:
import session_info
session_info.show(html=False)
-----
keras               3.4.1
matplotlib          3.9.1
numpy               1.26.4
pandas              2.2.2
plotly              5.23.0
session_info        1.0.0
skforecast          0.13.0
sklearn             1.5.1
tensorflow          2.17.0
-----
IPython             8.26.0
jupyter_client      8.6.2
jupyter_core        5.7.2
notebook            6.4.12
-----
Python 3.12.4 | packaged by Anaconda, Inc. | (main, Jun 18 2024, 15:12:24) [GCC 11.2.0]
Linux-5.15.0-1066-aws-x86_64-with-glibc2.31
-----
Session information updated at 2024-08-08 09:23

References

Zhang, A., Lipton, Z. C., Li, M., & Smola, A. J. (2023). Dive into Deep Learning. Cambridge University Press. https://D2L.ai

Codificando Bits (2024). Redes neuronales recurrentes: explicación detallada. https://www.codificandobits.com/blog/redes-neuronales-recurrentes-explicacion-detallada/

Hyndman, R. J., & Athanasopoulos, G. (2021). Forecasting: principles and practice (3rd ed.). OTexts: Melbourne, Australia.

Svetunkov, I. Time Series Analysis and Forecasting with ADAM.

Joseph, M. (2022). Modern time series forecasting with Python: Explore industry-ready time series forecasting using modern machine learning and deep learning. Packt Publishing.

Cite as:

How to cite this document?

If you use this document or any part of it, we appreciate if you cite it. Thank you very much!

Deep Learning for time series prediction: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM) by Fernando Carazo and Joaquín Amat Rodrigo, available under the Attribution-NonCommercial-ShareAlike 4.0 International license at https://www.cienciadedatos.net/documentos/py54-forecasting-con-deep-learning.html

How to cite skforecast?

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2023). skforecast (v0.10.0). Zenodo. https://doi.org/10.5281/zenodo.8382788

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2023). skforecast (Version 0.10.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788

BibTeX:

@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.10.0},
  month   = {9},
  year    = {2023},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}


Did you like the article? Your support is important

Maintaining this website has a high cost; your contribution will help me continue creating free educational content. Thank you very much! 😊


Creative Commons Licence
This document, created by Fernando Carazo and Joaquín Amat Rodrigo, is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.