Deep Learning for time series prediction: Recurrent Neural Networks (RNN) and Long Short-Term Memory (LSTM)

Fernando Carazo Melo, Joaquín Amat Rodrigo
February 2024

Introduction

Deep Learning is a field of artificial intelligence focused on creating models based on neural networks that allow learning non-linear representations. Recurrent neural networks (RNN) are a type of deep learning architecture designed to work with sequential data, where information is propagated through recurrent connections, allowing the network to learn temporal dependencies.

This article describes how to train recurrent neural network models -specifically RNN and LSTM- for time series prediction (forecasting) using Python, TensorFlow and Skforecast.

  • TensorFlow provides, through its Keras module, a friendly interface to build and train neural network models. Thanks to its high-level API, developers can easily implement LSTM architectures, taking advantage of the computational efficiency and scalability offered by deep learning.

  • Skforecast eases the application of machine learning models, including LSTMs and RNNs, to forecasting problems. Using this package, the user can define the problem and abstract from the architecture. For advanced users, skforecast also allows the use of a previously defined deep learning architecture.

✎ Note

To fully understand this article, some knowledge about neural networks and deep learning is presupposed. However, if this is not the case, and while we work on creating new material, we provide you with some reference links to start:

Recurrent Neural Networks (RNN)

Recurrent Neural Networks (RNN) are a type of neural network designed to process data that follows a sequential order. In conventional neural networks, such as feedforward networks, information flows in one direction, from input to output through hidden layers, without considering the sequential structure of the data. In contrast, RNNs maintain internal states or memories, which allow them to remember past information and use it to predict future data in the sequence.

The basic unit of an RNN is the recurrent cell. This cell takes two inputs: the current input and the previous hidden state. The hidden state can be understood as a "memory" that retains information from previous iterations. The current input and the previous hidden state are combined to calculate the current output and the new hidden state. The new hidden state is then passed to the next iteration, together with the next input in the data sequence.
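The recurrence described above can be sketched in a few lines of NumPy. This is a minimal illustration of a simple (Elman-style) recurrent cell, assuming a `tanh` activation and randomly initialized weights; the names `rnn_forward`, `W_x`, `W_h` and `b` are chosen for this example and are not part of Keras or skforecast.

```python
import numpy as np

def rnn_forward(inputs, W_x, W_h, b):
    """Run a simple recurrent cell over a sequence.

    inputs: array of shape (timesteps, n_features)
    Returns the hidden state at every time step, shape (timesteps, n_units).
    """
    n_units = W_h.shape[0]
    h = np.zeros(n_units)  # initial hidden state (empty "memory")
    states = []
    for x_t in inputs:
        # Combine the current input with the previous hidden state
        h = np.tanh(W_x @ x_t + W_h @ h + b)
        states.append(h)
    return np.array(states)

# Illustrative dimensions: 32 time steps of a single series, 4 recurrent units
rng = np.random.default_rng(0)
n_features, n_units, timesteps = 1, 4, 32
W_x = rng.normal(scale=0.5, size=(n_units, n_features))
W_h = rng.normal(scale=0.5, size=(n_units, n_units))
b = np.zeros(n_units)

sequence = rng.normal(size=(timesteps, n_features))
hidden_states = rnn_forward(sequence, W_x, W_h, b)
print(hidden_states.shape)  # (32, 4)
```

The same weights are reused at every time step; only the hidden state changes as the sequence is processed.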

Despite the advances achieved with RNN architectures, they struggle to capture long-term patterns. This is why variants such as LSTM (Long Short-Term Memory) and GRU (Gated Recurrent Units) have been developed, which address these problems and allow long-term information to be retained more effectively.

RNN diagram. Source: James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An introduction to statistical learning (1st ed.) [PDF]. Springer.

Long Short-Term Memory (LSTM)

Long Short-Term Memory (LSTM) neural networks are a specialized type of RNN designed to overcome the limitations associated with capturing long-term temporal dependencies. Unlike traditional RNNs, LSTMs incorporate a more complex architecture, introducing memory units and gate mechanisms to improve information management over time.

Structure of LSTMs

LSTMs have a modular structure consisting of three fundamental gates: the forget gate, the input gate, and the output gate. These gates work together to regulate the flow of information through the memory unit, allowing for more precise control over what information to retain and what to forget.

  • Forget Gate: Regulates how much information should be forgotten and how much should be retained, combining the current input and the previous output through a sigmoid function.

  • Input Gate: Decides how much new information should be added to long-term memory.

  • Output Gate: Determines how much information from the current memory will be used for the final output, combining the current input and memory information through a sigmoid function.

Diagram of the inputs and outputs of an LSTM. Source: codificandobits https://databasecamp.de/wp-content/uploads/lstm-architecture-1024x709.png.
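To make the role of the three gates more concrete, the following NumPy sketch implements one time step of a textbook LSTM cell and runs it over a short sequence. It is an illustration under assumptions: the function `lstm_step`, the stacked weight layout (forget, input, candidate, output) and the `tanh` activations are choices made for this example, and a Keras `LSTM` layer may order its weights differently or be configured with another activation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    W: (4 * units, n_features) input weights
    U: (4 * units, units) recurrent weights
    b: (4 * units,) biases
    Rows are stacked as [forget, input, candidate, output] (layout assumed here).
    """
    units = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    f = sigmoid(z[:units])                 # forget gate: how much old memory to keep
    i = sigmoid(z[units:2 * units])        # input gate: how much new information to add
    g = np.tanh(z[2 * units:3 * units])    # candidate values for the memory
    o = sigmoid(z[3 * units:])             # output gate: how much memory to expose
    c = f * c_prev + i * g                 # updated long-term memory (cell state)
    h = o * np.tanh(c)                     # new hidden state / output
    return h, c

rng = np.random.default_rng(0)
n_features, units, timesteps = 1, 4, 32
W = rng.normal(scale=0.5, size=(4 * units, n_features))
U = rng.normal(scale=0.5, size=(4 * units, units))
b = np.zeros(4 * units)

h, c = np.zeros(units), np.zeros(units)
for x_t in rng.normal(size=(timesteps, n_features)):
    h, c = lstm_step(x_t, h, c, W, U, b)

n_params = W.size + U.size + b.size
print(h.shape, n_params)  # (4,) 96
```

With one input feature and 4 units, the cell has 96 trainable parameters, since each of the four blocks needs input weights, recurrent weights and a bias.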

Types of problems in time series modeling

The complexity of a time series problem is usually defined by three key factors: first, deciding which time series or series to use to train the model; second, determining what or how many time series are to be predicted; and third, defining the number of steps into the future that you want to predict. These three aspects can be a real challenge when addressing time series problems.

Recurrent neural networks, thanks to their wide variety of architectures, allow modeling the following scenarios:

  • Problems 1:1 - Model a single series and predict that same series (single-series, single-output)

    • Description: This type of problem involves modeling a time series using only its past. It is a typical autoregressive problem.
    • Example: Predicting daily temperature based on the temperature of the last few days.

  • Problems N:1 - Model a single series using multiple series (multi-series, single-output)

    • Description: These are problems in which multiple time series are used to predict a single series. Each series can represent a different entity or variable, but the output variable is only one of the series.
    • Example: Predicting daily temperature based on multiple series such as: temperature, humidity, and atmospheric pressure.

  • Problems N:M - Model multiple series using multiple series (multi-series, multiple-outputs)

    • Description: These problems consist of modeling and predicting future values of several time series at the same time.
    • Example: Forecasting stock values for several stocks based on historical stock data, energy prices, and commodities prices.

In all these scenarios, the prediction can be single-step (one step into the future) or multi-step (multiple steps into the future). In the first case, the model predicts a single value, while in the second it predicts several consecutive future values.
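The scenarios above differ mainly in the shape of the arrays fed to and produced by the network. The sketch below builds input windows of past values and output windows of future values from a toy array, to show how those shapes change between 1:1, N:1 and N:M problems. The helper `make_windows` is hypothetical and written only for this illustration; it is not the routine skforecast uses internally.

```python
import numpy as np

def make_windows(data, target_cols, lags, steps):
    """Slice a (n_obs, n_series) array into supervised learning windows.

    Returns
    X: (n_windows, lags, n_series)          past values of all input series
    y: (n_windows, steps, len(target_cols)) future values of the target series
    """
    n_obs = data.shape[0]
    n_windows = n_obs - lags - steps + 1
    X = np.stack([data[i:i + lags] for i in range(n_windows)])
    y = np.stack(
        [data[i + lags:i + lags + steps][:, target_cols] for i in range(n_windows)]
    )
    return X, y

# Toy data: 100 observations of 3 series (values are just a counter)
data = np.arange(300, dtype=float).reshape(100, 3)

# 1:1, single-step: one series in, the same series out
X1, y1 = make_windows(data[:, [0]], target_cols=[0], lags=32, steps=1)
print(X1.shape, y1.shape)  # (68, 32, 1) (68, 1, 1)

# N:1, multi-step: three series in, one series out, 5 steps ahead
X2, y2 = make_windows(data, target_cols=[0], lags=32, steps=5)
print(X2.shape, y2.shape)  # (64, 32, 3) (64, 5, 1)

# N:M, multi-step: three series in, three series out, 5 steps ahead
X3, y3 = make_windows(data, target_cols=[0, 1, 2], lags=32, steps=5)
print(X3.shape, y3.shape)  # (64, 32, 3) (64, 5, 3)
```

The number of input series sets the last dimension of `X`, while the number of predicted series and the horizon set the shape of `y`.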

In some situations, it can be difficult to define and create the appropriate Deep Learning architecture to address a specific problem. The skforecast library provides functionalities that allow determining the appropriate Tensorflow architecture for each problem, simplifying and accelerating the modeling process for a wide variety of problems. Below is an example of how to use skforecast to solve each of the described time series problems using recurrent neural networks.

Data

The data used in this article contains detailed information on air quality in the city of Valencia (Spain). The data collection spans from January 1, 2019 to December 31, 2021, providing hourly measurements of various air pollutants, such as PM2.5 and PM10 particles, carbon monoxide (CO), nitrogen dioxide (NO2), among others. The data has been obtained from the Red de Vigilancia y Control de la Contaminación Atmosférica, 46250054-València - Centre, https://mediambient.gva.es/es/web/calidad-ambiental/datos-historicos platform.

Libraries

In [ ]:
# Data processing
# ==============================================================================
import pandas as pd
import numpy as np
from skforecast.datasets import fetch_dataset

# Plotting
# ==============================================================================
import matplotlib.pyplot as plt
from skforecast.plot import set_dark_theme
set_dark_theme()
import plotly.express as px
import plotly.graph_objects as go
import plotly.io as pio
import plotly.offline as poff
pio.templates.default = "seaborn"
poff.init_notebook_mode(connected=True)

# Tensorflow and Keras
# ==============================================================================
import tensorflow
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.losses import MeanSquaredError
from tensorflow.keras.callbacks import EarlyStopping

# Time series modeling
# ==============================================================================
import skforecast
from skforecast.ForecasterRnn import ForecasterRnn
from skforecast.ForecasterRnn.utils import create_and_compile_model
from sklearn.preprocessing import MinMaxScaler
from skforecast.model_selection_multiseries import backtesting_forecaster_multiseries

# Warning configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')

print(f"skforecast version: {skforecast.__version__}")
print(f"tensorflow version: {tensorflow.__version__}")
skforecast version: 0.12.0
tensorflow version: 2.15.1
In [2]:
# Downloading the dataset and processing it
# ==============================================================================
air_quality = fetch_dataset(name="air_quality_valencia")
air_quality_valencia
--------------------
Hourly measures of several air chemical pollutant (pm2.5, co, no, no2, pm10,
nox, o3, so2) at Valencia city.
 Red de Vigilancia y Control de la Contaminación Atmosférica, 46250054-València
- Centre, https://mediambient.gva.es/es/web/calidad-ambiental/datos-historicos.
Shape of the dataset: (26304, 10)
In [3]:
# Missing values imputation
# ==============================================================================
air_quality = air_quality.interpolate(method="linear")
air_quality = air_quality.sort_index()
air_quality.head()
Out[3]:
pm2.5 co no no2 pm10 nox o3 veloc. direc. so2
datetime
2019-01-01 00:00:00 19.0 0.2 3.0 36.0 22.0 40.0 16.0 0.5 262.0 8.0
2019-01-01 01:00:00 26.0 0.1 2.0 40.0 32.0 44.0 6.0 0.6 248.0 8.0
2019-01-01 02:00:00 31.0 0.1 11.0 42.0 36.0 58.0 3.0 0.3 224.0 8.0
2019-01-01 03:00:00 30.0 0.1 15.0 41.0 35.0 63.0 3.0 0.2 220.0 10.0
2019-01-01 04:00:00 30.0 0.1 16.0 39.0 36.0 63.0 3.0 0.4 221.0 11.0

It is verified that the data set has an index of type DatetimeIndex with hourly frequency. Although it is not necessary for the data to have this type of index to use skforecast, it is more advantageous for the subsequent use of the predictions.

In [4]:
# Checking the frequency of the time series
# ==============================================================================
print(f"Index: {air_quality.index.dtype}")
print(f"Frequency: {air_quality.index.freq}")
Index: datetime64[ns]
Frequency: <Hour>

To facilitate the training of the models, the search for optimal hyperparameters, and the evaluation of their predictive capacity, the data is divided into three separate sets: training, validation, and test.

In [5]:
# Split train-validation-test
# ==============================================================================
end_train = "2021-03-31 23:59:00"
end_validation = "2021-09-30 23:59:00"
air_quality_train = air_quality.loc[:end_train, :].copy()
air_quality_val = air_quality.loc[end_train:end_validation, :].copy()
air_quality_test = air_quality.loc[end_validation:, :].copy()

print(
    f"Dates train      : {air_quality_train.index.min()} --- " 
    f"{air_quality_train.index.max()}  (n={len(air_quality_train)})"
)
print(
    f"Dates validation : {air_quality_val.index.min()} --- " 
    f"{air_quality_val.index.max()}  (n={len(air_quality_val)})"
)
print(
    f"Dates test       : {air_quality_test.index.min()} --- " 
    f"{air_quality_test.index.max()}  (n={len(air_quality_test)})"
)
Dates train      : 2019-01-01 00:00:00 --- 2021-03-31 23:00:00  (n=19704)
Dates validation : 2021-04-01 00:00:00 --- 2021-09-30 23:00:00  (n=4392)
Dates test       : 2021-10-01 00:00:00 --- 2021-12-31 23:00:00  (n=2208)
In [6]:
# Plotting pm2.5
# ==============================================================================
fig, ax = plt.subplots(figsize=(8, 3))
air_quality_train["pm2.5"].rolling(100).mean().plot(ax=ax, label="train")
air_quality_val["pm2.5"].rolling(100).mean().plot(ax=ax, label="validation")
air_quality_test["pm2.5"].rolling(100).mean().plot(ax=ax, label="test")
ax.set_title("pm2.5")
ax.legend();

LSTM model and ForecasterRnn

Although tensorflow-keras facilitates the process of creating deep learning architectures, it is not always trivial to determine the dimensions that an LSTM model should have for forecasting, as these depend on how many time series are being modeled, how many are being predicted, and the length of the prediction horizon.

To improve the user experience and speed up the prototyping, development, and production process, skforecast has the create_and_compile_model function, with which, by indicating just a few arguments, the architecture is inferred and the model is created.

  • series: Time series to be used to train the model

  • levels: Time series to be predicted.

  • lags: Number of time steps to be used to predict the next value.
  • steps: Number of time steps to be predicted.
  • recurrent_layer: Type of recurrent layer to use. By default, an LSTM layer is used.
  • recurrent_units: Number of units in the recurrent layer. By default, 100 is used. If a list is passed, a recurrent layer will be created for each element in the list.
  • dense_units: Number of units in the dense layer. By default, 64 is used. If a list is passed, a dense layer will be created for each element in the list.
  • optimizer: Optimizer to use. By default, Adam with a learning rate of 0.01 is used.
  • loss: Loss function to use. By default, Mean Squared Error is used.
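As a rough guide to how these arguments determine the dimensions of the network, the following sketch maps the number of input series, the number of predicted series, `lags` and `steps` to the input and output shapes. The function `infer_shapes` is a hypothetical helper written for this explanation, not part of skforecast; the shapes it returns agree with the model summaries shown later in this article.

```python
def infer_shapes(n_series, n_levels, lags, steps):
    """Return (input_shape, output_shape) using the Keras convention,
    where None stands for the batch dimension."""
    input_shape = (None, lags, n_series)    # past window of every input series
    output_shape = (None, steps, n_levels)  # future window of every predicted series
    return input_shape, output_shape

# Single series, one step ahead
print(infer_shapes(n_series=1, n_levels=1, lags=32, steps=1))
# ((None, 32, 1), (None, 1, 1))

# Single series, five steps ahead
print(infer_shapes(n_series=1, n_levels=1, lags=32, steps=5))
# ((None, 32, 1), (None, 5, 1))
```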

✎ Note

The `create_and_compile_model` function is designed to facilitate the creation of the TensorFlow model. However, more advanced users can create their own architectures as long as the input and output dimensions match the use case to which the model will be applied.

Once the model has been created and compiled, the next step is to create an instance of ForecasterRnn. This class is responsible for adding to the deep learning model all the functionalities necessary to be used in forecasting problems. It is also compatible with the rest of the functionalities offered by skforecast (backtesting, hyperparameter search, ...).

1:1 Problem - Model a single series and predict that same series

In this first scenario, we want to predict the concentration of $O_3$ for the next 1 and 5 hours using only its own historical data (the series has an hourly frequency, so each step corresponds to one hour). It is therefore a scenario in which a single time series is modeled using only its past values. This problem is also called autoregressive prediction.

One-step-ahead prediction (Single-step forecasting)

First, a single-step forecast is made. To do this, a model is created using the create_and_compile_model function, which is passed as an argument to the ForecasterRnn class.

This is the simplest example of forecasting with recurrent neural networks. The model only needs a time series to train and predict. Therefore, the series argument of the create_and_compile_model function only needs a time series, the same one that is to be predicted (levels). In addition, since only a single value is to be predicted in the future, the steps argument is equal to 1.

In [7]:
# Create model
# ==============================================================================
series = ["o3"] # Series used as predictors
levels = ["o3"] # Target series to predict
lags = 32 # Past time steps to be used to predict the target
steps = 1 # Future time steps to be predicted

data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=4,
    dense_units=16,
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
Model: "model"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_1 (InputLayer)        [(None, 32, 1)]           0         
                                                                 
 lstm (LSTM)                 (None, 4)                 96        
                                                                 
 dense (Dense)               (None, 16)                80        
                                                                 
 dense_1 (Dense)             (None, 1)                 17        
                                                                 
 reshape (Reshape)           (None, 1, 1)              0         
                                                                 
=================================================================
Total params: 193 (772.00 Byte)
Trainable params: 193 (772.00 Byte)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

In this case, a simple LSTM network is used, with a single recurrent layer with 4 neurons and a hidden dense layer with 16 neurons. The following table shows a detailed description of each layer:

| Layer | Type | Output Shape | Parameters | Description |
|---|---|---|---|---|
| Input Layer (InputLayer) | InputLayer | (None, 32, 1) | 0 | Input layer of the model. It receives sequences of length 32 (the number of lags), with one feature (the single input series) at each time step. |
| LSTM Layer (Long Short-Term Memory) | LSTM | (None, 4) | 96 | Long short-term memory layer that processes the input sequence. It has 4 LSTM units and connects to the next layer. |
| First Dense Layer (Dense) | Dense | (None, 16) | 80 | Fully connected layer with 16 units. In the provided architecture it uses the relu activation function. |
| Second Dense Layer (Dense) | Dense | (None, 1) | 17 | Fully connected layer with a single output unit and a linear activation. |
| Reshape Layer (Reshape) | Reshape | (None, 1, 1) | 0 | Reshapes the output of the previous dense layer to (None, 1, 1). It is not strictly necessary here, but it is included so that the same structure generalizes to multi-output forecasting problems. The dimension of this output layer is (None, steps to predict, series to predict). In this case, steps=1 and levels=["o3"], so the dimension is (None, 1, 1). |
| Total | - | - | 193 | Total parameters: 193, trainable parameters: 193, non-trainable parameters: 0. |
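The parameter counts reported by `model.summary()` can be checked by hand. The sketch below uses the standard formulas for an LSTM layer (four blocks, each with input weights, recurrent weights and a bias) and for a dense layer (weights plus bias); the function names are chosen for this example.

```python
def lstm_params(units, n_features):
    # 4 blocks (forget, input, candidate, output), each with
    # input weights, recurrent weights and one bias per unit
    return 4 * (units * (n_features + units) + units)

def dense_params(n_inputs, units):
    # One weight per input-unit pair plus one bias per unit
    return n_inputs * units + units

lstm = lstm_params(units=4, n_features=1)
dense_1 = dense_params(n_inputs=4, units=16)
dense_2 = dense_params(n_inputs=16, units=1)
total = lstm + dense_1 + dense_2

print(lstm, dense_1, dense_2, total)  # 96 80 17 193
```

The input and reshape layers contribute no parameters, which is why the total matches the 193 shown in the summary.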

The forecaster is created from the model, and the validation data is passed to it so that the model can be evaluated at each epoch. In addition, a MinMaxScaler object is passed to it to normalize the input and output data. This object is responsible for scaling the input data before training and for transforming the predictions back to their original scale.
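The scaling step can be illustrated with a few lines of NumPy. This is a sketch of the min-max transformation and its inverse, assuming the default target range of 0 to 1; the toy values and the helper names `scale` and `inverse_scale` are made up for the example and do not reproduce how the forecaster applies the transformer internally.

```python
import numpy as np

# Toy training values (made up for the illustration)
train = np.array([10.0, 20.0, 40.0, 50.0])
data_min, data_max = train.min(), train.max()

def scale(x):
    # Map values to the [0, 1] range using the training minimum and maximum
    return (x - data_min) / (data_max - data_min)

def inverse_scale(x_scaled):
    # Map scaled values back to the original units
    return x_scaled * (data_max - data_min) + data_min

scaled = scale(train)
print(scaled)  # [0.   0.25 0.75 1.  ]

# A prediction produced by the network in the scaled space...
pred_scaled = 0.5
# ...is converted back to the original scale before being returned
print(inverse_scale(pred_scaled))  # 30.0
```

The minimum and maximum are learned from the training data only, so values outside that range in later data can fall slightly outside the 0 to 1 interval.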

The fit_kwargs are the arguments passed to the fit method of the model. In this case, the number of epochs, the batch size, the validation data, and a callback to stop training when the validation loss stops decreasing are passed to it.

In [8]:
# Forecaster Creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 10,  # Number of epochs to train the model.
        "batch_size": 32,  # Batch size to train the model.
        "callbacks": [
            EarlyStopping(monitor="val_loss", patience=5)
        ],  # Callback to stop training when it is no longer learning.
        "series_val": data_val,  # Validation data for model training.
    },
)    

forecaster
/home/ubuntu/anaconda3/envs/skforecast_12_py11/lib/python3.11/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:227: UserWarning:

Setting `lags` = 'auto'. `lags` are inferred from the regressor architecture. Avoid the warning with lags=lags.

/home/ubuntu/anaconda3/envs/skforecast_12_py11/lib/python3.11/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:257: UserWarning:

`steps` default value = 'auto'. `steps` inferred from regressor architecture. Avoid the warning with steps=steps.

Out[8]:
============= 
ForecasterRnn 
============= 
Regressor: <keras.src.engine.functional.Functional object at 0x7f76e2f87510> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'model', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_input_shape': (None, 32, 1), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_1'}, 'registered_name': None, 'name': 'input_1', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm', 'trainable': True, 'dtype': 'float32', 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'time_major': False, 'units': 4, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'implementation': 2}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 1)}, 'name': 'lstm', 'inbound_nodes': [[['input_1', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense', 'trainable': True, 'dtype': 'float32', 'units': 16, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': 
None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 4)}, 'name': 'dense', 'inbound_nodes': [[['lstm', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_1', 'trainable': True, 'dtype': 'float32', 'units': 1, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 16)}, 'name': 'dense_1', 'inbound_nodes': [[['dense', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Reshape', 'config': {'name': 'reshape', 'trainable': True, 'dtype': 'float32', 'target_shape': (1, 1)}, 'registered_name': None, 'build_config': {'input_shape': (None, 1)}, 'name': 'reshape', 'inbound_nodes': [[['dense_1', 0, 0, {}]]]}], 'input_layers': [['input_1', 0, 0]], 'output_layers': [['reshape', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': False, 'is_legacy_optimizer': False, 'learning_rate': 0.009999999776482582, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'reduction': 'auto', 'name': 'mean_squared_error', 'fn': 'mean_squared_error'}, 'registered_name': None}, 'metrics': None, 'loss_weights': None, 'weighted_metrics': None, 'run_eagerly': None, 'steps_per_execution': None, 'jit_compile': None} 
fit_kwargs: {'epochs': 10, 'batch_size': 32, 'callbacks': [<keras.src.callbacks.EarlyStopping object at 0x7f76e05e3990>]} 
Creation date: 2024-05-06 12:20:11 
Last fit date: None 
Skforecast version: 0.12.0 
Python version: 3.11.8 
Forecaster id: None 

Warning

The warning indicates that the number of lags has been inferred from the model architecture. In this case, the input layer of the model expects sequences of 32 time steps, so the number of lags is 32. If a different number of lags is desired, the `lags` argument can be specified in the `create_and_compile_model` function. To omit the warning, the `lags=lags` and `steps=steps` arguments can be specified in the initialization of the `ForecasterRnn`.
In [9]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/10
615/615 [==============================] - 11s 15ms/step - loss: 0.0104 - val_loss: 0.0059
Epoch 2/10
615/615 [==============================] - 9s 14ms/step - loss: 0.0056 - val_loss: 0.0057
Epoch 3/10
615/615 [==============================] - 9s 14ms/step - loss: 0.0055 - val_loss: 0.0054
Epoch 4/10
615/615 [==============================] - 8s 14ms/step - loss: 0.0054 - val_loss: 0.0061
Epoch 5/10
615/615 [==============================] - 9s 15ms/step - loss: 0.0053 - val_loss: 0.0055
Epoch 6/10
615/615 [==============================] - 9s 14ms/step - loss: 0.0052 - val_loss: 0.0054
Epoch 7/10
615/615 [==============================] - 9s 14ms/step - loss: 0.0053 - val_loss: 0.0054
Epoch 8/10
615/615 [==============================] - 9s 14ms/step - loss: 0.0052 - val_loss: 0.0055
Epoch 9/10
615/615 [==============================] - 8s 13ms/step - loss: 0.0053 - val_loss: 0.0054
Epoch 10/10
615/615 [==============================] - 9s 14ms/step - loss: 0.0053 - val_loss: 0.0060
In [10]:
# Track training and overfitting
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)

In deep learning models, it is very important to control overfitting. To do this, a Keras callback is used to stop training when the value of the cost function, on the validation data, stops decreasing. In this case, the callback does not stop training, as we have only trained for 10 epochs. If the number of epochs is increased, the callback will stop training when the validation loss stops decreasing.
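The stopping rule can be sketched in plain Python. This is a simplified version of the patience logic, assuming that any strict decrease of the validation loss counts as an improvement; the function `early_stopping_epoch` and the loss values are made up for the illustration and are not the Keras implementation.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training would stop,
    or None if training runs through all epochs."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None

# Made-up validation losses: the best value is reached at epoch 3
losses = [0.50, 0.40, 0.35, 0.36, 0.37, 0.36, 0.38, 0.39, 0.40, 0.41]

print(early_stopping_epoch(losses, patience=5))      # 8
print(early_stopping_epoch(losses[:7], patience=5))  # None
```

With a patience of 5, training only stops after five consecutive epochs without improvement, so a short run may finish before the rule is triggered.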

On the other hand, another very useful tool is the plotting of the training and validation loss at each epoch. This allows you to visualize the behavior of the model and detect possible overfitting problems.

In the case of our model, the training loss drops sharply after the first epoch, while the validation loss is already low at the end of the first epoch. From this it can be deduced:

  • The model is not overfitting, as the validation loss is similar to the training loss.

  • The training loss reported for an epoch is averaged over all its batches while the weights are still being updated, whereas the validation loss is computed at the end of the epoch with the updated weights. This is why the validation loss of the first epoch is closer to the training loss of the second epoch.

Graphical explanation of overfitting. Source: https://datahacker.rs/018-pytorch-popular-techniques-to-prevent-the-overfitting-in-a-neural-networks/.

Once the forecaster has been trained, predictions can be obtained. In this case, it is a single value since only one step into the future (step) has been specified.

In [11]:
# Predictions
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[11]:
o3
2021-04-01 42.740902

To obtain a robust estimate of the predictive capacity of the model, a backtesting process is performed. The backtesting process consists of generating a prediction for each observation in the test set, following the same procedure that would be followed if the model were in production, and finally comparing the predicted value with the actual value.
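The way the test period is split into consecutive prediction windows can be sketched as follows. The function `backtesting_folds` is a hypothetical helper that only computes index ranges for a backtesting without refit; it is not the skforecast implementation. The sizes used are those of this dataset: 26304 observations, of which the first 24096 (training plus validation) form the initial training set.

```python
def backtesting_folds(n_obs, initial_train_size, steps):
    """Return the (start, end) positions of each prediction window.
    `end` is exclusive; the last window may be shorter than `steps`."""
    folds = []
    start = initial_train_size
    while start < n_obs:
        end = min(start + steps, n_obs)
        folds.append((start, end))
        start = end
    return folds

# One-step-ahead backtesting: one fold per test observation
folds_1 = backtesting_folds(n_obs=26304, initial_train_size=24096, steps=1)
print(len(folds_1))  # 2208

# Five-step-ahead backtesting: fewer folds, the last one is incomplete
folds_5 = backtesting_folds(n_obs=26304, initial_train_size=24096, steps=5)
print(len(folds_5), folds_5[-1])  # 442 (26301, 26304)
```

In each fold, the model receives the most recent observations before `start` as lags and predicts the positions from `start` to `end`.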

In [12]:
# Backtesting with test data
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]), # Training + Validation Data
    metric="mean_absolute_error",
    verbose=False, # Set to True for detailed information
    refit=False,
)
Epoch 1/10
752/752 [==============================] - 10s 12ms/step - loss: 0.0052 - val_loss: 0.0053
Epoch 2/10
752/752 [==============================] - 8s 11ms/step - loss: 0.0052 - val_loss: 0.0056
Epoch 3/10
752/752 [==============================] - 8s 11ms/step - loss: 0.0052 - val_loss: 0.0055
Epoch 4/10
752/752 [==============================] - 8s 11ms/step - loss: 0.0052 - val_loss: 0.0054
Epoch 5/10
752/752 [==============================] - 8s 11ms/step - loss: 0.0053 - val_loss: 0.0055
Epoch 6/10
752/752 [==============================] - 9s 12ms/step - loss: 0.0052 - val_loss: 0.0057
In [13]:
# Backtesting predictions
# ==============================================================================
predictions
Out[13]:
o3
2021-10-01 00:00:00 51.252998
2021-10-01 01:00:00 56.829456
2021-10-01 02:00:00 60.468056
2021-10-01 03:00:00 60.452938
2021-10-01 04:00:00 49.530697
... ...
2021-12-31 19:00:00 13.119790
2021-12-31 20:00:00 12.002692
2021-12-31 21:00:00 15.273126
2021-12-31 22:00:00 14.688937
2021-12-31 23:00:00 16.162798

2208 rows × 1 columns

In [14]:
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Prediction vs real values in the test set",
    xaxis_title="Date time",
    yaxis_title="O3",
    width=800,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.05,
        xanchor="left",
        x=0
    )
)
fig.show()
In [15]:
# Backtesting metrics
# ==============================================================================
metrics
Out[15]:
levels mean_absolute_error
0 o3 5.888
In [16]:
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Serie mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (mae): {rel_mae:0.2f} %")
Serie mean: 54.52
Relative error (mae): 10.80 %

The model achieves a backtesting error (MAE) of 5.89, which corresponds to a relative error of 10.80% with respect to the mean of the series.

Multi-step forecasting

The next case predicts the next 5 values of O3 using only its historical data. It is therefore a scenario in which multiple future steps of a single time series are modeled using only its own past values.
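
Conceptually, a multi-step problem of this kind pairs each window of past values with the block of values that follows it. The sketch below illustrates the idea in plain Python on a toy series; `make_windows` is a hypothetical helper written for this illustration, and the way skforecast builds its training matrices internally may differ. With `lags = 32` and `steps = 5`, each sample would pair 32 past values with the 5 values that follow.

```python
def make_windows(values, lags, steps):
    """Split a 1-D sequence into (input window, target window) pairs."""
    X, y = [], []
    # The last usable start must leave room for `lags` inputs and `steps` targets.
    for start in range(len(values) - lags - steps + 1):
        X.append(values[start : start + lags])
        y.append(values[start + lags : start + lags + steps])
    return X, y


# Toy series (illustrative values, not the O3 data).
toy_series = [10, 11, 12, 13, 14, 15, 16, 17]

X, y = make_windows(toy_series, lags=3, steps=2)

print(len(X))        # number of training samples
print(X[0], y[0])    # first input window and its targets
print(X[-1], y[-1])  # last input window and its targets
```

A series of length `n` yields `n - lags - steps + 1` samples, so longer horizons slightly reduce the number of training examples.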

A similar architecture to the previous one will be used, but with a greater number of neurons in the LSTM layer and in the first dense layer. This will allow the model to have greater flexibility to model the time series.

In [17]:
# Model creation
# ==============================================================================
series = ["o3"] # Series used as predictors
levels = ["o3"] # Target series to predict
lags = 32 # Past time steps to be used to predict the target
steps = 5 # Future time steps to be predicted

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=50,
    dense_units=32,
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
Model: "model_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_2 (InputLayer)        [(None, 32, 1)]           0         
                                                                 
 lstm_1 (LSTM)               (None, 50)                10400     
                                                                 
 dense_2 (Dense)             (None, 32)                1632      
                                                                 
 dense_3 (Dense)             (None, 5)                 165       
                                                                 
 reshape_1 (Reshape)         (None, 5, 1)              0         
                                                                 
=================================================================
Total params: 12197 (47.64 KB)
Trainable params: 12197 (47.64 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
In [18]:
# Forecaster Creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 10,  # Number of epochs to train the model.
        "batch_size": 32,  # Batch size to train the model.
        "callbacks": [
            EarlyStopping(monitor="val_loss", patience=5)
        ],  # Callback to stop training when it is no longer learning.
        "series_val": data_val,  # Validation data for model training.
    },
)    

forecaster
/home/ubuntu/anaconda3/envs/skforecast_12_py11/lib/python3.11/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:227: UserWarning:

Setting `lags` = 'auto'. `lags` are inferred from the regressor architecture. Avoid the warning with lags=lags.

/home/ubuntu/anaconda3/envs/skforecast_12_py11/lib/python3.11/site-packages/skforecast/ForecasterRnn/ForecasterRnn.py:257: UserWarning:

`steps` default value = 'auto'. `steps` inferred from regressor architecture. Avoid the warning with steps=steps.

Out[18]:
============= 
ForecasterRnn 
============= 
Regressor: <keras.src.engine.functional.Functional object at 0x7f7676aa6490> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1 2 3 4 5] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'model_1', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_input_shape': (None, 32, 1), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_2'}, 'registered_name': None, 'name': 'input_2', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_1', 'trainable': True, 'dtype': 'float32', 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'time_major': False, 'units': 50, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'implementation': 2}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 1)}, 'name': 'lstm_1', 'inbound_nodes': [[['input_2', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_2', 'trainable': True, 'dtype': 'float32', 'units': 32, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 
'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 50)}, 'name': 'dense_2', 'inbound_nodes': [[['lstm_1', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_3', 'trainable': True, 'dtype': 'float32', 'units': 5, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32)}, 'name': 'dense_3', 'inbound_nodes': [[['dense_2', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Reshape', 'config': {'name': 'reshape_1', 'trainable': True, 'dtype': 'float32', 'target_shape': (5, 1)}, 'registered_name': None, 'build_config': {'input_shape': (None, 5)}, 'name': 'reshape_1', 'inbound_nodes': [[['dense_3', 0, 0, {}]]]}], 'input_layers': [['input_2', 0, 0]], 'output_layers': [['reshape_1', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': False, 'is_legacy_optimizer': False, 'learning_rate': 0.009999999776482582, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'reduction': 'auto', 'name': 'mean_squared_error', 'fn': 'mean_squared_error'}, 'registered_name': None}, 'metrics': None, 'loss_weights': None, 'weighted_metrics': None, 'run_eagerly': None, 'steps_per_execution': None, 'jit_compile': None} 
fit_kwargs: {'epochs': 10, 'batch_size': 32, 'callbacks': [<keras.src.callbacks.EarlyStopping object at 0x7f7676d4fb10>]} 
Creation date: 2024-05-06 12:27:01 
Last fit date: None 
Skforecast version: 0.12.0 
Python version: 3.11.8 
Forecaster id: None 

✎ Note

The `fit_kwargs` parameter is very useful because it allows you to pass any training configuration to the underlying model, in this case Keras. In the previous code, the model is trained for up to 10 epochs with a batch size of 32. An `EarlyStopping` callback stops training when the validation loss has not improved for 5 consecutive epochs (`patience=5`). Other callbacks can also be configured, such as `ModelCheckpoint` to save the model at each epoch, or `TensorBoard` to visualize the training and validation loss in real time.
In [19]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/10
615/615 [==============================] - 15s 21ms/step - loss: 0.0207 - val_loss: 0.0128
Epoch 2/10
615/615 [==============================] - 13s 20ms/step - loss: 0.0133 - val_loss: 0.0119
Epoch 3/10
615/615 [==============================] - 12s 19ms/step - loss: 0.0123 - val_loss: 0.0120
Epoch 4/10
615/615 [==============================] - 13s 21ms/step - loss: 0.0120 - val_loss: 0.0133
Epoch 5/10
615/615 [==============================] - 13s 21ms/step - loss: 0.0119 - val_loss: 0.0112
Epoch 6/10
615/615 [==============================] - 12s 20ms/step - loss: 0.0116 - val_loss: 0.0127
Epoch 7/10
615/615 [==============================] - 12s 20ms/step - loss: 0.0114 - val_loss: 0.0118
Epoch 8/10
615/615 [==============================] - 12s 20ms/step - loss: 0.0113 - val_loss: 0.0114
Epoch 9/10
615/615 [==============================] - 12s 20ms/step - loss: 0.0112 - val_loss: 0.0111
Epoch 10/10
615/615 [==============================] - 12s 20ms/step - loss: 0.0114 - val_loss: 0.0111
In [20]:
# Training and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)

The prediction is expected to be of lower quality than in the previous case, since the loss observed across epochs is higher. The explanation is simple: the model now has to predict 5 values instead of 1, so the validation loss is computed over all 5 predicted steps, and steps further into the future are typically harder to predict.

Next, the prediction is made. It contains 5 values because a horizon of 5 steps (`steps = 5`) was specified when the model was created.

In [21]:
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[21]:
o3
2021-04-01 00:00:00 42.525261
2021-04-01 01:00:00 38.580078
2021-04-01 02:00:00 36.230850
2021-04-01 03:00:00 34.423199
2021-04-01 04:00:00 33.650982

Specific steps can be predicted, as long as they are within the prediction horizon defined in the model.

In [22]:
# Specific step predictions
# ==============================================================================
predictions = forecaster.predict(steps=[1, 3])
predictions
Out[22]:
o3
2021-04-01 00:00:00 42.525261
2021-04-01 02:00:00 36.230850
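
The two outputs above are consistent with each other: requesting `steps=[1, 3]` returns the first and third values of the full 5-step forecast. The following plain-Python sketch reproduces that selection by hand. `select_steps` is a hypothetical helper, not part of skforecast, and the last training timestamp and the hourly frequency are assumptions inferred from the printed index.

```python
from datetime import datetime, timedelta

# Full 5-step forecast, values copied from the output of `forecaster.predict()`.
full_forecast = [42.525261, 38.580078, 36.230850, 34.423199, 33.650982]

last_train_time = datetime(2021, 3, 31, 23)  # assumed: last training timestamp
freq = timedelta(hours=1)                    # assumed: hourly frequency


def select_steps(forecast, steps):
    """Keep only the requested 1-based steps of a multi-step forecast."""
    max_step = len(forecast)
    if any(step < 1 or step > max_step for step in steps):
        raise ValueError(f"steps must be between 1 and {max_step}")
    # Step k corresponds to position k - 1 and to the timestamp k periods ahead.
    return {last_train_time + step * freq: forecast[step - 1] for step in steps}


selected = select_steps(full_forecast, steps=[1, 3])
for timestamp, value in selected.items():
    print(timestamp, value)
```

Requesting a step beyond the horizon (for example `steps=[6]`) raises an error in this sketch, mirroring the restriction that only steps within the horizon defined in the model can be predicted.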
In [23]:
# Backtesting 
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]),
    metric="mean_absolute_error",
    verbose=False,
    refit=False,
)
Epoch 1/10
752/752 [==============================] - 16s 20ms/step - loss: 0.0110 - val_loss: 0.0107
Epoch 2/10
752/752 [==============================] - 15s 20ms/step - loss: 0.0111 - val_loss: 0.0115
Epoch 3/10
752/752 [==============================] - 15s 19ms/step - loss: 0.0110 - val_loss: 0.0110
Epoch 4/10
752/752 [==============================] - 15s 20ms/step - loss: 0.0111 - val_loss: 0.0108
Epoch 5/10
752/752 [==============================] - 15s 20ms/step - loss: 0.0110 - val_loss: 0.0110
Epoch 6/10
752/752 [==============================] - 15s 20ms/step - loss: 0.0109 - val_loss: 0.0111
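
Although `epochs` is set to 10, the log above ends at epoch 6. This is consistent with the `EarlyStopping(monitor="val_loss", patience=5)` callback: the best validation loss is reached in epoch 1 and is not improved during the following 5 epochs. The sketch below reproduces the patience rule in plain Python, using the (rounded) `val_loss` values printed in the two training logs of this section. It is a simplified illustration, not the Keras implementation, which offers further options such as `min_delta`; `early_stopping_epoch` is a hypothetical helper.

```python
def early_stopping_epoch(val_losses, patience):
    """Return the 1-based epoch at which training stops, or None if it never does."""
    best = float("inf")
    wait = 0  # epochs elapsed since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# val_loss values copied from the two training logs in this section.
fit_val_loss = [0.0128, 0.0119, 0.0120, 0.0133, 0.0112,
                0.0127, 0.0118, 0.0114, 0.0111, 0.0111]
backtest_val_loss = [0.0107, 0.0115, 0.0110, 0.0108, 0.0110, 0.0111]

print(early_stopping_epoch(fit_val_loss, patience=5))       # None
print(early_stopping_epoch(backtest_val_loss, patience=5))  # 6
```

In the first fit, the validation loss keeps reaching new minima (epochs 2, 5 and 9), so the counter never reaches 5 and all 10 epochs are run.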
In [24]:
# Backtesting predictions
# ==============================================================================
predictions
Out[24]:
o3
2021-10-01 00:00:00 57.558262
2021-10-01 01:00:00 55.990757
2021-10-01 02:00:00 51.385490
2021-10-01 03:00:00 47.830757
2021-10-01 04:00:00 44.054672
... ...
2021-12-31 19:00:00 18.582367
2021-12-31 20:00:00 16.290226
2021-12-31 21:00:00 15.990602
2021-12-31 22:00:00 18.501095
2021-12-31 23:00:00 18.051147

2208 rows × 1 columns
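
The 2208 predictions above are not produced in a single call. Assuming that backtesting without refit advances through the test set in blocks of `steps` observations and allows a shorter final block, the number of folds can be sketched as follows (a simplified illustration, not the skforecast implementation):

```python
import math

n_test = 2208  # rows in the backtesting predictions shown above
steps = 5      # forecast horizon of the model

n_folds = math.ceil(n_test / steps)
last_fold_size = n_test - (n_folds - 1) * steps

# (start, end) positions of each fold inside the test set, end exclusive.
folds = [(start, min(start + steps, n_test)) for start in range(0, n_test, steps)]

print(n_folds, last_fold_size)
print(folds[:2], folds[-1])
```

Under these assumptions, every prediction in the table is between 1 and 5 hours ahead of the last observed value available to the model when it was made.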

In [25]:
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Prediction vs real values in the test set",
    xaxis_title="Date time",
    yaxis_title="O3",
    width=800,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.05,
        xanchor="left",
        x=0
    )
)
fig.show()
In [26]:
# Backtesting metrics
# ==============================================================================
metrics
Out[26]:
levels mean_absolute_error
0 o3 9.422398
In [27]:
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (mae): {rel_mae:0.2f} %")
Series mean: 54.52
Relative error (mae): 17.28 %

In this case, the prediction is worse than in the previous case: the backtesting MAE rises from 5.89 to 9.42 (from 10.80% to 17.28% of the series mean). This is to be expected since the model has to predict 5 values instead of 1.
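
To make the comparison explicit, the relative errors of both models can be recomputed from the values printed in the metrics tables. This is a small arithmetic check; the model labels are chosen here for illustration, and the series mean is the rounded value shown in the output.

```python
series_mean = 54.52  # mean of the o3 series, as printed in the outputs

# Backtesting MAE of each model, as reported in the metrics tables.
mae_by_model = {
    "single-step (steps=1)": 5.888,
    "multi-step (steps=5)": 9.422398,
}

relative_mae = {
    name: 100 * mae / series_mean for name, mae in mae_by_model.items()
}

for name, value in relative_mae.items():
    print(f"{name}: {value:0.2f} %")
```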

N:1 Problems - Multiple time series with single output

In this case, the same series will be predicted, but using multiple time series as predictors. It is therefore a scenario in which past values of multiple time series are used to predict a single time series.

These types of approaches are very useful when multiple time series related to each other are available. For example, in the case of temperature prediction, multiple time series such as humidity, atmospheric pressure, wind speed, etc. can be used.

In this type of problem, the architecture of the neural network is more complex: an additional recurrent layer is added to process the multiple input series, and another hidden dense layer is added to process the output of the recurrent layers. Creating this model with skforecast is straightforward: passing a list of integers to the `recurrent_units` and `dense_units` arguments creates one recurrent or dense layer per element of the list.

In [28]:
# Model creation
# ==============================================================================
# Time series used for training; the input is now multivariate
series = ['pm2.5', 'co', 'no', 'no2', 'pm10', 'nox', 'o3', 'veloc.', 'direc.','so2'] 
levels = ["o3"] 
lags = 32 
steps = 5 

data = air_quality[series].copy()
data_train = air_quality_train[series].copy()
data_val = air_quality_val[series].copy()
data_test = air_quality_test[series].copy()

model = create_and_compile_model(
    series=data_train,
    levels=levels, 
    lags=lags,
    steps=steps,
    recurrent_layer="LSTM",
    recurrent_units=[100, 50],
    dense_units=[64, 32],
    optimizer=Adam(learning_rate=0.01), 
    loss=MeanSquaredError()
)
model.summary()
Model: "model_2"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 input_3 (InputLayer)        [(None, 32, 10)]          0         
                                                                 
 lstm_2 (LSTM)               (None, 32, 100)           44400     
                                                                 
 lstm_3 (LSTM)               (None, 50)                30200     
                                                                 
 dense_4 (Dense)             (None, 64)                3264      
                                                                 
 dense_5 (Dense)             (None, 32)                2080      
                                                                 
 dense_6 (Dense)             (None, 5)                 165       
                                                                 
 reshape_2 (Reshape)         (None, 5, 1)              0         
                                                                 
=================================================================
Total params: 80109 (312.93 KB)
Trainable params: 80109 (312.93 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________
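
The parameter counts in the summary above can be checked by hand. The sketch below assumes the standard LSTM formulation (four gates, each with input weights, recurrent weights and a bias) and fully connected dense layers; `lstm_params` and `dense_params` are hypothetical helpers written for this illustration.

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights, recurrent weights and a bias.
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    # One weight per input-output pair plus one bias per unit.
    return input_dim * units + units


n_series = 10  # input series (features per time step)
steps = 5      # forecast horizon (units in the output layer)

layers = {
    "lstm_2": lstm_params(n_series, 100),
    "lstm_3": lstm_params(100, 50),
    "dense_4": dense_params(50, 64),
    "dense_5": dense_params(64, 32),
    "dense_6": dense_params(32, steps),
}
total_params = sum(layers.values())

for name, count in layers.items():
    print(f"{name}: {count}")
print(f"Total params: {total_params}")
```

The first LSTM layer receives the 10 input series at each time step, while the second receives the 100 outputs of the first. This is why the first recurrent layer returns the full sequence, with output shape `(None, 32, 100)`.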

Once the model has been created and compiled, the next step is to create an instance of ForecasterRnn. This class adds to the deep learning model all the functionality needed to use it in forecasting problems, and it is compatible with the rest of the functionalities offered by skforecast (backtesting, hyperparameter search, ...).

In [29]:
# Forecaster creation
# ==============================================================================
forecaster = ForecasterRnn(
    regressor=model,
    levels=levels,
    steps=steps,
    lags=lags,
    transformer_series=MinMaxScaler(),
    fit_kwargs={
        "epochs": 4,  
        "batch_size": 128,  
        "series_val": data_val,
    },
)
forecaster
Out[29]:
============= 
ForecasterRnn 
============= 
Regressor: <keras.src.engine.functional.Functional object at 0x7f76765808d0> 
Lags: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31 32] 
Transformer for series: MinMaxScaler() 
Window size: 32 
Target series, levels: ['o3'] 
Multivariate series (names): None 
Maximum steps predicted: [1 2 3 4 5] 
Training range: None 
Training index type: None 
Training index frequency: None 
Model parameters: {'name': 'model_2', 'trainable': True, 'layers': [{'module': 'keras.layers', 'class_name': 'InputLayer', 'config': {'batch_input_shape': (None, 32, 10), 'dtype': 'float32', 'sparse': False, 'ragged': False, 'name': 'input_3'}, 'registered_name': None, 'name': 'input_3', 'inbound_nodes': []}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_2', 'trainable': True, 'dtype': 'float32', 'return_sequences': True, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'time_major': False, 'units': 100, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 'keras.initializers', 'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'implementation': 2}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 10)}, 'name': 'lstm_2', 'inbound_nodes': [[['input_3', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'LSTM', 'config': {'name': 'lstm_3', 'trainable': True, 'dtype': 'float32', 'return_sequences': False, 'return_state': False, 'go_backwards': False, 'stateful': False, 'unroll': False, 'time_major': False, 'units': 50, 'activation': 'relu', 'recurrent_activation': 'sigmoid', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'recurrent_initializer': {'module': 
'keras.initializers', 'class_name': 'Orthogonal', 'config': {'gain': 1.0, 'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'unit_forget_bias': True, 'kernel_regularizer': None, 'recurrent_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'recurrent_constraint': None, 'bias_constraint': None, 'dropout': 0.0, 'recurrent_dropout': 0.0, 'implementation': 2}, 'registered_name': None, 'build_config': {'input_shape': (None, 32, 100)}, 'name': 'lstm_3', 'inbound_nodes': [[['lstm_2', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_4', 'trainable': True, 'dtype': 'float32', 'units': 64, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 50)}, 'name': 'dense_4', 'inbound_nodes': [[['lstm_3', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_5', 'trainable': True, 'dtype': 'float32', 'units': 32, 'activation': 'relu', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 64)}, 'name': 
'dense_5', 'inbound_nodes': [[['dense_4', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Dense', 'config': {'name': 'dense_6', 'trainable': True, 'dtype': 'float32', 'units': 5, 'activation': 'linear', 'use_bias': True, 'kernel_initializer': {'module': 'keras.initializers', 'class_name': 'GlorotUniform', 'config': {'seed': None}, 'registered_name': None}, 'bias_initializer': {'module': 'keras.initializers', 'class_name': 'Zeros', 'config': {}, 'registered_name': None}, 'kernel_regularizer': None, 'bias_regularizer': None, 'activity_regularizer': None, 'kernel_constraint': None, 'bias_constraint': None}, 'registered_name': None, 'build_config': {'input_shape': (None, 32)}, 'name': 'dense_6', 'inbound_nodes': [[['dense_5', 0, 0, {}]]]}, {'module': 'keras.layers', 'class_name': 'Reshape', 'config': {'name': 'reshape_2', 'trainable': True, 'dtype': 'float32', 'target_shape': (5, 1)}, 'registered_name': None, 'build_config': {'input_shape': (None, 5)}, 'name': 'reshape_2', 'inbound_nodes': [[['dense_6', 0, 0, {}]]]}], 'input_layers': [['input_3', 0, 0]], 'output_layers': [['reshape_2', 0, 0]]} 
Compile parameters: {'optimizer': {'module': 'keras.optimizers', 'class_name': 'Adam', 'config': {'name': 'Adam', 'weight_decay': None, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': False, 'is_legacy_optimizer': False, 'learning_rate': 0.009999999776482582, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}, 'registered_name': None}, 'loss': {'module': 'keras.losses', 'class_name': 'MeanSquaredError', 'config': {'reduction': 'auto', 'name': 'mean_squared_error', 'fn': 'mean_squared_error'}, 'registered_name': None}, 'metrics': None, 'loss_weights': None, 'weighted_metrics': None, 'run_eagerly': None, 'steps_per_execution': None, 'jit_compile': None} 
fit_kwargs: {'epochs': 4, 'batch_size': 128} 
Creation date: 2024-05-06 12:31:25 
Last fit date: None 
Skforecast version: 0.12.0 
Python version: 3.11.8 
Forecaster id: None 
In [30]:
# Fit forecaster
# ==============================================================================
forecaster.fit(data_train)
Epoch 1/4
154/154 [==============================] - 19s 102ms/step - loss: 0.0323 - val_loss: 0.0207
Epoch 2/4
154/154 [==============================] - 16s 104ms/step - loss: 0.0135 - val_loss: 0.0147
Epoch 3/4
154/154 [==============================] - 16s 105ms/step - loss: 0.0115 - val_loss: 0.0117
Epoch 4/4
154/154 [==============================] - 16s 105ms/step - loss: 0.0104 - val_loss: 0.0119
In [31]:
# Training and overfitting tracking
# ==============================================================================
fig, ax = plt.subplots(figsize=(5, 2.5))
forecaster.plot_history(ax=ax)
In [32]:
# Prediction
# ==============================================================================
predictions = forecaster.predict()
predictions
Out[32]:
o3
2021-04-01 00:00:00 47.043148
2021-04-01 01:00:00 42.332268
2021-04-01 02:00:00 35.514668
2021-04-01 03:00:00 34.057709
2021-04-01 04:00:00 35.413418
In [33]:
# Backtesting with test data
# ==============================================================================
metrics, predictions = backtesting_forecaster_multiseries(
    forecaster=forecaster,
    steps=forecaster.max_step,
    series=data,
    levels=forecaster.levels,
    initial_train_size=len(data.loc[:end_validation, :]),  # Training + validation data
    metric="mean_absolute_error",
    verbose=False,
    refit=False,
)
Epoch 1/4
188/188 [==============================] - 23s 109ms/step - loss: 0.0103 - val_loss: 0.0113
Epoch 2/4
188/188 [==============================] - 20s 105ms/step - loss: 0.0100 - val_loss: 0.0103
Epoch 3/4
188/188 [==============================] - 20s 105ms/step - loss: 0.0097 - val_loss: 0.0108
Epoch 4/4
188/188 [==============================] - 20s 105ms/step - loss: 0.0096 - val_loss: 0.0108
In [34]:
# Backtesting metrics
# ==============================================================================
metrics
Out[34]:
levels mean_absolute_error
0 o3 10.370962
In [35]:
# % Error vs series mean
# ==============================================================================
rel_mae = 100 * metrics.loc[0, 'mean_absolute_error'] / np.mean(data["o3"])
print(f"Series mean: {np.mean(data['o3']):0.2f}")
print(f"Relative error (mae): {rel_mae:0.2f} %")
Series mean: 54.52
Relative error (mae): 19.02 %
In [36]:
# Backtesting predictions
# ==============================================================================
predictions
Out[36]:
o3
2021-10-01 00:00:00 53.611782
2021-10-01 01:00:00 49.676094
2021-10-01 02:00:00 45.803886
2021-10-01 03:00:00 41.128014
2021-10-01 04:00:00 36.991653
... ...
2021-12-31 19:00:00 10.192718
2021-12-31 20:00:00 11.780981
2021-12-31 21:00:00 1.291789
2021-12-31 22:00:00 0.706463
2021-12-31 23:00:00 -3.550592

2208 rows × 1 columns

In [37]:
# Plotting predictions vs real values in the test set
# ==============================================================================
fig = go.Figure()
trace1 = go.Scatter(x=data_test.index, y=data_test['o3'], name="test", mode="lines")
trace2 = go.Scatter(x=predictions.index, y=predictions['o3'], name="predictions", mode="lines")
fig.add_trace(trace1)
fig.add_trace(trace2)
fig.update_layout(
    title="Prediction vs real values in the test set",
    xaxis_title="Date time",
    yaxis_title="O3",
    width=800,
    height=350,
    margin=dict(l=20, r=20, t=35, b=20),
    legend=dict(
        orientation="h",
        yanchor="top",
        y=1.05,
        xanchor="left",
        x=0
    )
)
fig.show()