

Introduction

The ETS (Error, Trend, Seasonality) framework is a powerful and flexible approach to exponential smoothing for time series forecasting. ETS models are characterized by three key components: the type of error (additive or multiplicative), the type of trend (none, additive, multiplicative, or damped), and the type of seasonality (none, additive, or multiplicative). This systematic framework provides a unified approach to exponential smoothing methods, encompassing classic techniques like simple exponential smoothing, Holt's linear method, and Holt-Winters seasonal methods.

Unlike models that rely on differencing transformations to achieve stationarity, ETS uses a state-space formulation where the level, trend, and seasonal components are recursively updated at each time step. This makes ETS models highly interpretable while maintaining flexibility to handle various patterns in time series data.

The ETS Framework

ETS models maintain internal state variables that evolve over time through smoothing equations:

  • Level ($\ell_t$): The baseline value of the series at time $t$.
  • Trend ($b_t$): The rate of change or growth pattern.
  • Seasonal ($s_t$): Repeating patterns with a fixed period $m$.

Each component can be modeled as additive or multiplicative, resulting in different model behaviors.

Error, Trend, and Seasonality Components

The model specification uses three-letter notation (e.g., "AAN", "MAM"):

First Letter - Error Type:

  • A (Additive): Errors are independent of the series level
  • M (Multiplicative): Errors scale proportionally with the series level

Second Letter - Trend Type:

  • N (None): No trend component
  • A (Additive): Linear trend
  • M (Multiplicative): Exponential growth trend
  • Damping: specified separately via the damped=True argument, which flattens the trend over time (not part of the letter code)

Third Letter - Seasonal Type:

  • N (None): No seasonal pattern
  • A (Additive): Constant seasonal effect
  • M (Multiplicative): Seasonal effect proportional to level
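As a preview, these codes map directly to the model argument of skforecast's Ets class, which is covered in detail later in this document (a minimal sketch):

# Three-letter ETS notation with the Ets class (introduced later)
# ==============================================================================
from skforecast.stats import Ets

ses          = Ets(model="ANN")               # simple exponential smoothing
holt         = Ets(model="AAN")               # Holt's linear (additive) trend
holt_damped  = Ets(model="AAN", damped=True)  # damped additive trend
holt_winters = Ets(model="MAM", m=12)         # multiplicative errors and seasonality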

ETS vs. ARIMA

While both methods aim to predict future values based on history, they approach the problem from fundamentally different angles.

  • Approach:
    • ARIMA (Auto-Regressive Integrated Moving Average): differencing + ARMA. Achieves stationarity through differencing, then fits AR and MA terms.
    • ETS (Error, Trend, Seasonality): state-space smoothing. Recursively updates level, trend, and seasonal states with exponential smoothing.
  • Model form:
    • ARIMA: linear combination of past values and errors (after differencing).
    • ETS: explicit state equations for level, trend, and seasonality with additive or multiplicative structure.
  • Automation:
    • ARIMA: configurable. Requires selecting orders ($p, d, q$), though auto-selection methods (AIC, BIC) can assist.
    • ETS: fully automated. Model selection ("ZZZ") systematically searches over all valid ETS models.

✏️ Note

The Python implementation of the ETS algorithm in skforecast follows the state-space framework described in Hyndman et al. (2008) and is based on the Julia package Durbyn.jl developed by Resul Akay.

ETS model theory

ETS models use a state-space framework with two core equations: an observation equation relating observations to states, and state transition equations describing how states evolve.

Additive Error State-Space Form

For additive error models, the state-space representation is:

Observation equation: $$Y_t = H x_{t-1} + \varepsilon_t$$

State equation: $$x_t = F x_{t-1} + G \varepsilon_t$$

where $\varepsilon_t \sim WN(0, \sigma^2)$ is white noise, $x_t$ is the state vector containing level, trend, and seasonal components, and $H$, $F$, $G$ are system matrices that depend on the specific ETS model.

Forecast mean and variance at horizon $h$:

$$\mu_n(h) = H F^{h-1} x_n$$

$$v_n(h) = \sigma^2 \left(1 + \sum_{j=1}^{h-1} (H F^{j-1} G)^2\right)$$

Simple Exponential Smoothing (ANN)

For series with no trend or seasonality:

Innovations state-space form (additive errors): $$Y_t = \ell_{t-1} + \varepsilon_t$$ $$\ell_t = \ell_{t-1} + \alpha \varepsilon_t$$

where $\ell_t$ is the level at time $t$, $\alpha$ is the smoothing parameter (typically $0 < \alpha < 1$ in practice, though the admissible range is broader), and $\varepsilon_t \sim WN(0, \sigma^2)$.

Component form: $$\ell_t = \alpha Y_t + (1-\alpha) \ell_{t-1}$$

Forecast function: $$\hat{Y}_{n+h|n} = \ell_n \text{ for all } h \geq 1$$

Forecast variance: $$\sigma^2_h = \sigma^2[1 + \alpha^2(h-1)]$$

The forecast variance increases linearly with horizon $h$, reflecting growing uncertainty as we forecast further into the future.
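A minimal sketch of the component-form recursion (illustrative only; in practice the initial level and the smoothing parameter are estimated, e.g. by maximum likelihood):

# Simple exponential smoothing: component-form recursion (illustrative sketch)
# ==============================================================================
import numpy as np

def ses_forecast(y, alpha, h):
    level = y[0]                           # naive initialization of the level l_0
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return np.full(h, level)               # flat forecast: y_hat_{n+h|n} = l_n

y = np.array([12.0, 14.5, 13.8, 15.2, 14.9])
print(ses_forecast(y, alpha=0.3, h=3))     # three identical values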

Multiplicative Error Form (MNN): $$Y_t = \ell_{t-1}(1 + \varepsilon_t)$$ $$\ell_t = \ell_{t-1}(1 + \alpha \varepsilon_t)$$

Point forecasts are identical to the additive form, but prediction intervals scale with the level.
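To see why, note that conditioning on the previous level gives

$$E[Y_t \mid \ell_{t-1}] = \ell_{t-1}, \qquad \operatorname{Var}(Y_t \mid \ell_{t-1}) = \ell_{t-1}^2 \, \sigma^2$$

so the point forecast coincides with the additive case, while the one-step standard deviation $\ell_{t-1}\sigma$ grows in proportion to the level.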

Holt's Linear Trend Method (AAN)

For series with additive trend:

Innovations state-space form (additive errors): $$Y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t$$ $$\ell_t = \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t$$ $$b_t = b_{t-1} + \beta \varepsilon_t$$

where $\ell_t$ is the level, $b_t$ is the trend, and $\varepsilon_t \sim WN(0, \sigma^2)$. The smoothing parameters $\alpha$ (for level) and $\beta$ (for trend) control how much weight is given to recent innovations.

Note on notation: In the innovations state-space form, $\beta$ appears directly. In the component form below, the trend smoothing is parameterized by $\beta^* = \beta/\alpha$, representing the ratio of trend to level smoothing.

Component form:

  • Level: $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})$
  • Trend: $b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$

Forecast function: $$\hat{Y}_{n+h|n} = \ell_n + h \cdot b_n$$

Forecast variance: $$\sigma^2_h = \sigma^2\left[1 + (h-1)\left\{\alpha^2 + \alpha\beta h + \frac{1}{6}\beta^2 h(2h-1)\right\}\right]$$
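A minimal sketch of Holt's component-form recursion (illustrative only; naive initial states are used here, whereas in practice they are estimated):

# Holt's linear method: component-form recursion (illustrative sketch)
# ==============================================================================
import numpy as np

def holt_forecast(y, alpha, beta_star, h):
    level, trend = y[0], y[1] - y[0]                        # naive initial states
    for obs in y[1:]:
        level_prev = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta_star * (level - level_prev) + (1 - beta_star) * trend
    return level + np.arange(1, h + 1) * trend              # y_hat = l_n + h * b_n

y = np.array([10.0, 12.1, 13.9, 15.8, 18.2, 20.1])
print(holt_forecast(y, alpha=0.8, beta_star=0.2, h=3))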

Damped Trend

Innovations state-space form (additive errors): $$Y_t = \ell_{t-1} + \phi b_{t-1} + \varepsilon_t$$ $$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha \varepsilon_t$$ $$b_t = \phi b_{t-1} + \beta \varepsilon_t$$

where $\phi \in (0,1]$ is the damping parameter.

Component form:

  • Level: $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})$
  • Trend: $b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) \phi b_{t-1}$

Forecast function: $$\hat{Y}_{n+h|n} = \ell_n + (\phi + \phi^2 + \cdots + \phi^h) b_n$$

The damping parameter controls how quickly the trend dampens:

  • $\phi = 1$: Standard Holt (no damping)
  • $\phi < 1$: Damped trend (trend flattens out in forecasts)

Advantages of damped trend:

  • More realistic long-term forecasts
  • Prevents unbounded linear extrapolation
  • Often improves forecast accuracy for horizons $h > 10$
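Because the trend contribution in the forecast function is a geometric sum, it has the closed form $\phi(1-\phi^h)/(1-\phi)$, which converges to $\phi/(1-\phi)$ as $h$ grows. A quick numerical check:

# Damped trend: phi + phi^2 + ... + phi^h = phi (1 - phi^h) / (1 - phi)
# ==============================================================================
l_n, b_n, phi = 100.0, 5.0, 0.9
for h in (1, 5, 20, 100):
    damp_sum = phi * (1 - phi**h) / (1 - phi)   # closed form of the geometric sum
    print(f"h={h:>3}: forecast = {l_n + damp_sum * b_n:.2f}")
# Forecasts level off near l_n + b_n * phi / (1 - phi) = 145.0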

Holt-Winters Seasonal Methods

Additive Seasonality (AAA) - Component form:

Forecast equation: $$\hat{Y}_{t+h|t} = \ell_t + hb_t + s_{t+h-m(k+1)}$$

where $k = \lfloor (h-1)/m \rfloor$ and $m$ is the seasonal period.

Smoothing equations: $$\ell_t = \alpha(Y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})$$ $$b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$$ $$s_t = \gamma(Y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}$$

where $\alpha$, $\beta^*$, and $\gamma$ are smoothing parameters for level, trend, and seasonality respectively.
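The seasonal index $s_{t+h-m(k+1)}$ simply selects the most recently estimated seasonal value for the calendar position being forecast. A quick check of the indexing (illustrative):

# Seasonal index in the Holt-Winters forecast equation: k = floor((h-1)/m)
# ==============================================================================
m = 12
for h in (1, 12, 13, 15):
    k = (h - 1) // m
    offset = h - m * (k + 1)          # always in {-(m-1), ..., 0}
    print(f"h={h:>2}: forecast uses s_(t{offset:+d})")
# h= 1 -> s_(t-11), h=12 -> s_(t+0), h=13 -> s_(t-11), h=15 -> s_(t-9)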

Multiplicative Seasonality (MAM) - Component form:

Forecast equation: $$\hat{Y}_{t+h|t} = (\ell_t + hb_t) s_{t+h-m(k+1)}$$

Smoothing equations: $$\ell_t = \alpha \frac{Y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})$$ $$b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$$ $$s_t = \gamma \frac{Y_t}{(\ell_{t-1} + b_{t-1})} + (1-\gamma) s_{t-m}$$

Multiplicative Error Form

For multiplicative error models, the innovations state-space formulation has:

Observation: $$Y_t = \mu_t(1 + \varepsilon_t)$$

where $\mu_t$ is the one-step-ahead forecast and $\varepsilon_t \sim WN(0, \sigma^2)$.

Key property: Point forecasts are the same as additive-error models, but prediction intervals scale with the level.

Examples:

  • MNN (no trend, no seasonality): $$Y_t = \ell_{t-1}(1 + \varepsilon_t)$$ $$\ell_t = \ell_{t-1}(1 + \alpha \varepsilon_t)$$

  • MAN (additive trend): $$Y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t)$$ $$\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha \varepsilon_t)$$ $$b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1}) \varepsilon_t$$

Admissible Parameter Space

For stability and forecastability, ETS models have admissible parameter regions:

ANN / MNN: $$0 < \alpha < 2$$

AAN / MAN: $$0 < \alpha < 2, \quad 0 < \beta < 4 - 2\alpha$$

ADN (damped additive trend): $$0 < \phi \leq 1, \quad 1 - \frac{1}{\phi} < \alpha < 1 + \frac{1}{\phi}$$ $$\alpha(\phi - 1) < \beta < (1 + \phi)(2 - \alpha)$$

In practice, $\alpha, \beta^*, \gamma$ are typically constrained to $(0,1)$ for conventional exponential smoothing behavior, though the admissible regions allow for broader ranges that still ensure stable forecasts.

Admissible regions do not depend on whether errors are additive or multiplicative.
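As a concrete illustration, a hypothetical helper (not part of skforecast) that checks the AAN/MAN region above:

# Hypothetical helper: check the admissible region for AAN / MAN models
# ==============================================================================
def is_admissible_aan(alpha, beta):
    return 0 < alpha < 2 and 0 < beta < 4 - 2 * alpha

print(is_admissible_aan(0.9, 1.5))   # True
print(is_admissible_aan(1.5, 1.5))   # False: requires beta < 4 - 2*1.5 = 1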

Model Selection

ETS models are typically estimated by maximizing the likelihood function. For model selection, information criteria are used:

  • AIC (Akaike Information Criterion): $\text{AIC} = -2\log L + 2k$
  • AICc (Corrected AIC): $\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}$ (recommended for small samples)
  • BIC (Bayesian Information Criterion): $\text{BIC} = -2\log L + k\log n$

where $k$ is the number of parameters and $n$ is the number of observations.

The log-likelihood depends on the error type:

Additive errors: $$\log L = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{t=1}^{n}e_t^2\right)$$

Multiplicative errors: $$\log L = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{t=1}^{n}e_t^2\right) - \sum_{t=1}^{n}\log|\hat{y}_t|$$
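These criteria are easy to reproduce by hand. Using the log-likelihood reported later in this document for the fitted AAA model ($\log L = -1639.36$, $n = 169$), and assuming $k = 17$ estimated quantities (inferred, not reported, but it makes the reported values match):

# Reproducing AIC / AICc / BIC from a log-likelihood
# ==============================================================================
import numpy as np

log_l, n, k = -1639.36, 169, 17
aic  = -2 * log_l + 2 * k
aicc = aic + 2 * k * (k + 1) / (n - k - 1)
bic  = -2 * log_l + k * np.log(n)
print(f"AIC = {aic:.2f}, AICc = {aicc:.2f}, BIC = {bic:.2f}")
# AIC = 3312.72 and BIC = 3365.93 match the AAA model summary below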

Ref: Hyndman, R.J., Koehler, A.B., Ord, J.K., & Snyder, R.D. (2008). Forecasting with exponential smoothing: The state space approach. Springer-Verlag, New York. exponentialsmoothing.net

Libraries and data

# Libraries
# ==============================================================================
import sys
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.stats import Ets
from skforecast.recursive import ForecasterStats
from skforecast.model_selection import TimeSeriesFold, backtesting_stats
from skforecast.datasets import fetch_dataset
from skforecast.plot import set_dark_theme, plot_prediction_intervals
from skforecast.utils import expand_index
# Data download
# ==============================================================================
data = fetch_dataset(name='fuel_consumption', raw=True)
data = data[['Fecha', 'Gasolinas']]
data = data.rename(columns={'Fecha':'date', 'Gasolinas':'litters'})
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.loc[:'1990-01-01 00:00:00']
data = data.asfreq('MS')
data = data['litters'].rename('y')
display(data.head(4))
╭──────────────────────────────── fuel_consumption ────────────────────────────────╮
│ Description:                                                                     │
│ Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01.                 │
│                                                                                  │
│ Source:                                                                          │
│ Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and │
│ Corporación de Derecho Público tutelada por el Ministerio para la Transición     │
│ Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas            │
│                                                                                  │
│ URL:                                                                             │
│ https://raw.githubusercontent.com/skforecast/skforecast-                         │
│ datasets/main/data/consumos-combustibles-mensual.csv                             │
│                                                                                  │
│ Shape: 644 rows x 6 columns                                                      │
╰──────────────────────────────────────────────────────────────────────────────────╯
date
1969-01-01    166875.2129
1969-02-01    155466.8105
1969-03-01    184983.6699
1969-04-01    202319.8164
Freq: MS, Name: y, dtype: float64
# Train-test dates
# ==============================================================================
end_train = '1983-01-01 23:59:59'
print(
    f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()}  "
    f"(n={len(data.loc[:end_train])})"
)
print(
    f"Test dates  : {data.loc[end_train:].index.min()} --- {data.index.max()}  "
    f"(n={len(data.loc[end_train:])})"
)
data_train = data.loc[:end_train]
data_test  = data.loc[end_train:]

# Plot
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
ax.set_title('Monthly fuel consumption in Spain')
ax.legend();
Train dates : 1969-01-01 00:00:00 --- 1983-01-01 00:00:00  (n=169)
Test dates  : 1983-02-01 00:00:00 --- 1990-01-01 00:00:00  (n=84)

ETS model

Skforecast provides the class Ets to facilitate the creation of ETS models in Python, allowing users to easily fit and forecast time series data using this approach.

The Ets class provides flexible control over model specification and parameter estimation through several key arguments:

Model Specification

  • model: Three-letter code specifying the model structure:

    • First letter (Error): A (Additive), M (Multiplicative), or Z (Auto-select)
    • Second letter (Trend): N (None), A (Additive), M (Multiplicative), or Z (Auto-select)
    • Third letter (Season): N (None), A (Additive), M (Multiplicative), or Z (Auto-select)
    • Examples: "ANN" (simple exponential smoothing), "AAN" (Holt's linear trend), "AAA" (additive Holt-Winters)
    • Use "ZZZ" for fully automatic model selection
  • m: Seasonal period (e.g., 12 for monthly data with yearly seasonality, 4 for quarterly)

  • damped: Enable damped trend to prevent unbounded extrapolation

    • True: Use damped trend
    • False: Use standard (non-damped) trend
    • None: Try both when model="ZZZ" (automatic selection)

Fixed Parameters (optional)

If specified, these parameters are held fixed instead of being estimated:

  • alpha: Level smoothing parameter (0 < α < 1)
  • beta: Trend smoothing parameter (0 < β < α)
  • gamma: Seasonal smoothing parameter (0 < γ < 1-α)
  • phi: Damping parameter (0 < φ < 1)

Automatic Model Selection (model="ZZZ")

  • seasonal: Allow seasonal models in automatic selection
  • trend: Control trend in model search
    • None: Try both trending and non-trending models
    • True: Only try models with trend
    • False: Only try models without trend
  • ic: Information criterion for model selection
    • "aic": Akaike Information Criterion
    • "aicc": Corrected AIC (recommended for small samples)
    • "bic": Bayesian Information Criterion
  • allow_multiplicative: Allow multiplicative errors and seasonality
  • allow_multiplicative_trend: Allow multiplicative trend (generally not recommended)

Transformations

  • lambda_param: Box-Cox transformation parameter
    • None: No transformation
    • 0: Log transformation
    • 1: No transformation
    • Other values: Box-Cox transformation
  • lambda_auto: Automatically select optimal Box-Cox parameter
  • bias_adjust: Apply bias adjustment when back-transforming forecasts

Parameter Constraints

  • bounds: Type of parameter bounds
    • "usual": Traditional bounds (0 < α, β*, γ < 1)
    • "admissible": Broader stability bounds (e.g., 0 < α < 2 for ANN)
    • "both": Check both usual and admissible bounds

Common Configuration Examples

  • Automatic (unrestricted): Ets(m=12, model="ZZZ"). Fully automatic selection from all valid models.
  • Automatic (conservative): Ets(m=12, model="ZZZ", allow_multiplicative=False). Only additive error and seasonality models.
  • Simple exponential smoothing: Ets(model="ANN"). No trend, no seasonality (flat forecast).
  • Holt's linear trend: Ets(model="AAN"). Additive trend, no seasonality.
  • Damped trend: Ets(model="AAN", damped=True). Damped trend for conservative long-term forecasts.
  • Additive Holt-Winters: Ets(model="AAA", m=12). Additive trend and seasonality.
  • Multiplicative Holt-Winters: Ets(model="MAM", m=12). Multiplicative errors, additive trend, multiplicative seasonality.
  • Seasonal, no trend: Ets(model="ANA", m=12). Additive seasonality without trend.
  • Non-seasonal with Box-Cox: Ets(model="AAN", lambda_auto=True). Automatic variance-stabilizing transformation.
  • Fixed smoothing parameters: Ets(model="AAN", alpha=0.2, beta=0.1). Manual parameter specification (not estimated).
# ETS with a specific model configuration
# ==============================================================================
model = Ets(m=12, model="AAA")
model.fit(y=data_train)
model.summary()
ETS Model Summary
============================================================
Model: Ets(AAA)
Seasonal period (m): 12

Smoothing parameters:
  alpha (level):       0.1000
  beta (trend):        0.0100
  gamma (seasonal):    0.0100

Initial states:
  Level (l0):          197637.6942
  Trend (b0):          2251.9201

Model fit statistics:
  sigma^2:             294326220.344355
  Log-likelihood:      -1639.36
  AIC:                 3312.72
  BIC:                 3365.93

Residual statistics:
  Mean:                -1152.867362
  Std Dev:             16331.260792
  MAE:                 12418.719387
  RMSE:                16323.633666

Time Series Summary Statistics:
Number of observations: 169
  Mean:                 384743.1773
  Std Dev:              108126.6689
  Min:                  155466.8105
  25%:                  303667.7591
  Median:               397278.0241
  75%:                  466194.3073
  Max:                  605073.0143

AutoETS

When model is set to "ZZZ" or None, the ETS class performs automatic model selection following this process:

  1. Candidate Model Generation: Constructs a set of candidate models by combining different error types (A/M), trend types (N/A/M), and seasonal types (N/A/M), subject to constraints:

    • By default, multiplicative trend is excluded (allow_multiplicative_trend=False) as it can be numerically unstable
    • Certain combinations are invalid (e.g., additive error with multiplicative components)
    • If data has fewer observations than the seasonal period m, only non-seasonal models are considered
    • For high-frequency data (m > 24), seasonality is automatically disabled
  2. Model Estimation: Each candidate model is fitted to the data using maximum likelihood estimation with parameter bounds checking (bounds="both" by default, ensuring both usual and admissible constraints)

  3. Model Evaluation: Models are ranked using the specified information criterion:

    • AICc (default): Corrected AIC, recommended for small samples: $\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}$
    • AIC: Akaike Information Criterion: $\text{AIC} = -2\log L + 2k$
    • BIC: Bayesian Information Criterion: $\text{BIC} = -2\log L + k\log n$
  4. Best Model Selection: Returns the model with the lowest information criterion value. If the data shows evidence of trend (>10% change between first and second half), models without trend receive a penalty to prefer trending models.

The trend, seasonal, damped, and allow_multiplicative parameters control which candidate models are considered, allowing you to restrict the search space based on domain knowledge.
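For example, the search space can be restricted based on domain knowledge (a sketch using the Ets arguments described earlier, with data_train as defined above):

# Restrict the AutoETS search space
# ==============================================================================
model = Ets(
    m                    = 12,
    model                = "ZZZ",
    trend                = True,      # only consider candidates with a trend
    allow_multiplicative = False,     # only additive errors and seasonality
    ic                   = "bic",     # rank candidate models by BIC
)
model.fit(y=data_train)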

💡 Tip

The implementation of ETS in skforecast uses numba to optimize performance. Because the first call to fit() or predict() triggers code compilation, it may take longer than subsequent calls. After this initial compilation, performance improves significantly. While this overhead can be inconvenient during interactive sessions, it provides substantial benefits in production environments where thousands of models may be fitted and forecasted efficiently.

# AutoETS: automatic model selection
# ==============================================================================
model = Ets(m=12, model=None)
model.fit(y=data_train)
model.summary()
ETS Model Summary
============================================================
Model: Ets(MAM)
Seasonal period (m): 12

Smoothing parameters:
  alpha (level):       0.1000
  beta (trend):        0.0100
  gamma (seasonal):    0.0100

Initial states:
  Level (l0):          191275.6996
  Trend (b0):          2678.3040

Model fit statistics:
  sigma^2:             0.001604
  Log-likelihood:      -1614.33
  AIC:                 3262.66
  BIC:                 3315.87

Residual statistics:
  Mean:                -1587.582913
  Std Dev:             16096.366173
  MAE:                 11681.267343
  RMSE:                16127.006194

Time Series Summary Statistics:
Number of observations: 169
  Mean:                 384743.1773
  Std Dev:              108126.6689
  Min:                  155466.8105
  25%:                  303667.7591
  Median:               397278.0241
  75%:                  466194.3073
  Max:                  605073.0143

The automatic model selection identified ETS(MAM) as the optimal configuration for this fuel consumption time series. This model combines multiplicative errors, additive trend, and multiplicative seasonality with a 12-month period.

The multiplicative error component indicates that forecast uncertainty scales proportionally with the series level, which is appropriate for data where variability increases over time. The additive trend captures the linear growth pattern in fuel consumption (approximately 2,678 units per month based on the initial trend estimate). Most notably, the multiplicative seasonal component accounts for proportional seasonal variations, meaning that seasonal peaks and troughs grow larger as the overall consumption level increases. This is typical in economic and consumption data where seasonal effects compound with the underlying level.

The model's relatively low smoothing parameters (α=0.10, β=0.01, γ=0.01) suggest that recent observations have moderate influence on the forecasts, with the model relying heavily on the established patterns rather than reacting strongly to recent fluctuations.

Once the model is fitted, it can be used to forecast future observations. It is important to note that these types of models require predictions to follow immediately after the training data; therefore, the forecast starts right after the last observed value.

For performance reasons, predictions are returned as NumPy arrays. They can easily be converted into a pandas Series by mapping them to the corresponding time index.

# Predictions as pandas Series
# ==============================================================================
steps = len(data_test)
predictions = model.predict(steps=steps)

pred_index = expand_index(index=data_train.index, steps=steps)
predictions = pd.Series(predictions, index=pred_index)
predictions.head(4)
1983-02-01    408101.282195
1983-03-01    468889.543146
1983-04-01    490033.012053
1983-05-01    482048.495783
Freq: MS, dtype: float64
# Prediction interval
# ==============================================================================
predictions = model.predict_interval(steps=steps, level=[95])
predictions.index = pred_index
predictions.head(3)
                     mean       lower_95       upper_95
1983-02-01  408101.282195  379112.149242  439283.248986
1983-03-01  468889.543146  430982.719071  505646.259513
1983-04-01  490033.012053  449928.041976  530495.775930

ForecasterStats

The previous section introduced the construction of ETS models. To seamlessly integrate these models with the various functionalities provided by skforecast, the next step is to encapsulate the Ets model within a ForecasterStats object. This wrapper abstracts away the internal details of the model and allows it to be used coherently with skforecast's extensive capabilities.

Train

The train-prediction process follows an API similar to that of scikit-learn. More details in the ForecasterStats user guide.

# Create and fit ForecasterStats
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets(m=12, model="MAM"))
forecaster.fit(y=data_train)
forecaster

ForecasterStats

General Information
  • Estimators:
    • skforecast.Ets: Ets(MAM)
  • Window size: 1
  • Series name: y
  • Exogenous included: False
  • Creation date: 2026-02-02 14:07:44
  • Last fit date: 2026-02-02 14:07:45
  • Skforecast version: 0.20.0
  • Python version: 3.13.11
  • Forecaster id: None
Exogenous Variables
    None
Data Transformations
  • Transformer for y: None
  • Transformer for exog: None
Training Information
  • Training range: [Timestamp('1969-01-01 00:00:00'), Timestamp('1983-01-01 00:00:00')]
  • Training index type: DatetimeIndex
  • Training index frequency: MS
Estimator Parameters
  • skforecast.Ets: {'m': 12, 'model': 'MAM', 'damped': None, 'alpha': None, 'beta': None, 'gamma': None, 'phi': None, 'seasonal': True, 'trend': None, 'allow_multiplicative': True, 'allow_multiplicative_trend': False}
Fit Kwargs
    None


Prediction

Once the model is fitted, it can be used to forecast future observations. It is important to note that these types of models require predictions to follow immediately after the training data; therefore, the forecast starts right after the last observed value.

# Predict
# ==============================================================================
steps = len(data_test)
predictions = forecaster.predict(steps=steps)
predictions.head(3)
1983-02-01    408101.282195
1983-03-01    468889.543146
1983-04-01    490033.012053
Freq: MS, Name: pred, dtype: float64

Prediction intervals

The method predict_interval enables the calculation of prediction intervals for the forecasted values. Users can specify the confidence level of the estimated interval using either the alpha or interval argument.

# Predict intervals
# ==============================================================================
predictions = forecaster.predict_interval(steps=steps, alpha=0.05)
predictions.head(3)
                     pred    lower_bound    upper_bound
1983-02-01  408101.282195  374425.191092  439908.674532
1983-03-01  468889.543146  434420.905502  507391.851633
1983-04-01  490033.012053  451247.496524  527738.119888
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
plot_prediction_intervals(
    predictions         = predictions,
    y_true              = data_test,
    target_variable     = "y",
    title               = "Prediction intervals in test data",
    kwargs_fill_between = {'color': 'white', 'alpha': 0.3, 'zorder': 1},
    ax                  = ax
)

Feature importances

The method get_feature_importances returns the estimated smoothing parameters of the ETS model (level, trend, and seasonal). This provides insight into how strongly each component reacts to recent observations.

# Feature importances
# ==============================================================================
model.get_feature_importances()
            feature  importance
0     alpha (level)        0.10
1      beta (trend)        0.01
2  gamma (seasonal)        0.01

Backtesting

ETS and other statistical models, once integrated in a ForecasterStats object, can be evaluated using any of the backtesting strategies implemented in skforecast.

✏️ Note

Why do statistical models require refitting during backtesting?

Unlike machine learning models, statistical models such as ETS maintain an internal state that depends on the sequence of observations. They can only generate predictions starting from the last observed time step; they cannot "jump" to an arbitrary point in the future without knowing all previous values. During backtesting, when the validation window moves forward, the model must be refitted to incorporate the new observations and update its internal state. This is why refit=True is typically required.

Performance optimization: because refitting is mandatory, skforecast's Numba-optimized backend becomes essential. It enables hundreds of refits during backtesting in a fraction of the time required by non-optimized libraries.

# Create forecaster
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets(m=12, model="MAM"))
# Backtesting
# ==============================================================================
cv = TimeSeriesFold(
    steps              = 12,
    initial_train_size = len(data_train),
    refit              = True,
)

metrics, predictions = backtesting_stats(
    forecaster        = forecaster,
    y                 = data,
    cv                = cv,
    metric            = ['mean_absolute_error', 'mean_absolute_percentage_error'],
    suppress_warnings = True,
    verbose           = True,
)
Information of folds
--------------------
Number of observations used for initial training: 169
Number of observations used for backtesting: 84
    Number of folds: 7
    Number skipped folds: 0 
    Number of steps per fold: 12
    Number of steps to exclude between last observed data (last window) and predictions (gap): 0

Fold: 0
    Training:   1969-01-01 00:00:00 -- 1983-01-01 00:00:00  (n=169)
    Validation: 1983-02-01 00:00:00 -- 1984-01-01 00:00:00  (n=12)
Fold: 1
    Training:   1970-01-01 00:00:00 -- 1984-01-01 00:00:00  (n=169)
    Validation: 1984-02-01 00:00:00 -- 1985-01-01 00:00:00  (n=12)
Fold: 2
    Training:   1971-01-01 00:00:00 -- 1985-01-01 00:00:00  (n=169)
    Validation: 1985-02-01 00:00:00 -- 1986-01-01 00:00:00  (n=12)
Fold: 3
    Training:   1972-01-01 00:00:00 -- 1986-01-01 00:00:00  (n=169)
    Validation: 1986-02-01 00:00:00 -- 1987-01-01 00:00:00  (n=12)
Fold: 4
    Training:   1973-01-01 00:00:00 -- 1987-01-01 00:00:00  (n=169)
    Validation: 1987-02-01 00:00:00 -- 1988-01-01 00:00:00  (n=12)
Fold: 5
    Training:   1974-01-01 00:00:00 -- 1988-01-01 00:00:00  (n=169)
    Validation: 1988-02-01 00:00:00 -- 1989-01-01 00:00:00  (n=12)
Fold: 6
    Training:   1975-01-01 00:00:00 -- 1989-01-01 00:00:00  (n=169)
    Validation: 1989-02-01 00:00:00 -- 1990-01-01 00:00:00  (n=12)

# Backtest predictions
# ==============================================================================
predictions.head(4)
            fold           pred
1983-02-01     0  408101.282195
1983-03-01     0  468889.543146
1983-04-01     0  490033.012053
1983-05-01     0  482048.495783
# Backtest metrics
# ==============================================================================
metrics
   mean_absolute_error  mean_absolute_percentage_error
0         17010.270276                        0.030752
# Plot backtest predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
data.loc[end_train:].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax)
ax.set_title('Backtest predictions (folds of 12 months)')
ax.legend();

In-sample Predictions

Predictions on the training data are crucial for evaluating the accuracy and effectiveness of the model. By comparing the predicted values with the actual observed values in the training dataset, you can assess how well the model has learned the underlying patterns and trends in the data. This comparison helps in understanding the model's performance and identifying areas where it may need improvement or adjustment. In essence, in-sample predictions act as a mirror, reflecting how the model interprets and reconstructs the historical data on which it was trained.

Predictions of the observations used to fit the model are stored in the fitted_values_ attribute of the ETS object.

# In-sample Predictions
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets(m=12, model="MAM"))
forecaster.fit(y=data_train)
# Show only the first 5 values 
forecaster.estimators_[0].fitted_values_[:5]
array([167998.14060777, 162819.56599272, 188519.55126091, 199042.32235291,
       198515.09912727])

Memory optimization

For production environments where you need to store many fitted models but only require forecasting capabilities (not diagnostics), you can significantly reduce memory usage with the reduce_memory() method. This is especially useful when working with large datasets or deploying models in resource-constrained environments.

This method removes in-sample fitted values and residuals, which are only needed for diagnostic purposes but not for generating forecasts.

# Compare size before and after reduce_memory()
# ==============================================================================
def total_model_size(model):
    # Approximate in-memory size: sums the shallow sizes (sys.getsizeof) of the
    # object and its public attributes; sufficient for a before/after comparison.
    size = sys.getsizeof(model)
    for attr_name in dir(model):
        if attr_name.startswith('_'):
            continue
        try:
            attr = getattr(model, attr_name)
            size += sys.getsizeof(attr)
        except Exception:
            pass
    return size


forecaster = ForecasterStats(
                 estimator=Ets(m=12, model="MAM"),
             )
forecaster.fit(y=data_train)
model_size_before = total_model_size(forecaster.estimators_[0])
print(f"Memory before reduce_memory(): {model_size_before / 1024:.3f} KB")

# Reduce memory
forecaster.reduce_memory()
model_size_after = total_model_size(forecaster.estimators_[0])
print(f"Memory after reduce_memory(): {model_size_after / 1024:.3f} KB")
print(f"Memory reduction: {(1 - model_size_after / model_size_before) * 100:.1f}%")
Memory before reduce_memory(): 5.241 KB
Memory after reduce_memory(): 2.319 KB
Memory reduction: 55.7%
# Predictions still work after memory reduction
# ==============================================================================
forecaster.predict(steps=10)
1983-02-01    408101.282195
1983-03-01    468889.543146
1983-04-01    490033.012053
1983-05-01    482048.495783
1983-06-01    498540.830053
1983-07-01    606376.320233
1983-08-01    627071.555583
1983-09-01    513633.201604
1983-10-01    491399.448566
1983-11-01    437351.693039
Freq: MS, Name: pred, dtype: float64

Session information

import session_info
session_info.show(html=False)
-----
matplotlib          3.10.8
pandas              2.3.3
session_info        v1.0.1
skforecast          0.20.0
-----
IPython             9.8.0
jupyter_client      7.4.9
jupyter_core        5.9.1
notebook            6.5.7
-----
Python 3.13.11 | packaged by Anaconda, Inc. | (main, Dec 10 2025, 21:28:48) [GCC 14.3.0]
Linux-6.14.0-37-generic-x86_64-with-glibc2.39
-----
Session information updated at 2026-02-02 14:07

Citation

How to cite this document

If you use this document or any part of it, please acknowledge the source, thank you!

Exponential Smoothing models in Python by Joaquín Amat Rodrigo, Javier Escobar Ortiz and Resul Akay available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://cienciadedatos.net/documentos/py76-exponential-smoothing-models.html

How to cite skforecast

If you use skforecast for a publication, we would appreciate if you cite the published software.

Zenodo:

Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.20.0). Zenodo. https://doi.org/10.5281/zenodo.8382787

APA:

Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.20.0) [Computer software]. https://doi.org/10.5281/zenodo.8382787

BibTeX:

@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.20.0},
  month   = {01},
  year    = {2026},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}


Did you like the article? Your support is important

Your contribution will help me to continue generating free educational content. Many thanks! 😊

Become a GitHub Sponsor

Creative Commons Licence

This work by Joaquín Amat Rodrigo, Javier Escobar Ortiz and Resul Akay is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.

Allowed:

  • Share: copy and redistribute the material in any medium or format.

  • Adapt: remix, transform, and build upon the material.

Under the following terms:

  • Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

  • NonCommercial: You may not use the material for commercial purposes.

  • ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.