More about forecasting in cienciadedatos.net
- Time series forecasting with machine learning
- ARIMA and SARIMAX models
- Time series forecasting with gradient boosting
- Exponential smoothing models
- ARAR forecasting models
- Visualizing time series data
- Electricity demand forecasting
- Global forecasting models I
- Global forecasting models II
- Global forecasting models III
- Clustering time series
- Forecasting with deep learning models
- Probabilistic forecasting I
- Probabilistic forecasting II
- Probabilistic forecasting III
- Data drift detection in time series forecasting models
- Forecasting categorical time series
- Interpretability and explainability in forecasting models
Introduction
The ETS (Error, Trend, Seasonality) framework is a powerful and flexible approach to exponential smoothing for time series forecasting. ETS models are characterized by three key components: the type of error (additive or multiplicative), the type of trend (none, additive, multiplicative, or damped), and the type of seasonality (none, additive, or multiplicative). This systematic framework provides a unified approach to exponential smoothing methods, encompassing classic techniques like simple exponential smoothing, Holt's linear method, and Holt-Winters seasonal methods.
Unlike models that rely on differencing transformations to achieve stationarity, ETS uses a state-space formulation where the level, trend, and seasonal components are recursively updated at each time step. This makes ETS models highly interpretable while maintaining flexibility to handle various patterns in time series data.
The ETS Framework
ETS models maintain internal state variables that evolve over time through smoothing equations:
- Level ($\ell_t$): The baseline value of the series at time $t$.
- Trend ($b_t$): The rate of change or growth pattern.
- Seasonal ($s_t$): Repeating patterns with a fixed period $m$.
Each component can be modeled as additive or multiplicative, resulting in different model behaviors.
Error, Trend, and Seasonality Components
The model specification uses three-letter notation (e.g., "AAN", "MAM"):
First Letter - Error Type:
- A (Additive): Errors are independent of the series level
- M (Multiplicative): Errors scale proportionally with the series level
Second Letter - Trend Type:
- N (None): No trend component
- A (Additive): Linear trend
- M (Multiplicative): Exponential growth trend
- Add damping: use `damped=True` to dampen the trend over time
Third Letter - Seasonal Type:
- N (None): No seasonal pattern
- A (Additive): Constant seasonal effect
- M (Multiplicative): Seasonal effect proportional to level
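As a quick illustration, the hypothetical helper below (not part of skforecast) expands a three-letter code into its component meanings:

# Decode a three-letter ETS model code (illustrative helper, not part of skforecast)
# ==============================================================================
def decode_ets(code: str) -> dict:
    """Expand a three-letter ETS code into its component meanings."""
    names = {'A': 'additive', 'M': 'multiplicative', 'N': 'none', 'Z': 'auto-select'}
    error, trend, seasonal = code.upper()
    return {'error': names[error], 'trend': names[trend], 'seasonal': names[seasonal]}

decode_ets('MAM')
# {'error': 'multiplicative', 'trend': 'additive', 'seasonal': 'multiplicative'}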
ETS vs. ARIMA
While both methods aim to predict future values based on history, they approach the problem from fundamentally different angles.
| Feature | ARIMA (Auto-Regressive Integrated Moving Average) | ETS (Error, Trend, Seasonality) |
|---|---|---|
| Approach | Differencing + ARMA. Achieves stationarity through differencing, then fits AR and MA terms. | State-Space Smoothing. Recursively updates level, trend, and seasonal states with exponential smoothing. |
| Model Form | Linear combination of past values and errors (after differencing). | Explicit state equations for level, trend, and seasonality with additive or multiplicative structure. |
| Automation | Configurable. Requires selecting orders ($p, d, q$), though auto-selection methods (AIC, BIC) can assist. | Fully Automated. Model selection ('ZZZ') systematically searches over all valid ETS models. |
✏️ Note
The Python implementation of the ETS algorithm in skforecast follows the state-space framework described in Hyndman et al. (2008) and is based on the Julia package Durbyn.jl developed by Resul Akay.
ETS model theory
ETS models use a state-space framework with two core equations: an observation equation relating observations to states, and state transition equations describing how states evolve.
Additive Error State-Space Form
For additive error models, the state-space representation is:
Observation equation: $$Y_t = H x_{t-1} + \varepsilon_t$$
State equation: $$x_t = F x_{t-1} + G \varepsilon_t$$
where $\varepsilon_t \sim WN(0, \sigma^2)$ is white noise, $x_t$ is the state vector containing level, trend, and seasonal components, and $H$, $F$, $G$ are system matrices that depend on the specific ETS model.
Forecast mean and variance at horizon $h$:
$$\mu_n(h) = H F^{h-1} x_n$$

$$v_n(h) = \sigma^2 \left(1 + \sum_{j=1}^{h-1} (H F^{j-1} G)^2\right)$$

Simple Exponential Smoothing (ANN)
For series with no trend or seasonality:
Innovations state-space form (additive errors): $$Y_t = \ell_{t-1} + \varepsilon_t$$ $$\ell_t = \ell_{t-1} + \alpha \varepsilon_t$$
where $\ell_t$ is the level at time $t$, $\alpha$ is the smoothing parameter (typically $0 < \alpha < 1$ in practice, though the admissible range is broader), and $\varepsilon_t \sim WN(0, \sigma^2)$.
Component form: $$\ell_t = \alpha Y_t + (1-\alpha) \ell_{t-1}$$
Forecast function: $$\hat{Y}_{n+h|n} = \ell_n \text{ for all } h \geq 1$$
Forecast variance: $$\sigma^2_h = \sigma^2[1 + \alpha^2(h-1)]$$
The forecast variance increases linearly with horizon $h$, reflecting growing uncertainty as we forecast further into the future.
Multiplicative Error Form (MNN): $$Y_t = \ell_{t-1}(1 + \varepsilon_t)$$ $$\ell_t = \ell_{t-1}(1 + \alpha \varepsilon_t)$$
Point forecasts are identical to the additive form, but prediction intervals scale with the level.
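The component form translates directly into code. The following minimal NumPy sketch (illustrative only, not the skforecast implementation) runs the ANN recursion for a fixed $\alpha$ and returns the flat $h$-step forecast:

# Simple exponential smoothing (ANN) recursion - illustrative sketch
# ==============================================================================
import numpy as np

def ses_forecast(y, alpha=0.2, h=5):
    """Component form: l_t = alpha * y_t + (1 - alpha) * l_{t-1}."""
    level = y[0]  # initialize the level with the first observation
    for y_t in y[1:]:
        level = alpha * y_t + (1 - alpha) * level
    return np.full(h, level)  # flat forecast: y_hat_{n+h|n} = l_n for all h

y = np.array([112., 118., 132., 129., 121., 135., 148., 148., 136., 119.])
ses_forecast(y, alpha=0.2, h=3)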
Holt's Linear Trend Method (AAN)
For series with additive trend:
Innovations state-space form (additive errors): $$Y_t = \ell_{t-1} + b_{t-1} + \varepsilon_t$$ $$\ell_t = \ell_{t-1} + b_{t-1} + \alpha \varepsilon_t$$ $$b_t = b_{t-1} + \beta \varepsilon_t$$
where $\ell_t$ is the level, $b_t$ is the trend, and $\varepsilon_t \sim WN(0, \sigma^2)$. The smoothing parameters $\alpha$ (for level) and $\beta$ (for trend) control how much weight is given to recent innovations.
Note on notation: In the innovations state-space form, $\beta$ appears directly. In the component form below, the trend smoothing is parameterized by $\beta^* = \beta/\alpha$, representing the ratio of trend to level smoothing.
Component form:
- Level: $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + b_{t-1})$
- Trend: $b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$
Forecast function: $$\hat{Y}_{n+h|n} = \ell_n + h \cdot b_n$$
Forecast variance: $$\sigma^2_h = \sigma^2\left[1 + (h-1)\left\{\alpha^2 + \alpha\beta h + \frac{1}{6}\beta^2 h(2h-1)\right\}\right]$$
Damped Trend
Innovations state-space form (additive errors): $$Y_t = \ell_{t-1} + \phi b_{t-1} + \varepsilon_t$$ $$\ell_t = \ell_{t-1} + \phi b_{t-1} + \alpha \varepsilon_t$$ $$b_t = \phi b_{t-1} + \beta \varepsilon_t$$
where $\phi \in (0,1]$ is the damping parameter.
Component form:
- Level: $\ell_t = \alpha Y_t + (1-\alpha)(\ell_{t-1} + \phi b_{t-1})$
- Trend: $b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) \phi b_{t-1}$
Forecast function: $$\hat{Y}_{n+h|n} = \ell_n + (\phi + \phi^2 + \cdots + \phi^h) b_n$$
The damping parameter controls how quickly the trend dampens:
- $\phi = 1$: Standard Holt (no damping)
- $\phi < 1$: Damped trend (trend flattens out in forecasts)
Advantages of damped trend:
- More realistic long-term forecasts
- Prevents unbounded linear extrapolation
- Often improves forecast accuracy for horizons $h > 10$
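A quick numerical check of the forecast function above shows why damping flattens the forecast: since $\sum_{i=1}^{h} \phi^i \to \phi/(1-\phi)$ as $h$ grows, forecasts converge to $\ell_n + b_n \, \phi/(1-\phi)$ instead of growing without bound. The values below are purely illustrative:

# Damped vs. non-damped trend forecasts - illustrative sketch
# ==============================================================================
l_n, b_n, phi = 100.0, 5.0, 0.9
for h in [1, 5, 20, 100]:
    damped = l_n + b_n * sum(phi**i for i in range(1, h + 1))
    holt = l_n + b_n * h  # equivalent forecast with phi = 1 (standard Holt)
    print(f"h={h:>3}  Holt: {holt:7.1f}   damped: {damped:7.1f}")
# Damped forecasts level off near l_n + b_n * phi / (1 - phi) = 145.0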
Holt-Winters Seasonal Methods
Additive Seasonality (AAA) - Component form:
Forecast equation: $$\hat{Y}_{t+h|t} = \ell_t + hb_t + s_{t+h-m(k+1)}$$
where $k = \lfloor (h-1)/m \rfloor$ and $m$ is the seasonal period. The index $t+h-m(k+1)$ simply picks the most recent estimate of the relevant seasonal effect: for monthly data ($m = 12$) and $h = 3$, $k = 0$ and the forecast uses $s_{t-9}$, the effect estimated for the same calendar month one year earlier.
Smoothing equations: $$\ell_t = \alpha(Y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})$$ $$b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$$ $$s_t = \gamma(Y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}$$
where $\alpha$, $\beta^*$, and $\gamma$ are smoothing parameters for level, trend, and seasonality respectively.
Multiplicative Seasonality (MAM) - Component form:
Forecast equation: $$\hat{Y}_{t+h|t} = (\ell_t + hb_t) s_{t+h-m(k+1)}$$
Smoothing equations: $$\ell_t = \alpha \frac{Y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})$$ $$b_t = \beta^*(\ell_t - \ell_{t-1}) + (1-\beta^*) b_{t-1}$$ $$s_t = \gamma \frac{Y_t}{(\ell_{t-1} + b_{t-1})} + (1-\gamma) s_{t-m}$$
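To make these recursions concrete, the sketch below runs the additive (AAA) component-form equations with naive initial states. It is illustrative only: skforecast's Ets estimates the smoothing parameters and initial states by maximum likelihood rather than using fixed values.

# Additive Holt-Winters (AAA) component-form recursion - illustrative sketch
# ==============================================================================
import numpy as np

def holt_winters_additive(y, m, alpha, beta_star, gamma):
    """Run the AAA smoothing equations with simple heuristic initial states."""
    level = y[:m].mean()                            # initial level: mean of first season
    trend = (y[m:2 * m].mean() - y[:m].mean()) / m  # initial trend: change between seasons
    season = list(y[:m] - level)                    # initial seasonal effects
    for t in range(m, len(y)):
        s_tm = season[t - m]
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_tm) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta_star * (level - prev_level) + (1 - beta_star) * prev_trend
        season.append(gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_tm)
    return level, trend, season[-m:]

# Example with quarterly data (m=4); requires at least 2*m observations
y = np.array([10., 14., 8., 12., 11., 15., 9., 13., 12., 16., 10., 14.])
holt_winters_additive(y, m=4, alpha=0.3, beta_star=0.1, gamma=0.1)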
Multiplicative Error Form
For multiplicative error models, the innovations state-space formulation has:
Observation: $$Y_t = \mu_t(1 + \varepsilon_t)$$
where $\mu_t$ is the one-step-ahead forecast and $\varepsilon_t \sim WN(0, \sigma^2)$.
Key property: Point forecasts are the same as additive-error models, but prediction intervals scale with the level.
Examples:
MNN (no trend, no seasonality): $$Y_t = \ell_{t-1}(1 + \varepsilon_t)$$ $$\ell_t = \ell_{t-1}(1 + \alpha \varepsilon_t)$$
MAN (additive trend): $$Y_t = (\ell_{t-1} + b_{t-1})(1 + \varepsilon_t)$$ $$\ell_t = (\ell_{t-1} + b_{t-1})(1 + \alpha \varepsilon_t)$$ $$b_t = b_{t-1} + \beta(\ell_{t-1} + b_{t-1}) \varepsilon_t$$
Admissible Parameter Space
For stability and forecastability, ETS models have admissible parameter regions:
ANN / MNN: $$0 < \alpha < 2$$
AAN / MAN: $$0 < \alpha < 2, \quad 0 < \beta < 4 - 2\alpha$$
ADN (damped additive trend): $$0 < \phi \leq 1, \quad 1 - \frac{1}{\phi} < \alpha < 1 + \frac{1}{\phi}$$ $$\alpha(\phi - 1) < \beta < (1 + \phi)(2 - \alpha)$$
In practice, $\alpha, \beta^*, \gamma$ are typically constrained to $(0,1)$ for conventional exponential smoothing behavior, though the admissible regions allow for broader ranges that still ensure stable forecasts.
Admissible regions do not depend on whether errors are additive or multiplicative.
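As a quick illustration, the AAN region above reduces to two inequalities; the hypothetical helper below checks whether a given $(\alpha, \beta)$ pair is admissible:

# Check AAN admissibility - illustrative sketch
# ==============================================================================
def is_admissible_aan(alpha, beta):
    """AAN admissible region: 0 < alpha < 2 and 0 < beta < 4 - 2 * alpha."""
    return 0 < alpha < 2 and 0 < beta < 4 - 2 * alpha

print(is_admissible_aan(0.3, 0.1))  # True: also inside the usual (0, 1) bounds
print(is_admissible_aan(1.5, 0.8))  # True: admissible although outside the usual bounds
print(is_admissible_aan(1.5, 1.2))  # False: beta exceeds 4 - 2 * alpha = 1.0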
Model Selection
ETS models are typically estimated by maximizing the likelihood function. For model selection, information criteria are used:
- AIC (Akaike Information Criterion): $\text{AIC} = -2\log L + 2k$
- AICc (Corrected AIC): $\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}$ (recommended for small samples)
- BIC (Bayesian Information Criterion): $\text{BIC} = -2\log L + k\log n$
where $k$ is the number of parameters and $n$ is the number of observations.
The log-likelihood depends on the error type:
Additive errors: $$\log L = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{t=1}^{n}e_t^2\right)$$
Multiplicative errors: $$\log L = -\frac{n}{2}\log\left(\frac{1}{n}\sum_{t=1}^{n}e_t^2\right) - \sum_{t=1}^{n}\log|\hat{y}_t|$$
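Given the log-likelihood, the number of parameters $k$, and the sample size $n$, the three criteria follow directly from the formulas above. The sketch below reproduces the AIC and BIC reported for the ETS(MAM) model fitted later in this document; the count $k = 17$ (three smoothing parameters, initial level and trend, eleven free initial seasonal states, and the error variance) is an assumption, but it is consistent with the reported values:

# Information criteria from a model's log-likelihood - illustrative sketch
# ==============================================================================
import numpy as np

def information_criteria(log_l, k, n):
    """Compute AIC, AICc, and BIC from the log-likelihood."""
    aic = -2 * log_l + 2 * k
    aicc = aic + 2 * k * (k + 1) / (n - k - 1)
    bic = -2 * log_l + k * np.log(n)
    return aic, aicc, bic

# Log-likelihood reported by the ETS(MAM) model fitted later in this document
information_criteria(log_l=-1614.33, k=17, n=169)
# (3262.66, 3266.71..., 3315.87...)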
Ref: Hyndman, R.J., Koehler, A.B., Ord, J.K., & Snyder, R.D. (2008). Forecasting with exponential smoothing: the state space approach. Springer-Verlag, New York. exponentialsmoothing.net
Libraries and data
# Libraries
# ==============================================================================
import sys
import pandas as pd
import matplotlib.pyplot as plt
from skforecast.stats import Ets
from skforecast.recursive import ForecasterStats
from skforecast.model_selection import TimeSeriesFold, backtesting_stats
from skforecast.datasets import fetch_dataset
from skforecast.plot import set_dark_theme, plot_prediction_intervals
from skforecast.utils import expand_index
# Data download
# ==============================================================================
data = fetch_dataset(name='fuel_consumption', raw=True)
data = data[['Fecha', 'Gasolinas']]
data = data.rename(columns={'Fecha':'date', 'Gasolinas':'liters'})
data['date'] = pd.to_datetime(data['date'], format='%Y-%m-%d')
data = data.set_index('date')
data = data.loc[:'1990-01-01 00:00:00']
data = data.asfreq('MS')
data = data['liters'].rename('y')
display(data.head(4))
fuel_consumption
----------------
Description: Monthly fuel consumption in Spain from 1969-01-01 to 2022-08-01.
Source: Obtained from Corporación de Reservas Estratégicas de Productos Petrolíferos and Corporación de Derecho Público tutelada por el Ministerio para la Transición Ecológica y el Reto Demográfico. https://www.cores.es/es/estadisticas
URL: https://raw.githubusercontent.com/skforecast/skforecast-datasets/main/data/consumos-combustibles-mensual.csv
Shape: 644 rows x 6 columns
date
1969-01-01    166875.2129
1969-02-01    155466.8105
1969-03-01    184983.6699
1969-04-01    202319.8164
Freq: MS, Name: y, dtype: float64
# Train-test dates
# ==============================================================================
end_train = '1983-01-01 23:59:59'
print(
f"Train dates : {data.index.min()} --- {data.loc[:end_train].index.max()} "
f"(n={len(data.loc[:end_train])})"
)
print(
f"Test dates : {data.loc[end_train:].index.min()} --- {data.index.max()} "
f"(n={len(data.loc[end_train:])})"
)
data_train = data.loc[:end_train]
data_test = data.loc[end_train:]
# Plot
# ==============================================================================
set_dark_theme()
fig, ax = plt.subplots(figsize=(7, 3))
data_train.plot(ax=ax, label='train')
data_test.plot(ax=ax, label='test')
ax.set_title('Monthly fuel consumption in Spain')
ax.legend();
Train dates : 1969-01-01 00:00:00 --- 1983-01-01 00:00:00  (n=169)
Test dates  : 1983-02-01 00:00:00 --- 1990-01-01 00:00:00  (n=84)
ETS model
Skforecast provides the class Ets to facilitate the creation of ETS models in Python, allowing users to easily fit and forecast time series data using this approach.
The Ets class provides flexible control over model specification and parameter estimation through several key arguments:
Model Specification
- `model`: Three-letter code specifying the model structure:
  - First letter (Error): `A` (Additive), `M` (Multiplicative), or `Z` (Auto-select)
  - Second letter (Trend): `N` (None), `A` (Additive), `M` (Multiplicative), or `Z` (Auto-select)
  - Third letter (Season): `N` (None), `A` (Additive), `M` (Multiplicative), or `Z` (Auto-select)
  - Examples: `"ANN"` (simple exponential smoothing), `"AAN"` (Holt's linear trend), `"AAA"` (additive Holt-Winters)
  - Use `"ZZZ"` for fully automatic model selection
- `m`: Seasonal period (e.g., 12 for monthly data with yearly seasonality, 4 for quarterly)
- `damped`: Enable damped trend to prevent unbounded extrapolation
  - `True`: Use damped trend
  - `False`: Use standard (non-damped) trend
  - `None`: Try both when `model="ZZZ"` (automatic selection)

Fixed Parameters (optional)

If specified, these parameters are held fixed instead of being estimated:

- `alpha`: Level smoothing parameter ($0 < \alpha < 1$)
- `beta`: Trend smoothing parameter ($0 < \beta < \alpha$)
- `gamma`: Seasonal smoothing parameter ($0 < \gamma < 1 - \alpha$)
- `phi`: Damping parameter ($0 < \phi < 1$)

Automatic Model Selection (`model="ZZZ"`)

- `seasonal`: Allow seasonal models in automatic selection
- `trend`: Control trend in model search
  - `None`: Try both trending and non-trending models
  - `True`: Only try models with trend
  - `False`: Only try models without trend
- `ic`: Information criterion for model selection
  - `"aic"`: Akaike Information Criterion
  - `"aicc"`: Corrected AIC (recommended for small samples)
  - `"bic"`: Bayesian Information Criterion
- `allow_multiplicative`: Allow multiplicative errors and seasonality
- `allow_multiplicative_trend`: Allow multiplicative trend (generally not recommended)

Transformations

- `lambda_param`: Box-Cox transformation parameter
  - `None`: No transformation
  - `0`: Log transformation
  - `1`: No transformation
  - Other values: Box-Cox transformation
- `lambda_auto`: Automatically select the optimal Box-Cox parameter
- `bias_adjust`: Apply bias adjustment when back-transforming forecasts

Parameter Constraints

- `bounds`: Type of parameter bounds
  - `"usual"`: Traditional bounds ($0 < \alpha, \beta^*, \gamma < 1$)
  - `"admissible"`: Broader stability bounds (e.g., $0 < \alpha < 2$ for ANN)
  - `"both"`: Check both usual and admissible bounds
Common Configuration Examples
| Use Case | Configuration | Description |
|---|---|---|
| Automatic (unrestricted) | `Ets(m=12, model="ZZZ")` | Fully automatic selection from all valid models |
| Automatic (conservative) | `Ets(m=12, model="ZZZ", allow_multiplicative=False)` | Only additive error and seasonality models |
| Simple exponential smoothing | `Ets(model="ANN")` | No trend, no seasonality (flat forecast) |
| Holt's linear trend | `Ets(model="AAN")` | Additive trend, no seasonality |
| Damped trend | `Ets(model="AAN", damped=True)` | Dampened trend for conservative long-term forecasts |
| Additive Holt-Winters | `Ets(model="AAA", m=12)` | Additive trend and seasonality |
| Multiplicative Holt-Winters | `Ets(model="MAM", m=12)` | Multiplicative errors, additive trend, multiplicative seasonality |
| Seasonal no trend | `Ets(model="ANA", m=12)` | Additive seasonality without trend |
| Non-seasonal with Box-Cox | `Ets(model="AAN", lambda_auto=True)` | Automatic variance stabilization transformation |
| Fixed smoothing parameters | `Ets(model="AAN", alpha=0.2, beta=0.1)` | Manual parameter specification (not estimated) |
# ETS with a specific model configuration
# ==============================================================================
model = Ets(m=12, model="AAA")
model.fit(y=data_train)
model.summary()
ETS Model Summary
============================================================
Model: Ets(AAA)
Seasonal period (m): 12
Smoothing parameters:
  alpha (level):    0.1000
  beta (trend):     0.0100
  gamma (seasonal): 0.0100
Initial states:
  Level (l0): 197637.6942
  Trend (b0): 2251.9201
Model fit statistics:
  sigma^2: 294326220.344355
  Log-likelihood: -1639.36
  AIC: 3312.72
  BIC: 3365.93
Residual statistics:
  Mean: -1152.867362
  Std Dev: 16331.260792
  MAE: 12418.719387
  RMSE: 16323.633666
Time Series Summary Statistics:
  Number of observations: 169
  Mean: 384743.1773
  Std Dev: 108126.6689
  Min: 155466.8105
  25%: 303667.7591
  Median: 397278.0241
  75%: 466194.3073
  Max: 605073.0143
AutoETS
When `model` is set to `"ZZZ"` or `None`, the `Ets` class performs automatic model selection following this process:

1. Candidate Model Generation: Constructs a set of candidate models by combining different error types (A/M), trend types (N/A/M), and seasonal types (N/A/M), subject to constraints:
   - By default, multiplicative trend is excluded (`allow_multiplicative_trend=False`) as it can be numerically unstable
   - Certain combinations are invalid (e.g., additive error with multiplicative components)
   - If the data has fewer observations than the seasonal period `m`, only non-seasonal models are considered
   - For high-frequency data (`m > 24`), seasonality is automatically disabled
2. Model Estimation: Each candidate model is fitted to the data using maximum likelihood estimation with parameter bounds checking (`bounds="both"` by default, ensuring both usual and admissible constraints)
3. Model Evaluation: Models are ranked using the specified information criterion:
   - AICc (default): Corrected AIC, recommended for small samples: $\text{AICc} = \text{AIC} + \frac{2k(k+1)}{n-k-1}$
   - AIC: Akaike Information Criterion: $\text{AIC} = -2\log L + 2k$
   - BIC: Bayesian Information Criterion: $\text{BIC} = -2\log L + k\log n$
4. Best Model Selection: Returns the model with the lowest information criterion value. If the data shows evidence of trend (>10% change between the first and second half), models without trend receive a penalty so that trending models are preferred.

The `trend`, `seasonal`, `damped`, and `allow_multiplicative` parameters control which candidate models are considered, allowing you to restrict the search space based on domain knowledge, as in the sketch below.
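For example, the search can be limited to additive, trending candidate models ranked by BIC. The configuration below uses the arguments documented above; treat the exact combination as illustrative:

# AutoETS with a restricted search space - illustrative configuration
# ==============================================================================
model = Ets(
    m                    = 12,
    model                = "ZZZ",
    trend                = True,    # only consider models with trend
    allow_multiplicative = False,   # additive errors and seasonality only
    ic                   = "bic",   # rank candidates by BIC
)
model.fit(y=data_train)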
💡 Tip
The implementation of ETS in skforecast uses numba to optimize performance. Because the first call to fit() or predict() triggers code compilation, it may take longer than subsequent calls. After this initial compilation, performance improves significantly. While this overhead can be inconvenient during interactive sessions, it provides substantial benefits in production environments where thousands of models may be fitted and forecasted efficiently.
# AutoETS: automatic model selection
# ==============================================================================
model = Ets(m=12, model=None)
model.fit(y=data_train)
model.summary()
ETS Model Summary
============================================================
Model: Ets(MAM)
Seasonal period (m): 12
Smoothing parameters:
  alpha (level):    0.1000
  beta (trend):     0.0100
  gamma (seasonal): 0.0100
Initial states:
  Level (l0): 191275.6996
  Trend (b0): 2678.3040
Model fit statistics:
  sigma^2: 0.001604
  Log-likelihood: -1614.33
  AIC: 3262.66
  BIC: 3315.87
Residual statistics:
  Mean: -1587.582913
  Std Dev: 16096.366173
  MAE: 11681.267343
  RMSE: 16127.006194
Time Series Summary Statistics:
  Number of observations: 169
  Mean: 384743.1773
  Std Dev: 108126.6689
  Min: 155466.8105
  25%: 303667.7591
  Median: 397278.0241
  75%: 466194.3073
  Max: 605073.0143
The automatic model selection identified ETS(MAM) as the optimal configuration for this fuel consumption time series. This model combines multiplicative errors, additive trend, and multiplicative seasonality with a 12-month period.
The multiplicative error component indicates that forecast uncertainty scales proportionally with the series level, which is appropriate for data where variability increases over time. The additive trend captures the linear growth pattern in fuel consumption (approximately 2,678 units per month based on the initial trend estimate). Most notably, the multiplicative seasonal component accounts for proportional seasonal variations, meaning that seasonal peaks and troughs grow larger as the overall consumption level increases. This is typical in economic and consumption data where seasonal effects compound with the underlying level.
The model's relatively low smoothing parameters (α=0.10, β=0.01, γ=0.01) indicate that recent observations have only a limited influence on the forecasts: the model relies heavily on the established level, trend, and seasonal patterns rather than reacting strongly to recent fluctuations.
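To build intuition for how small $\alpha = 0.10$ is: in simple exponential smoothing, the weight assigned to an observation $j$ steps in the past is $\alpha(1-\alpha)^j$ (the decomposition is exact only for the level in the ANN case, but the intuition carries over). The short sketch below shows how slowly these weights decay:

# Weights implied by alpha = 0.10 - illustrative sketch
# ==============================================================================
alpha = 0.10
for j in [0, 1, 5, 12, 24]:
    print(f"lag {j:>2}: weight = {alpha * (1 - alpha) ** j:.4f}")
# lag 0: 0.1000 ... lag 12: 0.0282 ... lag 24: 0.0080 - recent data dominate only mildly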
Once the model is fitted, it can be used to forecast future observations. It is important to note that these types of models require predictions to follow immediately after the training data; therefore, the forecast starts right after the last observed value.
For performance reasons, predictions are returned as NumPy arrays. These can be easily converted into Pandas Series by mapping them to the corresponding time index.
# Predictions as pandas Series
# ==============================================================================
steps = len(data_test)
predictions = model.predict(steps=steps)
pred_index = expand_index(index=data_train.index, steps=steps)
predictions = pd.Series(predictions, index=pred_index)
predictions.head(4)
1983-02-01    408101.282195
1983-03-01    468889.543146
1983-04-01    490033.012053
1983-05-01    482048.495783
Freq: MS, dtype: float64
# Prediction interval
# ==============================================================================
predictions = model.predict_interval(steps=steps, level=[95])
predictions.index = pred_index
predictions.head(3)
| | mean | lower_95 | upper_95 |
|---|---|---|---|
| 1983-02-01 | 408101.282195 | 379112.149242 | 439283.248986 |
| 1983-03-01 | 468889.543146 | 430982.719071 | 505646.259513 |
| 1983-04-01 | 490033.012053 | 449928.041976 | 530495.775930 |
ForecasterStats
The previous section introduced the construction of ETS models. To integrate these models seamlessly with the various functionalities provided by skforecast, the next step is to encapsulate the skforecast Ets model within a ForecasterStats object. This wrapper hides the intricacies of the model and gives access to skforecast's full set of capabilities.
Train
The train-prediction process follows an API similar to that of scikit-learn. More details in the ForecasterStats user guide.
# Create and fit ForecasterStats
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets(m=12, model="MAM"))
forecaster.fit(y=data_train)
forecaster
ForecasterStats
General Information
- Estimators:
- skforecast.Ets: Ets(MAM)
- Window size: 1
- Series name: y
- Exogenous included: False
- Creation date: 2026-02-02 14:07:44
- Last fit date: 2026-02-02 14:07:45
- Skforecast version: 0.20.0
- Python version: 3.13.11
- Forecaster id: None
Exogenous Variables
- None
Data Transformations
- Transformer for y: None
- Transformer for exog: None
Training Information
- Training range: [Timestamp('1969-01-01 00:00:00'), Timestamp('1983-01-01 00:00:00')]
- Training index type: DatetimeIndex
- Training index frequency: MS
Estimator Parameters
- skforecast.Ets: {'m': 12, 'model': 'MAM', 'damped': None, 'alpha': None, 'beta': None, 'gamma': None, 'phi': None, 'seasonal': True, 'trend': None, 'allow_multiplicative': True, 'allow_multiplicative_trend': False}
Fit Kwargs
- None
Prediction
Once the forecaster is fitted, it can be used to predict future observations. As noted earlier, the forecast must start immediately after the last observed value of the training data.
# Predict
# ==============================================================================
steps = len(data_test)
predictions = forecaster.predict(steps=steps)
predictions.head(3)
1983-02-01    408101.282195
1983-03-01    468889.543146
1983-04-01    490033.012053
Freq: MS, Name: pred, dtype: float64
Prediction intervals
The method predict_interval enables the calculation of prediction intervals for the forecasted values. Users can specify the confidence level of the estimated interval using either the alpha or interval argument.
# Predict intervals
# ==============================================================================
predictions = forecaster.predict_interval(steps=steps, alpha=0.05)
predictions.head(3)
| | pred | lower_bound | upper_bound |
|---|---|---|---|
| 1983-02-01 | 408101.282195 | 374425.191092 | 439908.674532 |
| 1983-03-01 | 468889.543146 | 434420.905502 | 507391.851633 |
| 1983-04-01 | 490033.012053 | 451247.496524 | 527738.119888 |
# Plot predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
plot_prediction_intervals(
predictions = predictions,
y_true = data_test,
target_variable = "y",
title = "Prediction intervals in test data",
kwargs_fill_between = {'color': 'white', 'alpha': 0.3, 'zorder': 1},
ax = ax
)
Feature importances
The method get_feature_importances returns the estimated smoothing parameters of the ETS model: alpha (level), beta (trend), and gamma (seasonal). These values indicate how strongly each component reacts to new observations and thus how each contributes to the overall forecast.
# Feature importances
# ==============================================================================
model.get_feature_importances()
| | feature | importance |
|---|---|---|
| 0 | alpha (level) | 0.10 |
| 1 | beta (trend) | 0.01 |
| 2 | gamma (seasonal) | 0.01 |
Backtesting
ETS and other statistical models, once integrated in a ForecasterStats object, can be evaluated using any of the backtesting strategies implemented in skforecast.
✏️ Note
Why do statistical models require refitting during backtesting?
Unlike machine learning models, statistical models like ETS maintain an internal state that depends on the sequence of observations. They can only generate predictions starting from the last observed time step — they cannot "jump" to an arbitrary point in the future without knowing all previous values.
During backtesting, when the validation window moves forward, the model must be refitted to incorporate the new observations and update its internal state. This is why refit=True is typically required.
Performance optimization: Because refitting is mandatory, skforecast's Numba-optimized backend becomes essential. It enables hundreds of refits during backtesting in a fraction of the time required by non-optimized libraries.
# Create forecaster
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets(m=12, model="MAM"))
# Backtesting
# ==============================================================================
cv = TimeSeriesFold(
steps = 12,
initial_train_size = len(data_train),
refit = True,
)
metrics, predictions = backtesting_stats(
forecaster = forecaster,
y = data,
cv = cv,
metric = ['mean_absolute_error', 'mean_absolute_percentage_error'],
suppress_warnings = True,
verbose = True,
)
Information of folds
--------------------
Number of observations used for initial training: 169
Number of observations used for backtesting: 84
Number of folds: 7
Number skipped folds: 0
Number of steps per fold: 12
Number of steps to exclude between last observed data (last window) and predictions (gap): 0
Fold: 0
Training: 1969-01-01 00:00:00 -- 1983-01-01 00:00:00 (n=169)
Validation: 1983-02-01 00:00:00 -- 1984-01-01 00:00:00 (n=12)
Fold: 1
Training: 1970-01-01 00:00:00 -- 1984-01-01 00:00:00 (n=169)
Validation: 1984-02-01 00:00:00 -- 1985-01-01 00:00:00 (n=12)
Fold: 2
Training: 1971-01-01 00:00:00 -- 1985-01-01 00:00:00 (n=169)
Validation: 1985-02-01 00:00:00 -- 1986-01-01 00:00:00 (n=12)
Fold: 3
Training: 1972-01-01 00:00:00 -- 1986-01-01 00:00:00 (n=169)
Validation: 1986-02-01 00:00:00 -- 1987-01-01 00:00:00 (n=12)
Fold: 4
Training: 1973-01-01 00:00:00 -- 1987-01-01 00:00:00 (n=169)
Validation: 1987-02-01 00:00:00 -- 1988-01-01 00:00:00 (n=12)
Fold: 5
Training: 1974-01-01 00:00:00 -- 1988-01-01 00:00:00 (n=169)
Validation: 1988-02-01 00:00:00 -- 1989-01-01 00:00:00 (n=12)
Fold: 6
Training: 1975-01-01 00:00:00 -- 1989-01-01 00:00:00 (n=169)
Validation: 1989-02-01 00:00:00 -- 1990-01-01 00:00:00 (n=12)
# Backtest predictions
# ==============================================================================
predictions.head(4)
| | fold | pred |
|---|---|---|
| 1983-02-01 | 0 | 408101.282195 |
| 1983-03-01 | 0 | 468889.543146 |
| 1983-04-01 | 0 | 490033.012053 |
| 1983-05-01 | 0 | 482048.495783 |
# Backtest metrics
# ==============================================================================
metrics
| | mean_absolute_error | mean_absolute_percentage_error |
|---|---|---|
| 0 | 17010.270276 | 0.030752 |
# Plot backtest predictions
# ==============================================================================
fig, ax = plt.subplots(figsize=(6, 3))
data.loc[end_train:].plot(ax=ax, label='test')
predictions['pred'].plot(ax=ax)
ax.set_title('Backtest predictions (folds of 12 months)')
ax.legend();
In-sample Predictions
Predictions on the training data are useful for evaluating the accuracy of the model. By comparing the predicted values with the actual observed values in the training dataset, you can assess how well the model has learned the underlying patterns and trends. This comparison helps in understanding the model's performance and in identifying areas where it may need improvement or adjustment. In essence, in-sample predictions act as a mirror, reflecting how the model interprets and reconstructs the historical data on which it was trained.
Predictions for the observations used to fit the model are stored in the fitted_values_ attribute of the Ets object.
# In-sample Predictions
# ==============================================================================
forecaster = ForecasterStats(estimator=Ets(m=12, model="MAM"))
forecaster.fit(y=data_train)
# Show only the first 5 values
forecaster.estimators_[0].fitted_values_[:5]
array([167998.14060777, 162819.56599272, 188519.55126091, 199042.32235291,
198515.09912727])
Memory optimization
For production environments where you need to store many fitted models but only require forecasting capabilities (not diagnostics), you can significantly reduce memory usage with the reduce_memory() method. This is especially useful when working with large datasets or deploying models in resource-constrained environments.
This method removes in-sample fitted values and residuals, which are only needed for diagnostic purposes but not for generating forecasts.
# Compare size before and after reduce_memory()
# ==============================================================================
def total_model_size(model):
    # Approximate footprint: sys.getsizeof is shallow, so nested objects are
    # under-counted; sufficient for a before/after comparison.
    size = sys.getsizeof(model)
    for attr_name in dir(model):
        if attr_name.startswith('_'):
            continue
        try:
            attr = getattr(model, attr_name)
            size += sys.getsizeof(attr)
        except Exception:
            pass
    return size
forecaster = ForecasterStats(
estimator=Ets(m=12, model="MAM"),
)
forecaster.fit(y=data_train)
model_size_before = total_model_size(forecaster.estimators_[0])
print(f"Memory before reduce_memory(): {model_size_before / 1024:.3f} KB")
# Reduce memory
forecaster.reduce_memory()
model_size_after = total_model_size(forecaster.estimators_[0])
print(f"Memory after reduce_memory(): {model_size_after / 1024:.3f} KB")
print(f"Memory reduction: {(1 - model_size_after / model_size_before) * 100:.1f}%")
Memory before reduce_memory(): 5.241 KB
Memory after reduce_memory(): 2.319 KB
Memory reduction: 55.7%
# Predictions still work after memory reduction
# ==============================================================================
forecaster.predict(steps=10)
1983-02-01    408101.282195
1983-03-01    468889.543146
1983-04-01    490033.012053
1983-05-01    482048.495783
1983-06-01    498540.830053
1983-07-01    606376.320233
1983-08-01    627071.555583
1983-09-01    513633.201604
1983-10-01    491399.448566
1983-11-01    437351.693039
Freq: MS, Name: pred, dtype: float64
Session information
import session_info
session_info.show(html=False)
-----
matplotlib    3.10.8
pandas        2.3.3
session_info  v1.0.1
skforecast    0.20.0
-----
IPython         9.8.0
jupyter_client  7.4.9
jupyter_core    5.9.1
notebook        6.5.7
-----
Python 3.13.11 | packaged by Anaconda, Inc. | (main, Dec 10 2025, 21:28:48) [GCC 14.3.0]
Linux-6.14.0-37-generic-x86_64-with-glibc2.39
-----
Session information updated at 2026-02-02 14:07
Citation
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
Exponential Smoothing models in Python by Joaquín Amat Rodrigo, Javier Escobar Ortiz and Resul Akay available under Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0 DEED) at https://cienciadedatos.net/documentos/py76-exponential-smoothing-models.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.20.0). Zenodo. https://doi.org/10.5281/zenodo.8382787
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.20.0) [Computer software]. https://doi.org/10.5281/zenodo.8382787
BibTeX:
@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.20.0},
  month   = {01},
  year    = {2026},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}
Did you like the article? Your support is important
Your contribution will help me to continue generating free educational content. Many thanks! 😊
This work by Joaquín Amat Rodrigo, Javier Escobar Ortiz and Resul Akay is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.
Allowed:
- Share: copy and redistribute the material in any medium or format.
- Adapt: remix, transform, and build upon the material.

Under the following terms:
- Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
- NonCommercial: You may not use the material for commercial purposes.
- ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.
