If you like Skforecast, help us by giving a star on GitHub! ⭐️
Global Forecasting Models involve the creation of a single forecasting model that considers all time series simultaneously. This approach attempts to capture the underlying patterns common to all series, thereby reducing the impact of noise present in individual series. It offers computational efficiency, ease of maintenance, and robust generalisation across time series. Global forecasting assumes that time series with similar behaviour can benefit from being modelled together. When working with hundreds or thousands of series, an initial clustering step can help maximise model performance.
Clustering is an unsupervised analysis technique that groups a set of observations into clusters that contain observations that are considered homogeneous, while observations in different clusters are considered heterogeneous. Algorithms that cluster time series can be divided into two groups: those that use a transformation to create features prior to clustering (feature-driven time series clustering), and those that work directly on the time series (elastic distance measures).
Feature-driven time series clustering: Features describing structural characteristics are extracted from each time series, then these features are fed into arbitrary clustering algorithms. These features are obtained by applying statistical operations that best capture the underlying characteristics: trend, seasonality, periodicity, serial correlation, skewness, kurtosis, chaos, nonlinearity, and self-similarity.
Elastic distance measures: This approach works directly on the time series, adjusting or "realigning" the series in comparison to each other. The best known of this family of measures is Dynamic Time Warping (DTW).
If after performing the clustering analysis, distinct groups of series are identified, two strategies can be used:
Model all series together, while adding a feature that indicates the cluster to which each series belongs (a minimal sketch of this idea is shown after this list).
Build several global models, with each model tailored to a specific group of series.
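As a rough illustration of the first strategy (which is not developed further in this document), the cluster label of each series can be merged into the data and encoded as an additional feature. The names below (cluster_map, data_long) and the toy values are hypothetical.
# Sketch: add the cluster of each series as a feature (hypothetical example)
# ==============================================================================
import pandas as pd

# Hypothetical mapping from series id to cluster label
cluster_map = {'id_1000': 'cluster_0', 'id_1001': 'cluster_1'}

data_long = pd.DataFrame({
    'building_id'  : ['id_1000', 'id_1000', 'id_1001', 'id_1001'],
    'meter_reading': [120.0, 135.0, 80.0, 78.0],
})

# Add the cluster label and one-hot encode it so it can be used as a feature
data_long['cluster'] = data_long['building_id'].map(cluster_map)
data_long = pd.concat(
    [data_long, pd.get_dummies(data_long['cluster'], prefix='cluster')],
    axis=1
)
print(data_long)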
To help understand the benefits of clustering, this document focuses on examining and comparing the results obtained when predicting the energy consumption of over a thousand buildings. Although the primary use of each building is available in the dataset, it may not reflect groups with similar patterns of energy use, so additional groups are created using clustering methods. A total of 4 experiments are performed:
Modelling all buildings together with a single global model.
Modelling groups of buildings based on their primary use (a global model per primary use).
Modelling groups of buildings based on time series feature clustering (a global model per cluster).
Modelling groups of buildings based on Dynamic Time Warping (DTW) clustering (a global model per cluster).
💡 Tip
To learn more about Global Forecasting Models, please refer to the skforecast user guides and articles on this topic.
Libraries used in this document.
# Data management
# ==============================================================================
import numpy as np
import pandas as pd
from datetime import datetime
from skforecast.datasets import fetch_dataset
# Plots
# ==============================================================================
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
plt.style.use('seaborn-v0_8-darkgrid')
# Forecasting
# ==============================================================================
import lightgbm
from lightgbm import LGBMRegressor
from sklearn.preprocessing import StandardScaler
import skforecast
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.model_selection import TimeSeriesFold
from skforecast.model_selection import backtesting_forecaster_multiseries
# Feature extraction
# ==============================================================================
import tsfresh
from tsfresh import extract_features
from tsfresh.feature_extraction.settings import ComprehensiveFCParameters
from tsfresh.feature_extraction.settings import from_columns
# Clustering
# ==============================================================================
import sklearn
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
import sktime
from sktime.clustering.k_means import TimeSeriesKMeans
# Warnings configuration
# ==============================================================================
import warnings
warnings.filterwarnings('once')
color = '\033[1m\033[38;5;208m'
print(f"{color}Version skforecast: {skforecast.__version__}")
print(f"{color}Version scikit-learn: {sklearn.__version__}")
print(f"{color}Version lightgbm: {lightgbm.__version__}")
print(f"{color}Version tsfresh: {tsfresh.__version__}")
print(f"{color}Version sktime: {sktime.__version__}")
print(f"{color}Version pandas: {pd.__version__}")
print(f"{color}Version numpy: {np.__version__}")
The data used in this document has been obtained from the Kaggle competition: Addison Howard, Chris Balbach, Clayton Miller, Jeff Haberl, Krishnan Gowri, Sohier Dane. (2019). ASHRAE - Great Energy Predictor III. Kaggle.
Three files are used to create the modeling data set:
weather_train.csv and weather_test.csv: These files contain weather-related data for each building, including outdoor air temperature, dew temperature, relative humidity, and other weather parameters. The weather data is crucial for understanding the impact of external conditions on building energy usage.
building_metadata.csv: This file provides metadata for each building in the dataset, such as building type, primary use, square footage, floor count, and year built. This information helps in understanding the characteristics of the buildings and their potential influence on energy consumption patterns.
train.csv: The train dataset contains the target variable, i.e., the energy consumption data for each building, along with the timestamps for the energy consumption readings. It also includes the corresponding building and weather IDs to link the information across different datasets.
The three files have been preprocessed to remove buildings with fewer than 85% of values different from NaN or zero, to keep only the electricity meter, and to aggregate the data to a daily frequency.
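The preprocessing itself is not reproduced here; the following sketch only illustrates the kind of filtering and aggregation described above, assuming the raw Kaggle files with their original column names (in the raw data, meter 0 corresponds to electricity) and using the sum as the daily aggregation function, which is an assumption.
# Sketch of the preprocessing described above (assumes the raw Kaggle files;
# the preprocessed result is downloaded later with fetch_dataset)
# ==============================================================================
import pandas as pd

train = pd.read_csv('train.csv', parse_dates=['timestamp'])

# Keep only the electricity meter (meter == 0 in the raw dataset)
train = train[train['meter'] == 0]

# Aggregate hourly readings to daily frequency for each building
daily = (
    train
    .set_index('timestamp')
    .groupby('building_id')['meter_reading']
    .resample('D')
    .sum()
    .reset_index()
)

# Keep buildings whose daily series has at least 85% non-missing, non-zero values
valid_ratio = (
    daily
    .assign(valid=lambda df: df['meter_reading'].notna() & (df['meter_reading'] != 0))
    .groupby('building_id')['valid']
    .mean()
)
daily = daily[daily['building_id'].isin(valid_ratio[valid_ratio >= 0.85].index)]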
# Load preprocessed data
# ==============================================================================
data = fetch_dataset('ashrae_daily')
data.head()
# Fill missing values of air_temperature and wind_speed using forward and backward fill
# ==============================================================================
# Imputation must be done separately for each building
data = data.sort_values(by=['building_id', 'timestamp'])
data['air_temperature'] = data.groupby('building_id')['air_temperature'].ffill().bfill()
data['wind_speed'] = data.groupby('building_id')['wind_speed'].ffill().bfill()
data = data.sort_index()
print(
f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
f"(n_days={(data.index.max() - data.index.min()).days})"
)
One of the key attributes associated with each building is its designated use. This feature may play a crucial role in influencing the energy consumption pattern, as distinct uses can significantly impact both the quantity and timing of energy consumption.
# Number of buildings and type of buildings based on primary use
# ==============================================================================
n_building = data['building_id'].nunique()
n_type_building = data['primary_use'].nunique()
range_datetime = [data.index.min(), data.index.max()]
print(f"Number of buildings: {n_building}")
print(f"Number of building types: {n_type_building}")
display(data.drop_duplicates(subset=['building_id'])['primary_use'].value_counts())
For certain primary use categories, there is a limited number of buildings within the dataset. To streamline the analysis, categories with fewer than 50 buildings are grouped into the "Other" category.
# Types of buildings (primary use) with less than 50 samples are grouped as "Other".
# ==============================================================================
infrequent_categories = (
data
.drop_duplicates(subset=['building_id'])['primary_use']
.value_counts()
.loc[lambda x: x < 50]
.index
.tolist()
)
print("Infrequent categories:")
print("======================")
print('\n'.join(infrequent_categories))
data['primary_use'] = np.where(
data['primary_use'].isin(infrequent_categories),
'Other',
data['primary_use']
)
Next, a plot is created showing the energy consumption for a randomly selected building within each respective category, and a plot of all available time series for each category.
# Time series for 1 random selected building per group
# ==============================================================================
fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(8, 5.5), sharex=True, sharey=False)
sample_ids = (
data
.groupby('primary_use')['building_id']
.apply(lambda x: x.sample(1, random_state=333))
.tolist()
)
axs = axs.flatten()
for i, building_id in enumerate(sample_ids):
data_sample = data[data['building_id'] == building_id]
building_type = data_sample['primary_use'].unique()[0]
data_sample.plot(
y = 'meter_reading',
ax = axs[i],
legend = False,
title = f"Building: {building_id}, type: {building_type}",
fontsize = 8
)
axs[i].set_xlabel("")
axs[i].set_ylabel("")
# Scientific notation for y axis
axs[i].ticklabel_format(axis='y', style='sci', scilimits=(0, 0))
axs[i].title.set_size(9)
fig.suptitle('Energy consumption for 6 random buildings', fontsize=12)
fig.tight_layout()
plt.show()
# Energy consumption by type of building (one gray line per building)
# ==============================================================================
fig, axs = plt.subplots(nrows=3, ncols=2, figsize=(8, 5.5), sharex=True, sharey=False)
axs = axs.flatten()
for i, building_type in enumerate(data['primary_use'].unique()):
data_sample = data[data['primary_use'] == building_type]
data_sample = data_sample.pivot_table(
index = 'timestamp',
columns = 'building_id',
values = 'meter_reading',
aggfunc = 'mean'
)
data_sample.plot(
legend = False,
title = f"Type: {building_type}",
color = 'gray',
alpha = 0.2,
fontsize = 8,
ax = axs[i]
)
data_sample.mean(axis=1).plot(
ax = axs[i],
color = 'blue',
fontsize = 8
)
axs[i].set_xlabel("")
axs[i].set_ylabel("")
# Scientific notation for y axis
axs[i].ticklabel_format(axis='y', style='sci', scilimits=(0,0))
axs[i].title.set_size(9)
# Limit the axis to 5 times the maximum mean value to improve visualization
axs[i].set_ylim(
bottom = 0,
top = 5 * data_sample.mean(axis=1).max()
)
fig.tight_layout()
fig.suptitle('Energy consumption by type of building', fontsize=12)
plt.subplots_adjust(top=0.9)
plt.show()
The graph reveals that there is significant variability in consumption patterns among buildings of the same purpose. This suggests that there may be scope for improving the criteria by which buildings are grouped.
The idea behind modeling multiple series at the same time is to be able to capture the main patterns that govern the series, thereby reducing the impact of the potential noise that each series may have. This means that series that behave similarly may benefit from being modeled together. One way to identify potential groups of series is to perform a clustering study prior to modeling. If clear groups are identified as a result of clustering, it is appropriate to model each of them separately.
For a more detailed review of time series clustering, see A review and evaluation of elastic distance functions for time series clustering.
tsfresh is a powerful Python library for feature engineering from time-series and sequential data, including statistical measures, Fourier coefficients, and various other time-domain and frequency-domain features. It provides a systematic approach to automate the calculation of features and select the most informative ones.
To start, the default configuration of tsfresh is used, which calculates all available features. The easiest way to access this configuration is to use the functions provided by the ComprehensiveFCParameters class. It creates a dictionary mapping each feature name to a list of parameter dictionaries to be used when the function with that name is called.
# Default features and settings created by tsfresh
# ==============================================================================
default_features = ComprehensiveFCParameters()
print("Name of the features extracted by tsfresh:")
print("=========================================")
print('\n'.join(list(default_features)))
Many of these features are calculated using different values for their arguments.
# Default configuration for feature "partial_autocorrelation"
# ==============================================================================
default_features['partial_autocorrelation']
To access the detailed view of each feature and the parameter values included in the default configuration, use the following code:
# Default configuration for all extracted feature
# ==============================================================================
# from pprint import pprint
# pprint(default_features)
Once the configuration of features has been defined, the next step is to extract them from the time series. The extract_features() function of tsfresh is used for this purpose. This function receives as input the time series and the configuration of the features to be extracted. The output is a dataframe with the extracted features.
# Feature extraction
# ==============================================================================
ts_features = extract_features(
timeseries_container = data[['building_id', 'meter_reading']].reset_index(),
column_id = "building_id",
column_sort = "timestamp",
column_value = "meter_reading",
default_fc_parameters = default_features,
impute_function = tsfresh.utilities.dataframe_functions.impute,
n_jobs = 4
)
print("Shape of ts_features:", ts_features.shape)
ts_features.head(3)
As a result of the extraction process, 783 features have been created for each time series (building_id in this use case). The returned dataframe has as its index the column specified in the column_id argument of extract_features.
The default extraction of tsfresh produces a huge number of features. However, only a few of these may be of interest in each use case. To select the most relevant ones, tsfresh includes an automated selection process based on hypothesis tests (FeatuRE Extraction based on Scalable Hypothesis tests). In this process, the features are individually and independently evaluated for their significance in predicting the target under investigation.
⚠ Warning
The selection process used by tsfresh is based on the significance of each feature in accurately predicting the target variable. To perform this process, a target variable is required – for example, the type of building associated with a given time series. However, there are cases where a target variable is not readily available; in such cases, alternative strategies can be used. In this use case, since the goal is to identify groups of buildings with similar energy consumption patterns, regardless of the building type, the automated selection process is not used.
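For reference, if a target variable were available, the selection could be run with tsfresh's select_features function. The sketch below uses a hypothetical binary target (whether a building's primary use is 'Education') and is not executed in this document.
# Sketch: supervised feature selection with tsfresh (not used in this use case)
# ==============================================================================
from tsfresh import select_features

# Hypothetical binary target aligned with the index of ts_features
y = (
    data
    .drop_duplicates(subset=['building_id'])
    .set_index('building_id')['primary_use']
    .eq('Education')
    .astype(int)
)
y = y.loc[ts_features.index]

# Keep only the features that are statistically relevant for predicting y
ts_features_selected = select_features(ts_features.dropna(axis=1), y)
print(f"Number of selected features: {ts_features_selected.shape[1]}")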
Some created features may have missing values for some of the time series. Since most clustering algorithms do not allow for missing values, these are excluded.
# Remove features with missing values
# ==============================================================================
ts_features = ts_features.dropna(axis=1, how='any')
print(f"Number of features after selection and removing missing values: {ts_features.shape[1]}")
Once the final feature matrix is ready, it may be useful to create a new dictionary to store the selected features and the parameters used to calculate them. This can be easily done with the from_columns function.
# Dictionary with selected features and their configuration
# ==============================================================================
ts_features_info = from_columns(ts_features)
# pprint(ts_features_info['meter_reading'])
A K-means clustering method is used to group the buildings. Since clustering is known to be negatively affected by high dimensionality, and since several hundred features have been created for each building, a PCA is used to reduce the dimensionality of the data before the k-means is applied.
# Scale features so all of them have mean 0 and std 1
# ==============================================================================
scaler = StandardScaler().set_output(transform="pandas")
ts_features_scaled = scaler.fit_transform(ts_features)
ts_features_scaled.head(2)
# Plot variance reduction as a function of the number of PCA components
# ==============================================================================
pca = PCA()
pca.fit(ts_features_scaled)
n_components = np.argmax(np.cumsum(pca.explained_variance_ratio_) > 0.85) + 1
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(6, 2.5))
ax.plot(np.cumsum(pca.explained_variance_ratio_))
ax.set_title('Variance explained by PCA components')
ax.set_xlabel('Number of components')
ax.set_ylabel('Cumulative explained variance')
ax.axhline(y=0.85, color='red', linestyle='--')
ax.axvline(x=n_components, color='red', linestyle='--')
ax.text(
x=n_components + 10,
y=0.4,
s=f"n_components = {n_components}",
color='red',
verticalalignment='center'
)
plt.show();
The dimensionality of the original 783 features is reduced by using the first 68 principal components (explaining 85% of the variance).
# PCA with as many components as necessary to explain 85% of the variance
# ==============================================================================
pca = PCA(n_components=0.85)
pca_projections = pca.fit_transform(ts_features_scaled)
# Create a data frame with the projections. Each column is a principal component
# and each row is the id of the building.
pca_projections = pd.DataFrame(
pca_projections,
index = ts_features_scaled.index,
columns = [f"PC{i}" for i in range(1, pca.n_components_ + 1)]
)
print(f"Number of components selected: {pca.n_components_}")
print(f"Explained variance: {pca.explained_variance_ratio_.sum():.3f}")
pca_projections.head(3)
One of the inherent challenges of clustering is determining the optimal number of clusters, since there is no ground truth or predefined number of clusters in unsupervised learning. Several heuristics have been proposed to guide this selection, and in this case, the elbow method is used.
The elbow method involves plotting the total within-cluster sum of squares (WSS) against the number of clusters (k). The optimal number of clusters is typically identified at the "elbow" of the curve, where the rate of decrease in WSS slows down markedly. This suggests that adding more clusters beyond this point provides little improvement in clustering performance.
# Optimal number of clusters
# ==============================================================================
# Identify the optimal number of clusters using the elbow method
range_n_clusters = range(1, 30)
inertias = []
size_of_clusters = []
for n_clusters in range_n_clusters:
kmeans = KMeans(
n_clusters = n_clusters,
n_init = 20,
random_state = 963852
)
kmeans.fit(pca_projections)
inertias.append(kmeans.inertia_)
size_of_clusters.append(np.unique(kmeans.labels_, return_counts=True)[1].tolist())
inertias = pd.Series(inertias, index=range_n_clusters)
decrease_in_inertia = inertias.pct_change() * -100
fig, axs = plt.subplots(1, 2, figsize=(9, 2.5))
inertias.plot(marker='o', ax=axs[0])
axs[0].xaxis.set_major_locator(ticker.MaxNLocator(integer=True))
axs[0].set_title("Cluster inertia vs number of clusters")
axs[0].set_xlabel('Number of clusters')
axs[0].set_ylabel('Intra-cluster variance (inertia)')
decrease_in_inertia.plot(kind='bar', ax=axs[1])
axs[1].set_title("Decrease in inertia")
axs[1].set_xlabel('Number of clusters')
axs[1].set_ylabel('Decrease in inertia (%)')
fig.tight_layout()
plt.show();
The chart shows that after 7 clusters the decrease in inertia slows down; therefore, 7 clusters may be a good choice.
In addition to analyzing the evolution of the inertia (intra-cluster variance), it is important to check the size of the clusters being created. The presence of small clusters may indicate overfitting or anomalous samples that do not fit well into any of the groups.
# Distribution of cluster sizes
# ==============================================================================
for n_clusters, sizes in zip(range_n_clusters, size_of_clusters):
print(f"Size of clusters (n = {n_clusters}): {sizes}")
When 7 clusters are created, the size of the clusters is not well balanced. Before modeling the time series, the smaller clusters (those with fewer than 20 buildings) are combined into a single cluster named "Other".
# Train clustering model with 7 clusters and assign each building to a cluster
# ==============================================================================
kmeans = KMeans(n_clusters=7, n_init=20, random_state=963852)
kmeans.fit(pca_projections)
clusters = kmeans.predict(pca_projections)
clusters = pd.DataFrame({
'building_id': pca_projections.index,
'cluster_base_on_features': clusters.astype(str)
})
# Merge minor clusters into a single cluster
threshold = 20
cluster_size = clusters['cluster_base_on_features'].value_counts()
cluster_size = cluster_size[cluster_size < threshold].index.tolist()
clusters['cluster_base_on_features'] = np.where(
clusters['cluster_base_on_features'].isin(cluster_size),
'Other',
clusters['cluster_base_on_features']
)
display(clusters.head(3))
clusters['cluster_base_on_features'].value_counts().to_frame(name='Number of series')
# Add the predicted cluster to the data frame with the building information
# ==============================================================================
data = pd.merge(
data.reset_index(), # To avoid losing the index
clusters,
on ='building_id',
how ='left',
validate = 'm:1'
).set_index('timestamp')
data.head(3)
Once each building has been assigned to a cluster, it is useful to examine the characteristics of the grouped buildings. For example, the distribution of the primary use of the buildings.
# Percentage of building types per cluster
# ==============================================================================
primary_usage_per_cluster = (
data
.groupby('cluster_base_on_features')['primary_use']
.value_counts(normalize=True)
.unstack()
.fillna(0)
)
primary_usage_per_cluster = (100 * primary_usage_per_cluster).round(1)
primary_usage_per_cluster
The results suggest (the table should be read horizontally) that the clustering process based on features extracted from the time series generates groups that differ from those formed by the main purpose of the building.
Dynamic Time Warping (DTW) is a technique that measures the similarity between two temporal sequences, which may vary in speed. In essence, it's an elastic distance measure that allows the time series to be stretched or compressed to align with each other optimally. Clustering with DTW involves grouping time series data based on their dynamic time-warped distances, ensuring that time series within the same cluster have similar shapes and patterns, even if they are out of phase in time or have different lengths.
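To make the idea concrete, the following minimal example implements the classic dynamic-programming formulation of DTW with NumPy (not the optimized implementation used later by sktime) and compares it with the Euclidean distance for two out-of-phase sine waves: the Euclidean distance is large because the series are shifted, while DTW realigns them and returns a much smaller value.
# Minimal DTW example: two out-of-phase series (illustration only)
# ==============================================================================
import numpy as np

def dtw_distance(x, y):
    """Classic O(n*m) dynamic-programming DTW with squared local cost."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (x[i - 1] - y[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return np.sqrt(cost[n, m])

t = np.linspace(0, 4 * np.pi, 100)
series_a = np.sin(t)
series_b = np.sin(t - np.pi / 2)  # same shape, shifted a quarter of a period

print(f"Euclidean distance: {np.linalg.norm(series_a - series_b):.2f}")
print(f"DTW distance      : {dtw_distance(series_a, series_b):.2f}")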
The TimeSeriesKMeans class from the Sktime library enables the application of K-means clustering with a variety of distance metrics, including DTW, Euclidean, ERP, EDR, LCSS, squared, DDTW, WDTW, and WDDTW. Many of these metrics are elastic distances, making the method well-suited for time series data.
Sktime requires time series to be structured in a long format with a multi-index. The outermost index level represents the series ID, while the innermost level corresponds to the datetime.
# Convert data to long format with a multiindex: (building_id, timestamp)
# ==============================================================================
data_long = (
data
.reset_index()
.loc[:, ['timestamp', 'building_id', 'meter_reading']]
.set_index(['building_id', 'timestamp'])
.sort_index(ascending=True)
)
data_long
⚠ Warning
The next cell has a run time of approximately 10 minutes.
# Fit clustering model
# ==============================================================================
model = TimeSeriesKMeans(
n_clusters = 4,
metric = "dtw",
max_iter = 10,
random_state = 123456
)
model.fit(data_long)
# Cluster prediction
# ==============================================================================
clusters = model.predict(data_long)
clusters = pd.DataFrame({
'building_id': data_long.index.get_level_values('building_id').drop_duplicates(),
'cluster_base_on_dtw': clusters.astype(str),
})
# Size of each cluster
clusters['cluster_base_on_dtw'].value_counts().sort_index().to_frame(name='Number of series')
# Add the predicted cluster to the data frame with the building information
# ==============================================================================
data = pd.merge(
data.reset_index(), # To avoid losing the index
clusters,
on = 'building_id',
how = 'left',
validate = 'm:1'
).set_index('timestamp')
data.head(3)
✎ Note
Sktime provides additional clustering algorithms, such as TimeSeriesKMeansTslearn, TimeSeriesKMedoids, and TimeSeriesKShapes, that are worth exploring. Other great libraries for clustering time series are DTAIDistance, Tslearn, and AEON.
# Save data for modelling to avoid repeating the previous steps
# ==============================================================================
data.to_parquet('data_modelling.parquet')
After establishing the three grouping criteria (building usage, clustering based on time series features, and clustering based on dynamic time warping), multi-series global models are trained. The evaluation that follows focuses on determining how effectively these models can forecast daily data for the final two weeks. During this assessment, three distinct metrics are used (a small sketch of how they are computed is shown after the list):
The average value of the Mean Absolute Error (MAE) for all buildings.
The sum of the absolute errors, i.e., the total absolute deviation between the predicted values and the actual consumption for all buildings.
The bias for the predictions summed over all buildings.
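These three metrics are computed from the wide DataFrames of predictions and actual values produced by backtesting. The sketch below shows the exact operations on small hypothetical values.
# Sketch: computation of the three comparison metrics (hypothetical values)
# ==============================================================================
import pandas as pd

predictions = pd.DataFrame({'id_1': [100.0, 110.0], 'id_2': [50.0, 55.0]})
actuals     = pd.DataFrame({'id_1': [ 90.0, 120.0], 'id_2': [52.0, 50.0]})

errors = predictions - actuals

mae_per_building = errors.abs().mean()       # MAE of each building
mean_mae         = mae_per_building.mean()   # 1) average MAE over all buildings
sum_abs_error    = errors.abs().sum().sum()  # 2) sum of absolute errors
bias             = errors.sum().sum()        # 3) bias (sum of signed errors)

print(f"Average MAE: {mean_mae:.2f} | abs error: {sum_abs_error:.2f} | bias: {bias:.2f}")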
In addition to the lagged values of each time series itself, the day of the week (sine-cosine encoded), the outdoor temperature, and the wind speed are included as exogenous variables.
✎ Note
For a more detailed explanation of time series model validation, readers are encouraged to consult the Backtesting user guide. For more information about calendar features and cyclical encoding, visit Calendar features and Cyclical features in time series.
To train the models, search for optimal hyperparameters, and evaluate their predictive performance, the data is divided into three separate sets: training, validation, and test.
# Load data for modelling
# ==============================================================================
data = pd.read_parquet('data_modelling.parquet')
# Split data in train, validation and test
# ==============================================================================
end_train = '2016-07-31 23:59:00'
end_validation = '2016-09-30 23:59:00'
data_train = data.loc[: end_train, :].copy()
data_val = data.loc[end_train:end_validation, :].copy()
data_test = data.loc[end_validation:, :].copy()
print(
f"Rage of dates available : {data.index.min()} --- {data.index.max()} "
f"(n_days={(data.index.max() - data.index.min()).days})"
)
print(
f" Dates for training : {data_train.index.min()} --- {data_train.index.max()} "
f"(n_days={(data_train.index.max() - data_train.index.min()).days})"
)
print(
f" Dates for validation : {data_val.index.min()} --- {data_val.index.max()} "
f"(n_days={(data_val.index.max() - data_val.index.min()).days})"
)
print(
f" Dates for test : {data_test.index.min()} --- {data_test.index.max()} "
f"(n_days={(data_test.index.max() - data_test.index.min()).days})"
)
# Table of results for all models
# ==============================================================================
table_results = pd.DataFrame(columns=['model', 'mae', 'abs_error', 'bias', 'elapsed_time'])
table_results = table_results.set_index('model')
table_results = table_results.astype({'mae': float, 'abs_error': float, 'bias': float, 'elapsed_time': object})
# Function to add calendar features to a DataFrame
# ==============================================================================
def day_of_week_cyclical_encoding(data):
"""
Calculate cyclical encoding for the day of week using a
DataFrame with a datetime index.
"""
data['day_of_week'] = data.index.dayofweek
data['sin_day_of_week'] = np.sin(2*np.pi*data['day_of_week']/7)
data['cos_day_of_week'] = np.cos(2*np.pi*data['day_of_week']/7)
return data
# Exogenous variables included in the model
# ==============================================================================
exog_features = ['sin_day_of_week', 'cos_day_of_week', 'air_temperature', 'wind_speed']
A global model for all buildings is trained and tested using the skforecast class ForecasterRecursiveMultiSeries. This forecaster allows the user to pass the time series in several formats, including a DataFrame with the time series organized as columns.
For more information about using series of different lengths, or different exogenous variables for each series, see Global Forecasting Models.
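As a quick illustration of the dict interface mentioned above, the sketch below fits a forecaster with two hypothetical series of different lengths passed as a dictionary; exogenous variables can be passed analogously as a dictionary with one entry per series. The toy series names and values are hypothetical; see the linked user guide for the full details of this interface.
# Sketch: series of different lengths passed as a dict (hypothetical toy data)
# ==============================================================================
import numpy as np
import pandas as pd
from lightgbm import LGBMRegressor
from skforecast.recursive import ForecasterRecursiveMultiSeries

rng = np.random.default_rng(123)
series_dict = {
    'id_1': pd.Series(
                rng.normal(100, 10, 90),
                index=pd.date_range('2016-01-01', periods=90, freq='D')
            ),
    'id_2': pd.Series(                      # shorter series
                rng.normal(50, 5, 60),
                index=pd.date_range('2016-02-01', periods=60, freq='D')
            ),
}

forecaster = ForecasterRecursiveMultiSeries(
    regressor = LGBMRegressor(random_state=123, verbose=-1),
    lags      = 14,
    encoding  = 'ordinal_category'
)
forecaster.fit(series=series_dict)
predictions = forecaster.predict(steps=7)
print(predictions.head())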
# Reshape data to have one column per building
# ==============================================================================
data_pivot = data.pivot_table(
index = 'timestamp',
columns = 'building_id',
values = 'meter_reading',
aggfunc = 'mean'
)
data_pivot.columns = data_pivot.columns.astype(str)
data_pivot = data_pivot.asfreq('D').sort_index()
# Add calendar features
data_pivot = day_of_week_cyclical_encoding(data_pivot)
# Add temperature and wind speed as exogenous variables
data_pivot = data_pivot.merge(
data[['air_temperature', 'wind_speed']].resample('D').mean(),
left_index = True,
right_index = True,
how = 'left',
validate = '1:m'
)
data_pivot.head(3)
# Forecaster multi-series to model all buildings at the same time
# ==============================================================================
exog_features = ['sin_day_of_week', 'cos_day_of_week', 'air_temperature', 'wind_speed']
params_lgbm = {
'n_estimators': 500,
'learning_rate': 0.01,
'max_depth': 10,
'random_state': 8520,
'verbose': -1
}
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(**params_lgbm),
lags = 31,
encoding = 'ordinal_category'
)
start = datetime.now()
cv = TimeSeriesFold(
steps = 7,
initial_train_size = len(data_pivot.loc[:end_validation, :]),
refit = False
)
metric, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = data_pivot.filter(regex='^id_'), # Select only buildings
exog = data_pivot[exog_features],
cv = cv,
metric = 'mean_absolute_error',
add_aggregated_metric = False,
verbose = False,
show_progress = True
)
end = datetime.now()
mean_metric_all_buildings = metric['mean_absolute_error'].mean()
errors_all_buildings = predictions - data_pivot.loc[predictions.index, predictions.columns]
sum_abs_errors_all_buildings = errors_all_buildings.abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings.sum().sum()
table_results.loc['Global model', 'mae'] = mean_metric_all_buildings
table_results.loc['Global model', 'abs_error'] = sum_abs_errors_all_buildings
table_results.loc['Global model', 'bias'] = sum_bias_all_buildings
table_results.loc['Global model', 'elapsed_time'] = end - start
print("")
print(f"Average mean absolute error for all buildings: {mean_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")
# Plot predictions vs real value for 2 random buildings
# ==============================================================================
rng = np.random.default_rng(147)
selected_buildings = rng.choice(data['building_id'].unique(), size=2, replace=False)
fig, axs = plt.subplots(2, 1, figsize=(6, 4), sharex=True)
axs = axs.flatten()
for i, building in enumerate(selected_buildings):
data_test.query("building_id == @building")['meter_reading'].plot(ax=axs[i], label='test')
predictions[building].plot(ax=axs[i], label='Global model')
axs[i].set_title(f"Building {building}", fontsize=10)
axs[i].set_xlabel("")
axs[i].legend()
fig.tight_layout()
plt.show();
# Forecaster multi-series models for buildings grouped by primary usage
# ==============================================================================
predictions_all_buildings = []
metrics_all_buildings = []
params_lgbm = {
'n_estimators': 500,
'learning_rate': 0.01,
'max_depth': 10,
'random_state': 8520,
'verbose': -1
}
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(**params_lgbm),
lags = 31,
encoding = "ordinal_category"
)
start = datetime.now()
for primary_usage in data['primary_use'].unique():
print(
f"Training and testing model for primary usage: {primary_usage} "
f"(n = {data[data['primary_use'] == primary_usage]['building_id'].nunique()})"
)
# Create a subset based on primary use clusters
data_subset = data[data['primary_use'] == primary_usage]
data_subset = data_subset.pivot_table(
index = 'timestamp',
columns = 'building_id',
values = 'meter_reading',
aggfunc = 'mean'
)
data_subset.columns = data_subset.columns.astype(str)
data_subset = data_subset.asfreq('D').sort_index()
# Add calendar features
data_subset = day_of_week_cyclical_encoding(data_subset)
data_subset = data_subset.merge(
data[['air_temperature', 'wind_speed']].resample('D').mean(),
left_index = True,
right_index = True,
how = 'left',
validate = '1:m'
)
cv = TimeSeriesFold(
steps = 7,
initial_train_size = len(data_subset.loc[:end_validation, :]),
refit = False
)
metric, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = data_subset.filter(regex='^id_'),
exog = data_subset[exog_features],
cv = cv,
metric = 'mean_absolute_error',
add_aggregated_metric = False,
verbose = False,
show_progress = False
)
predictions_all_buildings.append(predictions)
metrics_all_buildings.append(metric)
end = datetime.now()
predictions_all_buildings = pd.concat(predictions_all_buildings, axis=1)
metrics_all_buildings = pd.concat(metrics_all_buildings, axis=0)
errors_all_buildings = (
predictions_all_buildings - data_pivot.loc[predictions_all_buildings.index, predictions_all_buildings.columns]
)
mean_metric_all_buildings = metrics_all_buildings['mean_absolute_error'].mean()
sum_abs_errors_all_buildings = errors_all_buildings.abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings.sum().sum()
table_results.loc['Global model per primary usage', 'mae'] = mean_metric_all_buildings
table_results.loc['Global model per primary usage', 'abs_error'] = sum_abs_errors_all_buildings
table_results.loc['Global model per primary usage', 'bias'] = sum_bias_all_buildings
table_results.loc['Global model per primary usage', 'elapsed_time'] = end - start
print("")
print(f"Average mean absolute error for all buildings: {mean_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")
# Add predictions to the already existing plot (not showing the plot yet)
# ==============================================================================
for i, building in enumerate(selected_buildings):
predictions_all_buildings[building].plot(ax=axs[i], label='Global model per primary usage')
axs[i].legend()
A global model for each cluster based on time series features is trained and tested.
# Forecaster multi-series models for buildings grouped by time series features
# ==============================================================================
predictions_all_buildings = []
metrics_all_buildings = []
params_lgbm = {
'n_estimators': 500,
'learning_rate': 0.01,
'max_depth': 10,
'random_state': 8520,
'verbose': -1
}
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(**params_lgbm),
lags = 31,
encoding = "ordinal_category"
)
start = datetime.now()
for cluster in data['cluster_base_on_features'].unique():
print(
f"Training and testing model for cluster: {cluster} "
f"(n = {data[data['cluster_base_on_features'] == cluster]['building_id'].nunique()})"
)
# Create subset based on features clusters
data_subset = data[data['cluster_base_on_features'] == cluster]
data_subset = data_subset.pivot_table(
index = 'timestamp',
columns = 'building_id',
values = 'meter_reading',
aggfunc = 'mean'
)
data_subset.columns = data_subset.columns.astype(str)
data_subset = data_subset.asfreq('D').sort_index()
# Add calendar features
data_subset = day_of_week_cyclical_encoding(data_subset)
data_subset = data_subset.merge(
data[['air_temperature', 'wind_speed']].resample('D').mean(),
left_index = True,
right_index = True,
how = 'left',
validate = '1:m'
)
cv = TimeSeriesFold(
steps = 7,
initial_train_size = len(data_subset.loc[:end_validation, :]),
refit = False
)
metric, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = data_subset.filter(regex='^id_'),
exog = data_subset[exog_features],
cv = cv,
metric = 'mean_absolute_error',
add_aggregated_metric = False,
verbose = False,
show_progress = False
)
predictions_all_buildings.append(predictions)
metrics_all_buildings.append(metric)
end = datetime.now()
predictions_all_buildings = pd.concat(predictions_all_buildings, axis=1)
metrics_all_buildings = pd.concat(metrics_all_buildings, axis=0)
errors_all_buildings = (
predictions_all_buildings - data_pivot.loc[predictions_all_buildings.index, predictions_all_buildings.columns]
)
mean_metric_all_buildings = metrics_all_buildings['mean_absolute_error'].mean()
sum_abs_errors_all_buildings = errors_all_buildings.abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings.sum().sum()
table_results.loc['Global model per cluster (features)', 'mae'] = mean_metric_all_buildings
table_results.loc['Global model per cluster (features)', 'abs_error'] = sum_abs_errors_all_buildings
table_results.loc['Global model per cluster (features)', 'bias'] = sum_bias_all_buildings
table_results.loc['Global model per cluster (features)', 'elapsed_time'] = end - start
print("")
print(f"Average mean absolute error for all buildings: {mean_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")
# Add predictions to the already existing plot (not showing the plot yet)
# ==============================================================================
for i, building in enumerate(selected_buildings):
predictions_all_buildings[building].plot(ax=axs[i], label='Global model per cluster (features)')
axs[i].legend()
A global model for each cluster based on elastic distance (DTW) is trained and tested.
# Forecaster multi-series models for buildings grouped by DTW
# ==============================================================================
predictions_all_buildings = []
metrics_all_buildings = []
params_lgbm = {
'n_estimators': 500,
'learning_rate': 0.01,
'max_depth': 10,
'random_state': 8520,
'verbose': -1
}
forecaster = ForecasterRecursiveMultiSeries(
regressor = LGBMRegressor(**params_lgbm),
lags = 31,
encoding = 'ordinal_category'
)
start = datetime.now()
for cluster in data['cluster_base_on_dtw'].unique():
print(
f"Training and testing model for cluster: {cluster} "
f"(n = {data[data['cluster_base_on_dtw'] == cluster]['building_id'].nunique()})"
)
# Create subset based on DTW clusters
data_subset = data[data['cluster_base_on_dtw'] == cluster]
data_subset = data_subset.pivot_table(
index = 'timestamp',
columns = 'building_id',
values = 'meter_reading',
aggfunc = 'mean'
)
data_subset.columns = data_subset.columns.astype(str)
data_subset = data_subset.asfreq('D').sort_index()
# Add calendar features
data_subset = day_of_week_cyclical_encoding(data_subset)
data_subset = data_subset.merge(
data[['air_temperature', 'wind_speed']].resample('D').mean(),
left_index = True,
right_index = True,
how = 'left',
validate = '1:m'
)
cv = TimeSeriesFold(
steps = 7,
initial_train_size = len(data_subset.loc[:end_validation, :]),
refit = False
)
metric, predictions = backtesting_forecaster_multiseries(
forecaster = forecaster,
series = data_subset.filter(regex='^id_'),
exog = data_subset[exog_features],
cv = cv,
metric = 'mean_absolute_error',
add_aggregated_metric = False,
verbose = False,
show_progress = False
)
predictions_all_buildings.append(predictions)
metrics_all_buildings.append(metric)
end = datetime.now()
predictions_all_buildings = pd.concat(predictions_all_buildings, axis=1)
metrics_all_buildings = pd.concat(metrics_all_buildings, axis=0)
errors_all_buildings = (
predictions_all_buildings - data_pivot.loc[predictions_all_buildings.index, predictions_all_buildings.columns]
)
mean_metric_all_buildings = metrics_all_buildings['mean_absolute_error'].mean()
sum_abs_errors_all_buildings = errors_all_buildings.abs().sum().sum()
sum_bias_all_buildings = errors_all_buildings.sum().sum()
table_results.loc['Global model per cluster (DTW)', 'mae'] = mean_metric_all_buildings
table_results.loc['Global model per cluster (DTW)', 'abs_error'] = sum_abs_errors_all_buildings
table_results.loc['Global model per cluster (DTW)', 'bias'] = sum_bias_all_buildings
table_results.loc['Global model per cluster (DTW)', 'elapsed_time'] = end - start
print("")
print(f"Average mean absolute error for all buildings: {mean_metric_all_buildings:.0f}")
print(f"Sum of absolute errors for all buildings (x 10,000): {sum_abs_errors_all_buildings/10000:.0f}")
print(f"Bias (x 10,000): {sum_bias_all_buildings/10000:.0f}")
# Add predictions to the already existing plot (not showing the plot yet)
# ==============================================================================
for i, building in enumerate(selected_buildings):
predictions_all_buildings[building].plot(ax=axs[i], label='Global model per cluster (DTW)')
axs[i].legend()
# Table of results
# ==============================================================================
table_results['elapsed_time'] = table_results['elapsed_time'].astype(str).str[:7]
table_results.style.highlight_min(axis=0, color='green').format(precision=0)
# Plot predictions vs real value for 2 random buildings for all models
# ==============================================================================
handles, labels = axs[0].get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center', bbox_to_anchor=(0.5, 0.05), ncol=2)
for ax in axs:
ax.legend().remove()
fig
This document has shown that clustering time series can be a valuable tool for improving forecasting models. By grouping time series with similar patterns, the impact of noise can be reduced, leading to more accurate predictions. The results of this study demonstrate that the choice of clustering algorithm can significantly impact the performance of the forecasting model. In this case, the dynamic time warping (DTW) algorithm outperformed the feature-driven clustering method and the primary use of the building as a grouping criterion.
Further research
This analysis has provided interesting insights into the effectiveness of combining clustering and forecasting models. Possible next steps could include:
Manually review buildings with high error. Identify if there is a group for which the model is not performing well.
Add more exogenous features: see skforecast's user guide Calendar Features for calendar and sunlight features that typically affect energy consumption.
Optimize lags and hyperparameters: Use Grid, Random or Bayesian search to find the best model configuration (a rough sketch of a Bayesian search is shown after this list).
Try other machine learning algorithms.
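As a possible starting point for the lag and hyperparameter search, the sketch below uses skforecast's bayesian_search_forecaster_multiseries on the training and validation data. The search space and parameter ranges are illustrative assumptions, and the exact function signature should be checked against the skforecast hyperparameter tuning user guide.
# Sketch: Bayesian search of lags and hyperparameters for the global model
# (illustrative search space; assumes skforecast's optuna-based search)
# ==============================================================================
from lightgbm import LGBMRegressor
from skforecast.recursive import ForecasterRecursiveMultiSeries
from skforecast.model_selection import TimeSeriesFold, bayesian_search_forecaster_multiseries

def search_space(trial):
    return {
        'n_estimators' : trial.suggest_int('n_estimators', 200, 1000, step=100),
        'max_depth'    : trial.suggest_int('max_depth', 3, 12),
        'learning_rate': trial.suggest_float('learning_rate', 0.005, 0.1, log=True),
        'lags'         : trial.suggest_categorical('lags', [7, 14, 31]),
    }

forecaster = ForecasterRecursiveMultiSeries(
    regressor = LGBMRegressor(random_state=8520, verbose=-1),
    lags      = 31,
    encoding  = 'ordinal_category'
)
cv_search = TimeSeriesFold(
    steps              = 7,
    initial_train_size = len(data_pivot.loc[:end_train, :]),
    refit              = False
)
results_search, best_trial = bayesian_search_forecaster_multiseries(
    forecaster    = forecaster,
    series        = data_pivot.loc[:end_validation].filter(regex='^id_'),
    exog          = data_pivot.loc[:end_validation, exog_features],
    cv            = cv_search,
    search_space  = search_space,
    metric        = 'mean_absolute_error',
    n_trials      = 20,
    return_best   = True,
    show_progress = True
)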
import session_info
session_info.show(html=False)
How to cite this document
If you use this document or any part of it, please acknowledge the source, thank you!
Clustering Time Series to Improve Forecasting Models by Joaquín Amat Rodrigo and Javier Escobar Ortiz, available under a CC BY-NC-SA 4.0 at https://www.cienciadedatos.net/documentos/py64-clustering-time-series-forecasting.html
How to cite skforecast
If you use skforecast for a publication, we would appreciate it if you cite the published software.
Zenodo:
Amat Rodrigo, Joaquin, & Escobar Ortiz, Javier. (2024). skforecast (v0.14.0). Zenodo. https://doi.org/10.5281/zenodo.8382788
APA:
Amat Rodrigo, J., & Escobar Ortiz, J. (2024). skforecast (Version 0.14.0) [Computer software]. https://doi.org/10.5281/zenodo.8382788
BibTeX:
@software{skforecast,
  author  = {Amat Rodrigo, Joaquin and Escobar Ortiz, Javier},
  title   = {skforecast},
  version = {0.14.0},
  month   = {11},
  year    = {2024},
  license = {BSD-3-Clause},
  url     = {https://skforecast.org/},
  doi     = {10.5281/zenodo.8382788}
}
This work by Joaquín Amat Rodrigo and Javier Escobar Ortiz is licensed under an Attribution-NonCommercial-ShareAlike 4.0 International license.
Allowed:
Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material.
Under the following terms:
Attribution: You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.
NonCommercial: You may not use the material for commercial purposes.
ShareAlike: If you remix, transform, or build upon the material, you must distribute your contributions under the same license as the original.