Tslearn

Introduction

This page explains how to build, train, test, and store tslearn models.

Import Libraries

Import the tslearn libraries and Matplotlib, which the plotting step uses.

import matplotlib.pyplot as plt
from tslearn.barycenters import softdtw_barycenter
from tslearn.clustering import TimeSeriesKMeans

Get Historical Data

Get some historical market data to train and test the model. For example, get data for the securities shown in the following table:

Group Name	Tickers
Overall US market	SPY, QQQ, DIA
Tech companies	AAPL, MSFT, TSLA
Long-term US Treasury ETFs	IEF, TLT
Short-term US Treasury ETFs	SHV, SHY
Heavy metal ETFs	GLD, IAU, SLV
Energy sector	USO, XLE, XOM

qb = QuantBook()
tickers = ["SPY", "QQQ", "DIA", 
           "AAPL", "MSFT", "TSLA", 
           "IEF", "TLT", "SHV", "SHY", 
           "GLD", "IAU", "SLV", 
           "USO", "XLE", "XOM"]
symbols = [qb.add_equity(ticker, Resolution.DAILY).symbol for ticker in tickers]
history = qb.history(symbols, datetime(2020, 1, 1), datetime(2022, 2, 20))

Prepare Data

Prepare the historical data before you train and test the model. In this example, standardize the log close price time series of the securities. Follow these steps to prepare the data:

Unstack the historical DataFrame and select the close column.

close = history.unstack(0).close

Take the logarithm of the historical time series.

log_close = np.log(close)

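Taking the logarithm turns compounding (multiplicative) price changes into additive ones, so equal percentage moves have equal size at any price level.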

Standardize the data.

standard_close = (log_close - log_close.mean()) / log_close.std()
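As a minimal sketch of what the steps above do, the following snippet applies the same log transform and standardization to a small synthetic DataFrame. The tickers AAA and BBB and their prices are hypothetical values chosen for illustration, not market data.

```python
import numpy as np
import pandas as pd

# Synthetic close prices for two hypothetical tickers (not real market data).
# AAA rises 10% per day and BBB falls 10% per day.
dates = pd.date_range("2020-01-01", periods=5, freq="D")
close = pd.DataFrame({"AAA": [100.0, 110.0, 121.0, 133.1, 146.41],
                      "BBB": [10.0, 9.0, 8.1, 7.29, 6.561]}, index=dates)

# In log space, a constant percentage move becomes a constant step.
log_close = np.log(close)
log_steps = log_close.diff().dropna()

# Standardize each column to zero mean and unit (sample) standard deviation.
standard_close = (log_close - log_close.mean()) / log_close.std()

print(log_steps)
print(standard_close)
```

After standardization, every column is on the same scale, so the clustering compares the shape of each series instead of its price level.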

Train Models

Instead of comparing the series point by point at the same timestamps, apply a technique called Dynamic Time Warping (DTW) with Barycenter Averaging (DBA). Intuitively, DBA averages several time series into a single one without losing much of their information. Since not all time series react to information instantly, as the ideal efficient-market hypothesis (EMH) assumes, DTW allows similarity analysis of time series that follow each other with a lag. For the technical details, see the tslearn documentation.

Dynamic time warping barycenter averaging visualization
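To make the lag tolerance concrete, the following minimal sketch implements the classic (hard-minimum) DTW distance in NumPy and compares it with the Euclidean distance on two synthetic sine waves, one lagging the other. The dtw_distance function and the leader and follower series are illustrative assumptions, not part of tslearn. The model on this page uses soft-DTW, a smoothed variant of this recursion.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic DTW with squared point-wise cost (illustrative helper, not tslearn's API)."""
    n, m = len(a), len(b)
    # cost[i, j] holds the cheapest accumulated cost of aligning a[:i] with b[:j].
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return float(np.sqrt(cost[n, m]))

# Two synthetic series with the same shape, where the follower lags the leader.
t = np.linspace(0, 2 * np.pi, 60)
leader = np.sin(t)
follower = np.sin(t - 0.5)

euclid = float(np.sqrt(np.sum((leader - follower) ** 2)))
dtw = dtw_distance(leader, follower)
dtw_same = dtw_distance(leader, leader)

print(f"Euclidean: {euclid:.3f}, DTW: {dtw:.3f}")
```

The point-by-point (Euclidean) distance penalizes the lag at every timestamp, while DTW warps the time axis to align the two shapes first, so it reports the lagged pair as more similar.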

Then use k-means with DBA to separate the series into clusters.

# Set up the time series k-means model with soft-DTW barycenters.
km = TimeSeriesKMeans(n_clusters=6,   # The universe contains 6 main groups.
                      metric="softdtw",  # Soft-DTW is a differentiable variant of DTW.
                      random_state=0)

# Fit the model.
km.fit(standard_close.T)
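The transpose in the fit call matters because tslearn treats each row of a 2D input as one time series, while the prepared DataFrame holds one series per column. The following sketch shows the shapes involved, using a small synthetic frame with hypothetical tickers in place of standard_close.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for standard_close: rows are timestamps, columns are tickers.
dates = pd.date_range("2020-01-01", periods=4, freq="D")
frame = pd.DataFrame(np.arange(12.0).reshape(4, 3),
                     index=dates, columns=["AAA", "BBB", "CCC"])

# Transposing yields one row per series: (n_series, n_timestamps).
X = frame.T.to_numpy()

# The equivalent 3D layout is (n_series, n_timestamps, n_features),
# with a single feature per timestamp in this example.
X3 = X[:, :, np.newaxis]

print(frame.shape, X.shape, X3.shape)
```

Fitting on the untransposed frame would instead cluster the dates, treating each day's cross-section of tickers as a series.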

Test Models

We visualize the clusters and their corresponding underlying series.

Predict the cluster label of each series.

labels = km.predict(standard_close.T)

Create a function to aid plotting.

def plot_helper(ts):
    # plot all points of the data set
    for i in range(ts.shape[0]):
        plt.plot(ts[i, :], "k-", alpha=.2)
        
    # plot the given barycenter of them
    barycenter = softdtw_barycenter(ts, gamma=1.)
    plt.plot(barycenter, "r-", linewidth=2)

Plot the results.

j = 1
plt.figure(figsize=(15, 10))
for i in set(labels):
    # Select the series in the i-th cluster.
    X = standard_close.iloc[:, [n for n, k in enumerate(labels) if k == i]].values
    
    # Plot the series and barycenter-averaged series.
    plt.subplot(len(set(labels)) // 3 + (1 if len(set(labels))%3 != 0 else 0), 3, j)
    plt.title(f"Cluster {i+1}")
    plot_helper(X.T)
    
    j += 1

plt.show()

Tslearn model equity curves of each cluster

Display the groupings.

for i in set(labels):
    print(f"Cluster {i+1}: {standard_close.columns[[n for n, k in enumerate(labels) if k == i]]}")
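If you want the groupings as a data structure instead of printed output, one possible approach is to collect the members of each cluster into a dictionary. In the following sketch, group_by_label is a hypothetical helper, and the names and labels are made-up values standing in for standard_close.columns and the output of km.predict.

```python
from collections import defaultdict

def group_by_label(names, labels):
    """Map each cluster label to the list of series names assigned to it."""
    groups = defaultdict(list)
    for name, label in zip(names, labels):
        groups[int(label)].append(name)
    # Sort by label so the output order is stable.
    return dict(sorted(groups.items()))

# Hypothetical names and labels for illustration only.
names = ["SPY", "QQQ", "TLT", "IEF", "GLD"]
labels = [0, 0, 2, 2, 1]

groups = group_by_label(names, labels)
for label, members in groups.items():
    print(f"Cluster {label + 1}: {', '.join(members)}")
```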

Store Models

You can save and load tslearn models using the Object Store.

Save Models

Follow these steps to save models in the Object Store:

Set the key name of the model to be stored in the Object Store.

model_key = "model"

Call the get_file_path method with the key.

file_name = qb.object_store.get_file_path(model_key)

This method returns the file path where the model will be stored.

Delete any existing model file to avoid a FileExistsError when you save the model.

import os
model_path = file_name + ".hdf5"
if os.path.exists(model_path):
    os.remove(model_path)

Call the to_hdf5 method with the file path.

km.to_hdf5(file_name + ".hdf5")
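The delete-then-save steps above can be wrapped in a small helper. The following sketch uses a hypothetical save_overwriting function and a stand-in writer, strict_save, that refuses to overwrite an existing file, which is how this page describes the behavior of to_hdf5. It runs against a temporary directory, not the Object Store.

```python
import os
import tempfile

def save_overwriting(path, save_fn):
    """Remove any existing file at `path`, then call save_fn(path)."""
    if os.path.exists(path):
        os.remove(path)
    save_fn(path)

def strict_save(path):
    # Stand-in for a writer that raises FileExistsError if the file exists.
    # Mode "x" creates the file and fails when it is already present.
    with open(path, "x") as f:
        f.write("model bytes")

with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "model.hdf5")
    save_overwriting(target, strict_save)   # First save creates the file.
    save_overwriting(target, strict_save)   # Second save replaces it without error.
    saved_twice = os.path.exists(target)

print(saved_twice)
```

Under these assumptions, the equivalent call for the model on this page would be save_overwriting(file_name + ".hdf5", km.to_hdf5).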

Load Models

You must save a model into the Object Store before you can load it from the Object Store. If you saved a model, follow these steps to load it:

Call the contains_key method.

qb.object_store.contains_key(model_key)

This method returns a boolean that represents if the model_key is in the Object Store. If the Object Store does not contain the model_key, save the model using the model_key before you proceed.

Call the get_file_path method with the key.

file_name = qb.object_store.get_file_path(model_key)

This method returns the path where the model is stored.

Call the from_hdf5 method with the file path.

loaded_model = TimeSeriesKMeans.from_hdf5(file_name + ".hdf5")

This method returns the saved model.

Reference

F. Petitjean, A. Ketterlin, P. Gancarski. (2010). A global averaging method for dynamic time warping, with applications to clustering. Pattern Recognition. 44(2011). 678-693. Retrieved from https://lig-membres.imag.fr/bisson/cours/M2INFO-AIW-ML/papers/PetitJean11.pdf

Examples

The following examples demonstrate some common practices for using the tslearn library.

Example 1: DBA Clustering

The following research notebook uses a tslearn machine learning model to cluster a collection of stocks by applying Dynamic Time Warping (DTW) with Barycenter Averaging (DBA).

# Import the tslearn library and Matplotlib for plotting.
import matplotlib.pyplot as plt
from tslearn.barycenters import softdtw_barycenter
from tslearn.clustering import TimeSeriesKMeans

# Instantiate the QuantBook for researching.
qb = QuantBook()
# Request the daily history of the collection of stocks in the date range to be studied.
tickers = ["SPY", "QQQ", "DIA", 
           "AAPL", "MSFT", "TSLA", 
           "IEF", "TLT", "SHV", "SHY", 
           "GLD", "IAU", "SLV", 
           "USO", "XLE", "XOM"]
symbols = [qb.add_equity(ticker, Resolution.DAILY).symbol for ticker in tickers]
history = qb.history(symbols, datetime(2020, 1, 1), datetime(2022, 2, 20))

# Obtain the daily log close price to be analyzed.
close = history.unstack(0).close
log_close = np.log(close)       # Taking the logarithm eases the compounding effect.
# Standardize the data for faster convergence.
standard_close = (log_close - log_close.mean()) / log_close.std()

# Set up the time series k-means model with soft-DTW barycenters.
km = TimeSeriesKMeans(n_clusters=6,   # The universe contains 6 main groups.
                      metric="softdtw",  # Soft-DTW is a differentiable variant of DTW.
                      random_state=0)
# Fit the model.
km.fit(standard_close.T)

# Call the predict method to get the cluster label of each series.
labels = km.predict(standard_close.T)

# Create a function to aid plotting.
def plot_helper(ts):
    # plot all points of the data set
    for i in range(ts.shape[0]):
        plt.plot(ts[i, :], "k-", alpha=.2)
        
    # plot the given barycenter of them
    barycenter = softdtw_barycenter(ts, gamma=1.)
    plt.plot(barycenter, "r-", linewidth=2)
# Plot the results.
j = 1
plt.figure(figsize=(15, 10))
for i in set(labels):
    # Select the series in the i-th cluster.
    X = standard_close.iloc[:, [n for n, k in enumerate(labels) if k == i]].values
    
    # Plot the series and barycenter-averaged series.
    plt.subplot(len(set(labels)) // 3 + (1 if len(set(labels))%3 != 0 else 0), 3, j)
    plt.title(f"Cluster {i+1}")
    plot_helper(X.T)
    
    j += 1

plt.show()
# Display the groupings.
for i in set(labels):
    print(f"Cluster {i+1}: {standard_close.columns[[n for n, k in enumerate(labels) if k == i]]}")

# Store the model in the object store to allow accessing the model in the next research session or in the algorithm for trading.
model_key = "model"
file_name = qb.object_store.get_file_path(model_key)
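import os
model_path = file_name + ".hdf5"
# Delete any existing model file to avoid a FileExistsError when saving.
if os.path.exists(model_path):
    os.remove(model_path)
km.to_hdf5(model_path)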
