Introduction

This research diversifies SPY by clustering its top 200 weighted constituents using topological data analysis (TDA). The strategy aims to minimize correlation risk by employing KeplerMapper for projection, Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) for dimensionality reduction, and Density-Based Spatial Clustering of Applications with Noise (DBSCAN) for clustering. The resulting clusters are then used to construct a portfolio with equal weighting across giant and small clusters and within small clusters. This method is expected to enhance portfolio diversification, reduce the portfolio's correlation with SPY, and reduce its drawdown.

Background

Topological Data Analysis (TDA) is a field of data analysis that uses topological techniques to understand the shape and structure of data. TDA can reveal hidden patterns and relationships in high-dimensional data, allowing us to cluster securities whose correlations are not obvious in the original, non-linear, high-dimensional space.

The Mapper Algorithm is a tool used in TDA to project high-dimensional data into a lower-dimensional space while preserving a combinatorial representation of its topological structure. Clustering in this topological representation is more computationally efficient.
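As a rough illustration, the following minimal sketch runs Mapper on synthetic data (not the strategy's pipeline); it assumes the kmapper and scikit-learn packages are installed, and the lens, cover, and clusterer choices here are arbitrary:

    import numpy as np
    import kmapper as km
    from sklearn.cluster import DBSCAN

    data = np.random.rand(500, 3)                      # toy point cloud in 3 dimensions
    mapper = km.KeplerMapper()
    lens = mapper.fit_transform(data, projection=[0])  # lens: project onto the first coordinate
    graph = mapper.map(lens, data, clusterer=DBSCAN(eps=0.3, min_samples=5))
    print(len(graph['nodes']), 'nodes in the Mapper graph')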

Principal Component Analysis (PCA) is a dimension reduction method that transforms the data into a new set of linear combinations of the original variables and drops the components with low variance. It preserves most of the information in the dataset while quickly removing noise and dimensions, improving the speed and accuracy of the subsequent steps.
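For instance, the following minimal sketch (on toy random returns; the 0.8 threshold mirrors the n_components=0.8 setting used in the implementation below) keeps just enough components to explain 80% of the total variance:

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(1)
    returns = rng.normal(0, 0.01, size=(200, 250))  # toy: 200 assets x 250 days
    pca = PCA(n_components=0.8, random_state=1)     # keep 80% of the total variance
    reduced = pca.fit_transform(returns)
    print(reduced.shape[1], pca.explained_variance_ratio_.sum())  # components kept, variance retained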

Uniform Manifold Approximation and Projection (UMAP) is a non-linear dimensionality reduction technique that further projects high-dimensional data into a lower-dimensional space. It was chosen for its ability to preserve local and global data structures, which is essential for accurate clustering, while remaining computationally efficient.
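A minimal sketch of this step, assuming the umap-learn package and toy inputs standing in for the PCA output:

    import numpy as np
    from umap import UMAP

    rng = np.random.default_rng(1)
    features = rng.random((200, 40))             # e.g., PCA-reduced features for 200 assets
    reducer = UMAP(n_components=1, random_state=1)
    embedding = reducer.fit_transform(features)  # shape (200, 1): a 1-D embedding preserving neighborhoods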

Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering algorithm that groups densely packed points and labels isolated points as noise, making it robust to outliers. We apply DBSCAN with correlation distance to identify clusters based on the similarity of stock returns, so that non-systematic risk can be spread evenly across them.
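The following minimal sketch clusters toy factor-driven returns with correlation distance; the eps and min_samples values are illustrative (scikit-learn's defaults), not tuned:

    import numpy as np
    from sklearn.cluster import DBSCAN

    rng = np.random.default_rng(1)
    factors = rng.normal(0, 0.01, size=(4, 250))  # 4 latent return drivers
    loadings = rng.integers(0, 4, size=200)       # each toy asset follows one driver
    log_returns = factors[loadings] + rng.normal(0, 0.002, size=(200, 250))

    db = DBSCAN(eps=0.5, min_samples=5, metric='correlation').fit(log_returns)
    print(set(db.labels_))  # cluster ids; -1 labels assets treated as noise/outliers

Assets whose correlation distance (one minus correlation) is small end up in the same cluster, while uncorrelated assets are separated or flagged as noise.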

Highly correlated points projected by the Mapper Algorithm form small clusters, which can be represented as nodes in a graph. Correlated small clusters are connected by edges, and each connected structure is considered a giant cluster. The overall graph is called a simplicial complex, which represents the underlying topological structure of the data.

[Figure: Mapper graph of the constituents, with small clusters as nodes and giant clusters as connected components]

Since the clusters and subclusters have low correlation with one another, we can split the capital equally between clusters and subclusters, and within subclusters, to spread the capital risk evenly.

Implementation

To implement this strategy, we start by selecting SPY's top 200 weighted constituents in the initialize method using the TopologicalGraphUniverseSelectionModel class.

    # Select the 200 constituents with the largest SPY weights
    universe_model = TopologicalGraphUniverseSelectionModel(
        "SPY",
        history_lookback,
        recalibrate_period,
        lambda u: [x.symbol for x in sorted(
            [x for x in u if x.weight],
            key=lambda x: x.weight,
            reverse=True
        )[:200]]
    )

In the TopologicalGraphUniverseSelectionModel class, we use a KeplerMapper to project the log returns into a lower-dimensional space using PCA and UMAP. We then apply DBSCAN to cluster the projected data using correlation distance.

    import numpy as np
    import kmapper as km
    from sklearn.cluster import DBSCAN
    from sklearn.decomposition import PCA
    from umap import UMAP

    prices = algorithm.history(self.universe.selected, lookback_window, Resolution.DAILY).unstack(0).close
    log_returns = np.log(prices / prices.shift(1)).dropna().T  # one row per asset
    mapper = km.KeplerMapper()
    projected_data = mapper.fit_transform(log_returns, projection=[PCA(n_components=0.8, random_state=1), UMAP(n_components=1, random_state=1, n_jobs=-1)])
    graph = mapper.map(projected_data, log_returns, clusterer=DBSCAN(metric='correlation', n_jobs=-1))
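The graph returned by mapper.map is a dictionary whose 'nodes' entry maps each node id to the rows of log_returns it contains and whose 'links' entry records which nodes overlap. As a minimal sketch of how giant clusters could then be recovered as connected components (networkx and the clustered_symbols construction are illustrative choices here, not necessarily the model's exact code):

    import networkx as nx

    g = nx.Graph()
    g.add_nodes_from(graph['nodes'])
    g.add_edges_from((a, b) for a, targets in graph['links'].items() for b in targets)

    # Each connected component is one giant cluster; its nodes are small clusters
    clustered_symbols = [
        [[log_returns.index[i] for i in graph['nodes'][node]] for node in component]
        for component in nx.connected_components(g)
    ]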

The resulting clusters are analyzed in this way to identify giant clusters (connected simplicial complexes) and small clusters (individual nodes). We then construct the portfolio by assigning equal weights to each giant cluster, each small cluster, and the stocks within small clusters in the weight_distribution method of the EqualClustersWeightingPortfolioConstructionModel class.

    def weight_distribution(self, clustered_symbols):
        weights = {}

        def assign_weights(nested_list, level=1):
            # Split this level's allocation equally among its members
            num_elements = len(nested_list)
            weight_per_element = 1 / num_elements
            for item in nested_list:
                if isinstance(item, list):
                    assign_weights(item, level + 1)  # recurse into subclusters
                else:
                    # Halve the contribution for each extra level of nesting
                    weights[item] = weights.get(item, 0) + weight_per_element / (2 ** (level - 1))

        assign_weights(clustered_symbols)
        # Normalize so the final weights sum to one
        return pd.Series(weights) / sum(weights.values())
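Calling weight_distribution with a hypothetical nesting, one giant cluster holding two small clusters plus one standalone small cluster (the ticker names are placeholders), illustrates the halving-by-level scheme:

    clustered_symbols = [[['A', 'B'], ['C', 'D', 'E']], ['F', 'G']]
    # weight_distribution(clustered_symbols) returns, after normalization:
    #   A, B:    0.125 each    (first subcluster of the giant cluster)
    #   C, D, E: ~0.0833 each  (second subcluster of the giant cluster)
    #   F, G:    0.25 each     (standalone small cluster)

Each subcluster of the giant cluster receives a quarter of the capital, while the standalone small cluster receives half, so deeper nesting dilutes individual positions.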

Results

The strategy was backtested from March 2020 to March 2025 using the LEAN engine. The benchmarks were a buy-and-hold SPY position and a normalized top 200 weighted SPY constituents strategy. The strategy yielded the following performance metrics over the backtest period.

    Metric                 TDA Portfolio (Proposed)   Normalized Top 200 Weighted SPY Constituents   Buy-and-Hold SPY
    Beta                   0.803                      0.926                                          1 (Reference)
    Annualized Variance    0.016                      0.018                                          0.021
    Maximum Drawdown       17.4%                      24.5%                                          26.3%

We ran a parameter optimization job to test the sensitivity of the chosen parameters. We tested historical data lookback windows from 50 to 500 weeks in steps of 50 weeks and a simplicial complex reconstruction period of daily (0), weekly (1), monthly (2), or yearly (3). Of the 40 parameter combinations, 40/40 (100%) produced a lower Beta than the benchmark, and 32/40 (80.0%) produced a smaller maximum drawdown than the benchmark.

[Figure: Parameter optimization results across lookback windows and reconstruction periods; the red circle marks the chosen default parameters]

The red circle in the preceding figure identifies the parameters we chose as the strategy's defaults. We chose a historical data lookback window of 200 weeks and yearly simplicial complex reconstruction because they produced the lowest drawdown and the lowest correlation with the benchmark.

The strategy also yielded an Alpha of 0.012, demonstrating the potential of TDA and clustering techniques to enhance risk-adjusted returns while lowering correlation with the benchmark.

References

  • Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2), 255-308.
  • McInnes, L., Healy, J., & Melville, J. (2018). UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
  • Ester, M., Kriegel, H. P., Sander, J., & Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. KDD, 96, 226-231.

Author

Louis Szeto