Introduction
In this tutorial, we will take a close look at a principal component analysis (PCA)-based statistical arbitrage strategy derived from the paper Statistical Arbitrage in the U.S. Equities Market.
Statistical arbitrage strategies use mean-reversion models to take advantage of pricing inefficiencies between groups of correlated securities. This class of short-term trading strategies produces moves that can be contrarian to the broader market movement and is often discussed in conjunction with pairs trading. In our algorithm, we use a PCA-based approach rather than an ETF-based approach to limit our universe of stocks. Backtests over the period 1997-2007 support this choice by showing that PCA-based strategies have higher Sharpe ratios than ETF-based strategies.
Method
Step 1: Select our universe
We select our universe of stocks by dropping securities priced at $5 or less and then picking those with the highest dollar volume.
# Sort the equities by dollar volume in descending order
selected = sorted([x for x in coarse if x.price > 5],
key=lambda x: x.dollar_volume, reverse=True)
symbols = [x.symbol for x in selected[:self.num_equities]]
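The selection logic can also be tried outside the algorithm framework. The following is a minimal sketch, assuming a hypothetical `CoarseRow` record and a hypothetical `select_universe` helper (neither is part of the original algorithm) that stand in for the coarse universe data and the filter above.

```python
from dataclasses import dataclass


@dataclass
class CoarseRow:
    """Hypothetical stand-in for one row of coarse universe data."""
    symbol: str
    price: float
    dollar_volume: float


def select_universe(coarse, num_equities, min_price=5.0):
    """Return the symbols of the most liquid stocks priced above min_price."""
    liquid = sorted(
        (row for row in coarse if row.price > min_price),
        key=lambda row: row.dollar_volume,
        reverse=True,
    )
    return [row.symbol for row in liquid[:num_equities]]


# Made-up data for illustration only
coarse = [
    CoarseRow("AAA", 12.0, 5.0e8),
    CoarseRow("BBB", 3.5, 9.0e8),   # dropped: price is not above $5
    CoarseRow("CCC", 48.0, 7.5e8),
    CoarseRow("DDD", 20.0, 1.0e8),
]
symbols = select_universe(coarse, num_equities=2)
print(symbols)  # ['CCC', 'AAA']
```

Note that the price filter runs before the ranking, so a very liquid but low-priced stock such as the made-up `BBB` never reaches the top of the list.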
Step 2: Reduce dimensions to three principal components
We want to minimize our algorithm's exposure to market factors. PCA is a procedure that extracts uncorrelated components from a possibly correlated set of observations to reveal the factors that contribute most to the variance of the observations as a whole. Applying PCA to the price history of the universe selected above enables us to reduce dimensionality and keep only the most relevant market factors. Based on the results found in the cited paper, and for the sake of demonstration, we chose 3 components to account for the bulk of the variance. In our algorithm, the 3 principal components are computed from the log-transformed historical close prices of the selected stocks.
# Sample data for PCA (log-transform the prices with np.log)
sample = np.log(history.dropna(axis=1))
sample -= sample.mean() # Center it column-wise
# Fit the PCA model for sample data
model = PCA().fit(sample)
# Get the first n_components factors
factors = np.dot(sample, model.components_.T)[:,:self.num_components]
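To see what the projection step produces, here is a minimal sketch on synthetic data. It assumes made-up log prices for six hypothetical stocks and swaps in NumPy's singular value decomposition for the `PCA` class used above, so it illustrates the technique rather than reproducing the algorithm's exact code.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic log prices: 250 days x 6 hypothetical stocks sharing one common driver
num_days, num_stocks, num_components = 250, 6, 3
market = rng.normal(0.0, 0.01, size=(num_days, 1)).cumsum(axis=0)
noise = rng.normal(0.0, 0.005, size=(num_days, num_stocks)).cumsum(axis=0)
sample = market + noise
sample = sample - sample.mean(axis=0)  # center each column

# PCA through SVD: the rows of vt are the principal axes,
# ordered from largest to smallest explained variance
_, singular_values, vt = np.linalg.svd(sample, full_matrices=False)
components = vt[:num_components]

# Project the centered data onto the leading axes to get the factor time series
factors = sample @ components.T

# Share of total variance captured by each principal axis
explained = singular_values**2 / np.sum(singular_values**2)
print(factors.shape)
print(explained[:num_components].sum())
```

The columns of `factors` are mutually uncorrelated by construction, which is what makes them convenient regressors in the next step.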
Step 3: Measure price deviation
We model the mean-reverting residuals of each asset around a regression line. For each stock, we regress its centered log price on the principal-component factors and measure the price deviation by the residual. If the absolute value of a stock's residual is large, the price deviation is high and the stock should receive more weight in the portfolio; if it is small, the stock should receive less weight. To make residuals comparable across stocks, we first standardize them into z-scores: the larger the absolute value of a z-score, the higher the level of deviation. We then keep only the stocks whose most recent z-score is below -1.5 and scale those z-scores by the inverse of the sum of their absolute values, so that each weight is proportional to the stock's deviation and the absolute weights sum to one.
# Train Ordinary Least Squares linear model for each stock
OLSmodels = {ticker: sm.OLS(sample[ticker], factors).fit() for ticker in sample.columns}
# Get the residuals from the linear regression after PCA for each stock
resids = pd.DataFrame({ticker: model.resid for ticker, model in OLSmodels.items()})
# Get the z-scores by standardizing the residuals of each stock
zscores = ((resids - resids.mean()) / resids.std()).iloc[-1] # z-scores of the most recent day
# Get the stocks far from mean (for mean reversion)
selected = zscores[zscores < -1.5]
# Return the weights for each selected stock
weights = selected * (1 / selected.abs().sum())
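The regression, standardization, and weighting steps can be checked end to end on synthetic data. The sketch below is an illustration under stated assumptions: the factors and prices are randomly generated, `np.linalg.lstsq` is swapped in for the statsmodels OLS fit, and the last-day price of the first hypothetical stock is pushed down on purpose so that it passes the -1.5 threshold.

```python
import numpy as np

rng = np.random.default_rng(1)

num_days, num_stocks, num_factors = 120, 5, 3
factors = rng.normal(size=(num_days, num_factors))  # stand-in for the PCA factors
loadings = rng.normal(size=(num_factors, num_stocks))
sample = factors @ loadings + rng.normal(scale=0.1, size=(num_days, num_stocks))
sample[-1, 0] -= 1.0  # push the first stock far below its fitted value on the last day

# Least-squares fit of every stock on the factors (no intercept, as in the snippet above)
betas, *_ = np.linalg.lstsq(factors, sample, rcond=None)
resids = sample - factors @ betas

# Standardize each stock's residuals (ddof=1 matches the pandas default)
# and keep the most recent day
zscores = ((resids - resids.mean(axis=0)) / resids.std(axis=0, ddof=1))[-1]

# Keep stocks far below their fitted value, then scale so absolute weights sum to one
mask = zscores < -1.5
selected = zscores[mask]
weights = selected / np.abs(selected).sum()
print(weights)
```

Because every selected z-score is negative, every weight is negative as well. How that sign is translated into a long or short position depends on the order-placement code, which is not shown in this tutorial.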
Results
In our algorithm, the portfolio is rebalanced every 30 days, and the backtest runs from Jan 2010 to Aug 2019. Over this period of nearly 10 years, the strategy achieves an annual rate of return above 6% with a maximum drawdown of around 49%. This performance suggests that using PCA combined with linear regression to measure the level of deviation is reasonable.
To tune the model, we could expand our universe of stocks beyond the current 20 equities or incorporate more PCA components. We could also come up with another way to measure the level of deviation or change the rebalance frequency of the algorithm (30 days in this example).
Calvin Sowah
Great algorithm! I'm a little confused by what is being traded here. Is this trading only GE and SPY?
Eric Kao
The cloned algorithm doesn't work as intended. It only trades GE and SPY.
Derek Melchin
See the attached backtest for an updated version of the algorithm in PEP8 style.
The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.
Jack Simonson