Abstract
In this research, We investigate two pairs trading methods and compare the result. Pairs trading involves in investigating the dependence structure between two highly correlated assets. With the assumption that mean reversion will occur, long or short positions are entered in the opposite direction when there is a price divergence. Typically the asset price distribution is modeled by a Gaussian distribution of return series but the joint normal distribution may fail to catch some key features of the dependence of stock pairs' price like tail dependence. We investigate using copula theory to identify these trading opportunities.
We will discuss the basic framework of copula from the mathematical perspective and explain how to apply the approach in pairs trading. The implementation of the algorithm is based on the paper Trading strategies with copulas from Stander Y, Marais D, Botha I(2013). We compare the performance of the copula pairs trading strategy with the co-integration pairs trading method based on the paper Statistical arbitrage trading strategies and high-frequency trading from Hanson T A, Hall J R. (2012). The co-integration technique assumes a co-integration relationship between paired equities to identify profitable trading opportunities. The empirical results suggest that the copula-based strategy is more profitable than the traditional pairs trading techniques.
Framework of Copula
1. Definition
Given a random vector \(X_1,X_2,...,X_p\), its marginal cumulative distribution functions (CDFs) are \(F_i(x) = P[X_i \leq x]\). By applying the probability integral transform to each component, the marginal distributions of \((U_1,U_2,...,U_p) = (F_1(X_1),F_2(X_2),...,F_p(X_p))\) are uniform (from Wikipedia). Then the copula of \(X_1,X_2,...,X_p\) is defined as the joint cumulative distribution function of \(U_1,U_2,...,U_p\), for which the marginal distribution of each variable \(U\) is uniform as \(U(0,1)\).
\[C(u_1,u_2,...,u_p) = P[U_1\leq u_1,U_2\leq u_2,..., U_1\leq u_1]\]
Copulas function contains all the dependency characteristics of the marginal distributions and will better describe the linear and non-linear relationship between variables, using probability. They allow the marginal distributions to be modeled independently from each other, and no assumption on the joint behavior of the marginals is required.
2. Bivariate Copulas
Since this research focuses on bivariate copulas (for pairs trading we have 2 random variables) some probabilistic properties are specified. Let X and Y be two random variables with cumulative probability function \(F_1(X)\) and \(F_2(Y)\). \(U=F_1(X), V=F_2(Y)\) which are uniformly distributed. Then the copula function is \(C(u,v)=P(U\leq u,V\leq v)\). Taking the partial derivative of the copula function over U and V would give the conditional distribution function as follows:
\[P(U\leq u\mid V= v)=\frac{\partial C(u,v)}{\partial v}\] \[P(V\leq v\mid U= u)=\frac{\partial C(u,v)}{\partial u}\]
3. Archimedean Copulas
There are many copula functions that enable us to describe dependence structures between variables, other than the Gaussian assumption. Here we will focus three of these; the Clayton, Gumbel and Frank copula formulas from the Archimedean class. Archimedean copulas are based on the Laplace transforms φ of univariate distribution functions. They are constructed by a particular generator function \(\phi\).
\[C(u,v)=\phi^{-1}( \phi(u),\phi(v) )\]
The probability density function is:
\[c(u,v)=\phi_{(2)}^{-1}(\phi(u)+\phi(v))\phi^{'}(u)\phi^{'}(v)\]
Where \(\phi_{(2)}^{-1}\) is the inverse of the second derivative of the generator function.
Copula | Copula function C(u,v;θ) |
---|---|
Clayton Copula | \[(u^{-\theta}+v^{-\theta}-1)^{-1/\theta}\] |
Gumbel Copula | \[exp(-[(-\ln u)^{\theta}+(-\ln v)^{\theta}]^{1/\theta})\] |
Frank Copula | \[-\theta^{-1}\ln\left[1+\frac{(exp(-\theta u)-1)(exp(-\theta v)-1)}{exp(-\theta)-1}\right]\] |
Genest and MacKay proved that the relation between the copula generator function and Kendall rank correlation tau in the bivariate case can be given by:
\[\tau=1+4\int_{0}^{1} \frac{\partial \phi (v)}{\partial \phi^{'}(v)}dv\]
So we can easily estimate the parameter in Archimedean copulas if we know Kendall’s tau rank measure and the generator function. Please refer to step 3 to see the formulas.
Part I - Copula Method
ETFs have many different stock sectors and asset classes which provide us a wide range of pairs trading candidates. Our data set consists of daily data of the ETFs traded on the NASDAQ or the NYSE.
We use the first 3 years of data to choose the best fitting copula and asset pair ("training formation period"). Next, we use a period of more than 9 years from January 2010 to September 2019 ("the trading period"), to execute the strategy. During the trading period we use a rolling 12 month window of data to get the copula parameters ("rolling formation period").
Step 1: Selecting the Paired Stocks
The general method of pair selection is based on both fundamental and statistical analysis.
1) Assemble a list of potentially related pairs
Any random pairs could be correlated. It is possible that those variables are not causally related to each other, but because of a spurious relationship due to either coincidence or the presence of a certain third, unseen factor. Thus, it is important for us to start with a list of securities that have something in common. For this demonstration, we choose some of the most liquid ETFs traded on the Nasdaq or the NYSE. The relationship for those potentially related pairs could be due to an Index, sector or asset class overlap. e.g. QQQ and XLK are two ETFs which track the market leading Indices.
2) Filter the trading pair with statistical correlation
To determine which stock pairs to include in the analysis, correlations between the pre-selected ETF pairs are analyzed. Below are three types of correlation measures we usually use in statistics:
Correlation Measurement Techniques | ||
---|---|---|
Pearson correlation | \[r = \frac{\sum (x_i- \bar{x})(y_i- \bar{y})}{\sqrt{\sum (x_i- \bar{x})^2)\sum (y_i- \bar{y})^2)} }\] | |
Kendall rank correlation | \[\tau=\frac{n_c-n_d}{\frac{1}{2}n(n-1)}\] | |
Spearman rank correlation | \[\rho=1-\frac{6\sum d_i^2}{n(n^2-1)}\] | |
\(n\) = number of value in each data set \(n_c\) = number of concordant \(n_d\) = number of discordant \(d_i\) = the difference between the ranks of corresponding values \(x_i\) and \(y_i\) |
We can get these coefficients in Python using functions from the stats library in SciPy. The correlations have been calculated using daily log stock price returns during the training formation period. We found the 3 correlation techniques give the paired ETFs the same correlation coefficient ranking. The Pearson correlation assumes that both variables should be normally distributed. Thus here we use Kendall rank as the correlation measure and choose the pairs with the highest Kendall rank correlation to implement the pairs trading. We get the daily historical closing price of our ETFs pair by using the History function and converting the prices to a log return series. Let \(P_x\) and \(P_y\) denote the historical stock price series for stock \(x\) and stock \(y\). The log returns for the ETFs pair are given by:
\[R_x = ln(\frac{P_{x,t}}{P_{x,t-1}}), R_y = ln(\frac{P_{y,t}}{P_{y,t-1}})\] \(t = 1,2,...,n\) where \(n\) is the number of price data
def pair_selection(self, date):
'''Selects the pair of stocks with the maximum Kendall tau value.
It is called on first day of each month'''
if date.month == self.month:
return Universe.UNCHANGED
symbols = [ Symbol.create(x, SecurityType.EQUITY, Market.USA)
for x in [
"QQQ", "XLK",
"XME", "EWG",
"TNA", "TLT",
"FAS", "FAZ",
"XLF", "XLU",
"EWC", "EWA",
"QLD", "QID"
] ]
logreturns = self._get_historical_returns(symbols, self.lookbackdays)
tau = 0
for i in range(0, len(symbols), 2):
x = logreturns[str(symbols[i])]
y = logreturns[str(symbols[i+1])]
# Estimate Kendall rank correlation for each pair
tau_ = kendalltau(x, y)[0]
if tau > tau_:
continue
tau = tau_
self.pair = symbols[i:i+2]
return [x.value for x in self.pair]
Step 2: Estimating Marginal Distributions of log-return
In order to construct the copula, we need to transform the log-return series \(R_x\) and \(R_y\) to two uniformly distributed values \(u\) and \(v\). This can be done by estimating the marginal distribution functions of \(R_x\) and \(R_y\) and plugging the return values into a distribution function. As we make no assumptions about the distribution of the two log-return series, here we use the empirical distribution function to approach the marginal distribution \(F_1(R_x)\) and \(F_2(R_y)\). The Python ECDF function from the statsmodel library gives us the Empirical CDF as a step function.
Step 3: Estimating Copula Parameters
As discussed above, we estimate the copula parameter \(\theta\) by the relationship between the copula and the dependence measure Kendall’s tau, for each of the Archimedean copulas.
Copula | Kendall's tau | parameter θ |
---|---|---|
Clayton Copula | \[\frac{\theta}{\theta +2}\] | \[\theta=2\tau(1-\tau)^{-1}\] |
Gumbel Copula | \[1-\theta^{-1}\] | \[\theta=(1-\tau)^{-1}\] |
Frank Copula | \[1+4[D_1(\theta)-1]/\theta\] | \[arg min\left(\frac{\tau-1}{4}-\frac{D_1(\theta)-1}{\theta}\right)^2\] |
\[D_1(\theta)=\frac{1}{\theta}\int_{0}^{\theta}\frac{t}{exp(t)-1}dt \] |
def _parameter(self, family, tau):
''' Estimate the parameters for three kinds of Archimedean copulas
according to association between Archimedean copulas and the Kendall rank correlation measure
'''
if family == 'clayton':
return 2 * tau / (1 - tau)
elif family == 'frank':
'''
debye = quad(integrand, sys.float_info.epsilon, theta)[0]/theta is first order Debye function
frank_fun is the squared difference
Minimize the frank_fun would give the parameter theta for the frank copula
'''
integrand = lambda t: t / (np.exp(t) - 1) # generate the integrand
frank_fun = lambda theta: ((tau - 1) / 4.0 - (quad(integrand, sys.float_info.epsilon, theta)[0] / theta - 1) / theta) ** 2
return minimize(frank_fun, 4, method='BFGS', tol=1e-5).x
elif family == 'gumbel':
return 1 / (1 - tau)
Step 4: Selecting the Best Fitting Copula
Once we get the parameter estimation for the copula functions, we use the AIC criteria to select the copula that provides the best fit in algorithm initialization.
\[AIC=-2L(\theta)+2k\]
where \(L(\theta)=\sum_{t=1}^T\log c(u_t,v_t;\theta)\) is the log-likelihood function and \(k\) is the number of parameters, here \(k=1\).
The density functions of each copula function are as follows:
Copula | Density function c(u,v;θ) |
---|---|
Clayton Copula | \[(\theta+1)(u^{-\theta}+v^{-\theta}-1)^{-2-1/\theta}u^{-\theta-1}v^{-\theta-1}\] |
Gumbel Copula | \[C(u,v;\theta)(uv)^{-1}A^{-2+2/\theta}[(\ln u)(\ln v)]^{\theta -1}[1+(\theta-1)A^{-1/\theta}]\] |
Frank Copula | \[\frac{-\theta(exp(-\theta)-1)(exp(-\theta(u+v)))}{((exp(-\theta u)-1)(exp(-\theta v)-1)+(exp(-\theta)-1))^2}\] |
\[A=(-\ln u)^{\theta}+(-\ln v)^{\theta}\] |
def _lpdf_copula(self, family, theta, u, v):
'''Estimate the log probability density function of three kinds of Archimedean copulas
'''
if family == 'clayton':
pdf = (theta + 1) * ((u ** (-theta) + v ** (-theta) - 1) ** (-2 - 1 / theta)) * (u ** (-theta - 1) * v ** (-theta - 1))
elif family == 'frank':
num = -theta * (np.exp(-theta) - 1) * (np.exp(-theta * (u + v)))
denom = ((np.exp(-theta * u) - 1) * (np.exp(-theta * v) - 1) + (np.exp(-theta) - 1)) ** 2
pdf = num / denom
elif family == 'gumbel':
A = (-np.log(u)) ** theta + (-np.log(v)) ** theta
c = np.exp(-A ** (1 / theta))
pdf = c * (u * v) ** (-1) * (A ** (-2 + 2 / theta)) * ((np.log(u) * np.log(v)) ** (theta - 1)) * (1 + (theta - 1) * A ** (-1 / theta))
return np.log(pdf)
The copula that provides the best fit is the one that corresponds to the lowest value of AIC criterion. The chosen pair is "QQQ" & "XLK".
Step 5: Generating the Trading Signals
The copula functions include all the information about the dependence structures of two return series. According to Stander Y, Marais D, Botha I. in Trading strategies with copulas, the fitted copula is used to derive the confidence bands for the conditional marginal distribution function of \(C(v\mid u)\) and \(C(u\mid v)\), that is the mispricing indexes. When the market observations fall outside the confidence band, it is an indication that pairs trading opportunity is available. Here we choose 95% as the upper confidence band, 5% as the lower confidence band as indicated in the paper. The confidence level was selected based on a back-test analysis in the paper that shows using 95% seems to lead to appropriate trading opportunities to be identified.
Given current returns \(R_x, R_y\) of stock \(X\) and stock \(Y\), we define the "mis-pricing indexes" are:
\[MI_{X|Y}=P(U\leq u\mid V\leq v)=\frac{\partial C(u,v)}{\partial v}\]
\[MI_{Y|X}=P(V\leq v\mid U\leq u)=\frac{\partial C(u,v)}{\partial u}\]
For further mathematical proof, please refer to Xie W, Wu Y. Copula-based pairs trading strategy. The conditional probability formulas of bivariate copulas can be derived by taking partial derivatives of copula functions shown in Table 1. The results are as follows:
Gumbel Copula
\[C(v\mid u)=C(u,v;\theta)[(-\ln u)^\theta+(-\ln v)^\theta]^{\frac{1-\theta}{\theta}}(-\ln u)^{\theta-1}\frac{1}{u}\]
\[C(u\mid v)=C(u,v;\theta)[(-\ln u)^\theta+(-\ln v)^\theta]^{\frac{1-\theta}{\theta}}(-\ln v)^{\theta-1}\frac{1}{v}\]
Clayton Copula
\[C(v\mid u)=u^{-\theta-1}(u^{-\theta}+v^{-\theta}-1)^{-\frac{1}{\theta}-1}\]
\[C(u\mid v)=v^{-\theta-1}(u^{-\theta}+v^{-\theta}-1)^{-\frac{1}{\theta}-1}\]
Frank Copula
\[C(v\mid u)=\frac{(exp(-\theta u)-1)(exp(-\theta v)-1)+(exp(-\theta v)-1)}{(exp(-\theta u)-1)(exp(-\theta v)-1)+(exp(-\theta)-1)} \]
\[C(u\mid v)=\frac{(exp(-\theta u)-1)(exp(-\theta v)-1)+(exp(-\theta u)-1)}{(exp(-\theta u)-1)(exp(-\theta v)-1)+(exp(-\theta)-1)} \]
After selection of trading pairs and the best-fitted copulas, we take the following steps for trading. Please note we implement the Steps 1, 2, 3 and 4 on the first day of each month using the daily data for the last 12 months, which means our empirical distribution functions and copula parameters theta estimation are updated once a month. In summary each month:
- During the 12 months' rolling formation period, daily close prices are used to calculate the daily log returns for the pair of ETFs and then compute Kendall's rank correlation.
- Estimate the marginal distribution functions of log returns of X and Y, which are ecdf_x and ecdf_y separately.
- Plug Kendall's tau into copula parameter estimation functions to get the value of theta.
- Run linear regression over the two price series. The coefficient is used to determine how many shares of stock X and Y to buy and sell. For example, if the coefficient is 2, for every X share that is bought or sold, 2 units of Y are sold or bought.
def set_signal(self, slice):
'''Computes the mispricing indices to generate the trading signals.
It's called on first day of each month'''
if self.time.month == self.month:
return
## Compute the best copula
# Pull historical log returns used to determine copula
logreturns = self._get_historical_returns(self.pair, self.numdays)
x, y = logreturns[str(self.pair[0])], logreturns[str(self.pair[1])]
# Convert the two returns series to two uniform values u and v using the empirical distribution functions
ecdf_x, ecdf_y = ECDF(x), ECDF(y)
u, v = [ecdf_x(a) for a in x], [ecdf_y(a) for a in y]
# Compute the Akaike Information Criterion (AIC) for different copulas and choose copula with minimum AIC
tau = kendalltau(x, y)[0] # estimate Kendall'rank correlation
AIC ={} # generate a dict with key being the copula family, value = [theta, AIC]
for i in ['clayton', 'frank', 'gumbel']:
param = self._parameter(i, tau)
lpdf = [self._lpdf_copula(i, param, x, y) for (x, y) in zip(u, v)]
# Replace nan with zero and inf with finite numbers in lpdf list
lpdf = np.nan_to_num(lpdf)
loglikelihood = sum(lpdf)
AIC[i] = [param, -2 * loglikelihood + 2]
# Choose the copula with the minimum AIC
self.copula = min(AIC.items(), key = lambda x: x[1][1])[0]
## Compute the signals
# Generate the log return series of the selected trading pair
logreturns = logreturns.tail(self.lookbackdays)
x, y = logreturns[str(self.pair[0])], logreturns[str(self.pair[1])]
# Estimate Kendall'rank correlation
tau = kendalltau(x, y)[0]
# Estimate the copula parameter: theta
self.theta = self._parameter(self.copula, tau)
# Simulate the empirical distribution function for returns of selected trading pair
self.ecdf_x, self.ecdf_y = ECDF(x), ECDF(y)
# Run linear regression over the two history return series and return the desired trading size ratio
self.coef = stats.linregress(x,y).slope
self.month = self.time.month
Finally during the trading period, each day we convert today's returns to u and v by using empirical distribution functions ecdf_x and ecdf_y. After that, two mispricing Indexes are calculated every trading day by using the estimated copula C. The algorithm constructs short positions in X and long positions in Y on the days that \(MI_{Y|X}<0.05\) and \(MI_{X|Y}>0.95\). It constructs short position in Y and long positions in X on the days that \(MI_{Y|X}>0.95\) and \(MI_{X|Y}<0.05\).
def on_data(self, slice):
'''Main event handler. Implement trading logic.'''
self.set_signal(slice) # only executed at first day of each month
# Daily rebalance
if self.time.day == self.day:
return
long, short = self.pair[0], self.pair[1]
# Update current price to trading pair's historical price series
for kvp in self.securities:
symbol = kvp.key
if symbol in self.pair:
price = kvp.value.PRICE
self.window[symbol].append(price)
if len(self.window[long]) < 2 or len(self.window[short]) < 2:
return
# Compute the mispricing indices for u and v by using estimated copula
MI_u_v, MI_v_u = self._misprice_index()
# Placing orders: if long is relatively underpriced, buy the pair
if MI_u_v < self.floor__c_l and MI_v_u > self.cap__c_l:
self.set_holdings(short, -self.weight_v, False, f'Coef: {self.coef}')
self.set_holdings(long, self.weight_v * self.coef * self.portfolio[long].price / self.portfolio[short].price)
# Placing orders: if short is relatively underpriced, sell the pair
elif MI_u_v > self.cap__c_l and MI_v_u < self.floor__c_l:
self.set_holdings(short, self.weight_v, False, f'Coef: {self.coef}')
self.set_holdings(long, -self.weight_v * self.coef * self.portfolio[long].price / self.portfolio[short].price)
self.day = self.time.day
Part II - Cointegration Method
For the cointegration pairs trading method, we choose the same ETF pair "GLD" and "DGL". There is no need to choose a copula function so there is only a 12 month rolling formation period. The trading period is 5 years from January 2011 to May 2017.
Step 1: Generate the Spread Series
At the start of each month, we generate the log price series of two ETFs with the daily close. Then the spread series is estimated using regression analysis based on log price series data. For Equities X and Y, we run linear regression over the log price series and get the coefficient β.
\[spread_t=\log(price_t^y)-\beta \log(price_t^x)\]
Step 2: Compute the Signals
Using the standard deviation of spread during the rolling formation period, a threshold of one standard deviation is set up for the trading strategy. We enter a trade whenever the spread moves more than one standard deviations away from its mean. Trades are exited when the spread reverts back to the mean trailing spread value. The position sizes are scaled by the coefficient β.
log_close_x = np.log(self.closes_by_symbol[self.x_symbol])
log_close_y = np.log(self.closes_by_symbol[self.y_symbol])
spread, beta = self.regr(log_close_x, log_close_y)
mean = np.mean(spread)
std = np.std(spread)
x_holdings = self.portfolio[self.x_symbol]
if x_holdings.invested:
if x_holdings.is_short and spread[-1] <= mean or \
x_holdings.is_long and spread[-1] >= mean:
self.liquidate()
else:
if beta < 1:
x_weight = 0.5
y_weight = 0.5 / beta
else:
x_weight = 0.5 / beta
y_weight = 0.5
if spread[-1] < mean - self.threshold * std:
self.set_holdings(self.y_symbol, -y_weight)
self.set_holdings(self.x_symbol, x_weight)
if spread[-1] > mean + self.threshold * std:
self.set_holdings(self.x_symbol, -x_weight)
self.set_holdings(self.y_symbol, y_weight)
Summary
Ultimately pairs trading intends to capture the price divergence of two correlated assets through mean reversion. Our results demonstrate that the copula approach for pairs trading is superior to the conventional cointegration method because it is based on the probability of the dependence structure, vs cointegration which relies on simple linear regression variance from normal pricing. We found through testing the performance of the copula method less sensitive to the starting parameters. Because the cointegration method relies on standard distribution and the ETF pairs had low volatility there were few trading opportunities.
method | Transactions | Profit | Sharpe Ratio | Drawdown |
---|---|---|---|---|
Copula | 498 | 7.057% | 0.098 | 24.0% |
Cointegration | 126 | 4.506% | 0.179 | 3.9% |
Generally, ETFs are not very volatile and so mean-reversion did not provide many trading opportunities. There are only 91 trades during 5 years for cointegration method. It is observed that the use of copula in pairs trading provides more trading opportunities as it does not require any rigid assumptions according to Liew R Q, Wu Y. - Pairs trading A copula approach.
Reference
- Stander Y, Marais D, Botha I. Trading strategies with copulas[J]. Journal of Economic and Financial Sciences, 2013, 6(1): 83-107. Online Copy
- Hanson T A, Hall J R. Statistical arbitrage trading strategies and high-frequency trading[J]. 2012. Online Copy
- Mahfoud M, Michael M. Bivariate Archimedean copulas: an application to two stock market indices[J]. BMI Paper, 2012. Online Copy
- Rad H, Low R K Y, Faff R. The profitability of pairs trading strategies: distance, cointegration and copula methods[J]. Quantitative Finance, 2016, 16(10): 1541-1558.Online copy
- LANDGRAF N, SCHOLTUS K, DIRIS D R B. High-Frequency copula-based pairs trading on US Goldmine Stocks[J]. 2016. Online Copy
- Genest, C. and MacKay, J., 1986, The Joy of Copulas: Bivariate Distributions with Uniform Marginals, The American Statistician, 40, 280-283 Online Copy
- Jean Folger, Pairs Trading Example Online Copy
- Xie W, Wu Y. Copula-based pairs trading strategy[C] Asian Finance Association (AsFA) 2013 Conference. doi. 2013, 10. Online Copy
- Liew R Q, Wu Y. Pairs trading: A copula approach[J]. Journal of Derivatives & Hedge Funds, 2013, 19(1): 12-30. Online Copy
Derek Melchin
See the attached backtest for an updated version of the copula algorithm in PEP8 style.
The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.
Jing Wu
The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.
To unlock posting to the community forums please complete at least 30% of Boot Camp.
You can continue your Boot Camp training progress from the terminal. We hope to see you in the community soon!