This page is a republication of the Probabilistic Sharpe Ratio forum post by Jack Simonson. To reproduce the results, run the attached notebook.

 

Introduction

The Sharpe ratio is a widely employed metric and de facto the most popular tool for measuring investment performance. It characterizes risk-adjusted excess returns and is easy to compute, requiring only the strategy's returns (and a risk-free rate).

\begin{equation}SR = \frac{\mu - r_{f}}{\sigma}\end{equation}

where `\mu` and `\sigma` are the mean and standard deviation of the strategy returns, respectively, and `r_{f}` is the risk-free rate (often set as the overnight federal funds rate but assumed to be 0 here). 
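As a minimal sketch of the formula above, the Sharpe ratio of a return series can be computed with Python's standard library. It assumes the risk-free rate is given per period at the same frequency as the returns (0 by default, as in this post) and uses the sample standard deviation with an n - 1 denominator, which the post does not specify. The function name `sharpe_ratio` and the data are hypothetical.

```python
import statistics


def sharpe_ratio(returns, risk_free_rate=0.0):
    """Per-period (non-annualized) Sharpe ratio of a sequence of returns.

    Assumes `risk_free_rate` is expressed per period, at the same frequency as
    `returns`, and uses the sample standard deviation (n - 1 denominator).
    """
    excess = [r - risk_free_rate for r in returns]
    return statistics.mean(excess) / statistics.stdev(excess)


daily_returns = [0.01, 0.02, -0.01, 0.03]  # hypothetical data
print(round(sharpe_ratio(daily_returns), 4))  # → 0.7319
```

Because subtracting a constant risk-free rate does not change the standard deviation, a non-zero `r_f` only shifts the numerator.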

Sharpe Ratio Limitations

Despite its widespread use, the Sharpe ratio has some significant limitations. The implied assumptions of the Sharpe ratio are that returns are independent and identically distributed (IID) Normal. Unfortunately, these assumptions can mask considerable downside risks in the case of strategies whose returns demonstrate significant non-normality.

The purpose of the Sharpe ratio is to evaluate the skill of a particular strategy, which is assumed to produce returns according to a certain distribution. Contrary to one of the implied assumptions, extensive empirical evidence shows that returns generally do not follow a normal distribution. Since the function that characterizes the distribution of a strategy's returns is unknown, the true mean `\mu`, variance `\sigma^{2}`, and distribution are also unknown. Because of this, the true Sharpe ratio of the strategy cannot be known for certain. The observed Sharpe ratio, then, is a point estimate and is subject to estimation error (additional information on point estimates and estimation error can be found in the Kelly Criterion research notebook).
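To make estimation error concrete, the sketch below simulates many return samples from a hypothetical strategy whose true per-period Sharpe ratio is known (0.2, from IID normal returns with mean 0.01 and standard deviation 0.05) and shows that the observed Sharpe ratio varies from sample to sample. All parameters, the seed, and the helper names are illustrative assumptions.

```python
import random
import statistics


def sharpe_ratio(returns):
    """Per-period Sharpe ratio with r_f = 0, using the sample standard deviation."""
    return statistics.fmean(returns) / statistics.stdev(returns)


def simulate_sr_estimates(mu, sigma, n_obs, n_trials, seed=42):
    """Observed Sharpe ratios of `n_trials` independent samples, each made of
    `n_obs` IID normal returns with mean `mu` and standard deviation `sigma`."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n_obs)]
        estimates.append(sharpe_ratio(sample))
    return estimates


# Hypothetical strategy: true per-period SR = 0.01 / 0.05 = 0.2, 60 observations each
estimates = simulate_sr_estimates(mu=0.01, sigma=0.05, n_obs=60, n_trials=1000)
print("true SR:            0.200")
print(f"mean observed SR:   {statistics.mean(estimates):.3f}")
print(f"std of observed SR: {statistics.stdev(estimates):.3f}")
print(f"min / max observed: {min(estimates):.3f} / {max(estimates):.3f}")
```

With 60 observations the spread of the observed Sharpe ratios is large relative to the true value of 0.2, which is the uncertainty the PSR is meant to quantify.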

Like any estimator, the Sharpe ratio has a probability distribution, and Mertens (2002) shows that for IID returns the Sharpe ratio estimate is asymptotically normally distributed even when the underlying returns are not normally distributed, although its variance then depends on their skewness and kurtosis. Non-normal skewness and kurtosis do not affect the value of the observed Sharpe ratio, but they have a significant impact on its statistical significance. Infinitely many return series, with very different distributions, can produce the same Sharpe ratio, yet the statistical significance of that estimate, and hence our confidence that it reflects investment skill, will differ. This is an important factor when considering the viability of an investment strategy.
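Mertens (2002) gives the variance of the Sharpe ratio estimator in closed form for IID returns. The sketch below uses the form that appears under the square root in this post's PSR formula (with n - 1 in the denominator) to compare two hypothetical strategies with the same observed Sharpe ratio and track length but different skewness and kurtosis. The function name and all inputs are assumptions for illustration.

```python
import math


def sr_standard_error(sr, skew, kurtosis, n_obs):
    """Standard error of the observed Sharpe ratio with the non-normal
    correction, as used in the PSR denominator.

    `kurtosis` is raw (non-excess) kurtosis, so a normal distribution has 3.
    """
    variance = (1 - skew * sr + (kurtosis - 1) / 4 * sr**2) / (n_obs - 1)
    return math.sqrt(variance)


sr, n_obs = 0.3, 60  # hypothetical: same observed SR and same track length
se_normal = sr_standard_error(sr, skew=0.0, kurtosis=3.0, n_obs=n_obs)
se_fat_tailed = sr_standard_error(sr, skew=-2.0, kurtosis=8.0, n_obs=n_obs)
print(f"normal returns:              SE = {se_normal:.4f}")
print(f"skew -2, kurtosis 8 returns: SE = {se_fat_tailed:.4f}")
```

The negatively skewed, fat-tailed series has the larger standard error, so the same observed Sharpe ratio carries less statistical weight.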

PSR Definition

Given the plethora of empirical evidence showing that returns do not follow a normal distribution, Bailey and López de Prado (2012) propose a confidence interval for the true Sharpe ratio `SR` of a population of returns, centered on the estimated (observed) Sharpe ratio `\hat{SR}`, at significance level `\alpha`:

\begin{equation}P\left[SR \in \left(\hat{SR} - Z_{\alpha/2}\hat{\sigma}, \hat{SR} + Z_{\alpha/2}\hat{\sigma}\right)\right] = 1 - \alpha\end{equation}

where `\hat{\sigma}` is the standard deviation of the estimator `\hat{SR}`. From this and plenty of algebra found in the paper cited above, we can calculate the Probabilistic Sharpe Ratio (PSR), which is the probability that the estimated Sharpe ratio `\hat{SR}` is above a given benchmark `SR^{\ast}`

\begin{equation}PSR(SR^{\ast}) = P\left(\hat{SR} > SR^{\ast}\right) = Z\left(\frac{(\hat{SR} - SR^{\ast})\sqrt{n-1}}{\sqrt{1 - \hat{\gamma}_{3}\hat{SR} + \frac{\hat{\gamma}_{4}-1}{4}\hat{SR}^{2}}}\right)\end{equation}

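where `n` is the number of return observations, `\hat{\gamma}_{3}` is the observed skewness of the returns, `\hat{\gamma}_{4}` is the observed kurtosis (raw, not excess, so a normal distribution has `\hat{\gamma}_{4} = 3`), and `Z` is the cumulative distribution function of the standard normal distribution.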
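A minimal, self-contained sketch of the PSR formula, estimating `\hat{SR}`, `\hat{\gamma}_{3}` and `\hat{\gamma}_{4}` directly from a list of per-period returns. The moment estimators (sample standard deviation for the Sharpe ratio, biased population moments for skewness and raw kurtosis) are assumptions, since the post does not say which estimators the notebook uses; the function name and the returns are made up.

```python
import math
import statistics
from statistics import NormalDist


def probabilistic_sharpe_ratio(returns, sr_benchmark):
    """PSR(SR*) from per-period returns, assuming r_f = 0 and no annualization.

    Assumed estimators: sample standard deviation for the Sharpe ratio and
    biased (population) central moments for skewness and raw kurtosis.
    """
    n = len(returns)
    mean = statistics.fmean(returns)
    sr_hat = mean / statistics.stdev(returns)

    # Central moments used for skewness (gamma_3) and kurtosis (gamma_4)
    m2 = sum((r - mean) ** 2 for r in returns) / n
    m3 = sum((r - mean) ** 3 for r in returns) / n
    m4 = sum((r - mean) ** 4 for r in returns) / n
    skew = m3 / m2**1.5
    kurt = m4 / m2**2

    # Denominator of the PSR z-score; corrects for skewness and kurtosis
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat**2)
    z = (sr_hat - sr_benchmark) * math.sqrt(n - 1) / denom
    return NormalDist().cdf(z)  # Z: standard normal CDF


# Hypothetical per-period returns, for illustration only
returns = [0.012, -0.004, 0.008, 0.015, -0.010, 0.006, 0.003, -0.002, 0.011, 0.004]
print(f"PSR(0):   {probabilistic_sharpe_ratio(returns, 0.0):.3f}")
print(f"PSR(0.5): {probabilistic_sharpe_ratio(returns, 0.5):.3f}")
```

Setting the benchmark equal to the observed Sharpe ratio gives a PSR of exactly 0.5, and raising the benchmark lowers the PSR.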

The PSR is a probability measure associated with the Sharpe ratio calculated from a sample of returns. It tells us the probability that the estimated Sharpe ratio is greater than a chosen benchmark (i.e., whether or not the estimate is statistically significant). While the Sharpe ratio measures the performance of a strategy, the PSR is an atemporal measure of strategy performance, expressed as the probability of skill beyond a given benchmark. Because of this, all calculations are done at the original frequency of the data and no annualization is performed. This makes the PSR preferable to the traditional annualized Sharpe ratio for strategies with irregular trading frequencies.

PSR Tests

S&P 500 Benchmark Example

We can now look at several examples that use the (non-annualized) Sharpe ratio of the S&P 500 Index ETF SPY over the last year as the benchmark (for the sake of using it as a benchmark, we assume that its estimated Sharpe ratio is its true Sharpe ratio). First, we examine how a "dummy" strategy compares: a simple buy-and-hold of the symbols below using an equal-weighting scheme. As shown below, even though our strategy has a higher Sharpe ratio (0.07) than the SPY benchmark (0.05), we can only be about 77% confident that its Sharpe ratio is actually larger than the benchmark's. By using the PSR and incorporating information about the non-normality of the returns, we reveal information that would otherwise be lost when simply comparing the two Sharpe ratios, which better informs our investment decisions.

# qb (a QuantBook instance), start, end, and the Benchmark and BuyAndHold
# helper classes are defined in the attached notebook
symbols = ['HSY', 'GIS', 'AAPL', 'KMI', 'SJM', 'MSFT', 'CPB', 'V', 'NFLX', 'TWTR']
benchmark = Benchmark(qb, start, end, 'SPY')
simple_strategy = BuyAndHold(qb, start, end, symbols)
benchmark_sr = benchmark.get_sharpe_ratio()  # compute once and reuse below
print(f'Benchmark Sharpe Ratio: {benchmark_sr}')
print(f'Buy and Hold Strategy Sharpe Ratio: {simple_strategy.get_sharpe_ratio()}')
print(f'PSR({benchmark_sr}): {simple_strategy.get_psr(benchmark_sr)}')

Benchmark Sharpe Ratio: 0.05
Buy and Hold Strategy Sharpe Ratio: 0.07
PSR(0.05): 0.765
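The `Benchmark` and `BuyAndHold` helpers live in the attached notebook and are not shown here. As a hypothetical stand-in (not the notebook's implementation), the sketch below builds an equal-weighted buy-and-hold return series from price lists, treats the benchmark's observed Sharpe ratio as `SR^{\ast}`, and computes the PSR. The prices are invented, and the equal-weighting convention (equal dollars in each symbol at the start, no rebalancing) is an assumption.

```python
import math
import statistics
from statistics import NormalDist


def equal_weight_buy_and_hold_returns(prices):
    """Per-period returns of a portfolio that invests equal dollar amounts in
    every symbol at the first price and never rebalances (assumed convention).

    `prices` maps each symbol to a list of prices on the same dates.
    """
    symbols = list(prices)
    n_periods = len(prices[symbols[0]])
    # Portfolio value = average growth of $1 placed in each symbol at t = 0
    values = [
        statistics.fmean(prices[s][t] / prices[s][0] for s in symbols)
        for t in range(n_periods)
    ]
    return [values[t] / values[t - 1] - 1 for t in range(1, n_periods)]


def psr_from_returns(returns, sr_benchmark):
    """PSR(SR*) with a per-period SR, biased skewness and raw kurtosis."""
    n = len(returns)
    mean = statistics.fmean(returns)
    sr_hat = mean / statistics.stdev(returns)
    m2 = sum((r - mean) ** 2 for r in returns) / n
    skew = sum((r - mean) ** 3 for r in returns) / n / m2**1.5
    kurt = sum((r - mean) ** 4 for r in returns) / n / m2**2
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat**2)
    return NormalDist().cdf((sr_hat - sr_benchmark) * math.sqrt(n - 1) / denom)


# Hypothetical prices on the same eight dates; not data from the post
prices = {
    "AAA": [100, 101, 103, 102, 105, 107, 106, 109],
    "BBB": [50, 50.5, 50.2, 51.0, 51.8, 51.5, 52.4, 53.0],
}
spy = [400, 401, 403, 402, 404, 406, 405, 408]

strategy_returns = equal_weight_buy_and_hold_returns(prices)
benchmark_returns = equal_weight_buy_and_hold_returns({"SPY": spy})
# As in the post, the benchmark's observed SR is treated as the fixed SR*
benchmark_sr = statistics.fmean(benchmark_returns) / statistics.stdev(benchmark_returns)
strategy_sr = statistics.fmean(strategy_returns) / statistics.stdev(strategy_returns)
print(f"Benchmark Sharpe Ratio: {benchmark_sr:.3f}")
print(f"Buy and Hold Strategy Sharpe Ratio: {strategy_sr:.3f}")
print(f"PSR({benchmark_sr:.3f}): {psr_from_returns(strategy_returns, benchmark_sr):.3f}")
```

With only seven made-up returns the numbers are purely illustrative; the point is the pipeline of prices to returns to Sharpe ratios to PSR.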

 

Zero-Skill Example

Taking an example from the work of Bailey and de Prado (2018), we can see how the PSR varies when returns are assumed to be normally distributed and when we account for the non-normality of strategy returns. For this example, we assume that the observed Sharpe ratio of the strategy is 0.458 and the observed Sharpe ratio of the benchmark is 0. We can examine two cases: the first where returns are assumed to be normal (skewness = 0, kurtosis = 3) and then when returns are non-normal (skewness = -2.448, kurtosis = 10.164). We then compute `PSR(0)` for each case, which gives us the probability that the observed Sharpe ratio of 0.458 is a result of skill given that the observed Sharpe ratio of the benchmark is 0.

(A Sharpe ratio of 0 implies no skill, as it can be achieved by simply not investing.)

print(f'PSR(0) with normal returns: {zero_skill_normal()}')
print(f'PSR(0) with non-normal returns: {zero_skill_non_normal()}')

PSR(0) with normal returns: 0.989
PSR(0) with non-normal returns: 0.934

As shown above, we would accept the estimated Sharpe ratio of 0.458 as statistically significant and indicative of skill at the 95% confidence level if returns were perfectly normal (since 98.9% > 95%), but we could not reject the null hypothesis that the observed Sharpe ratio is not a product of skill once we account for the skewness and kurtosis of returns (since 93.4% < 95%).
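The effect of non-normality in the zero-skill example can be isolated without knowing the track length `n`, which the post does not state: `\hat{SR} - SR^{\ast}` and `\sqrt{n-1}` are the same in both cases, so only the square-root term in the PSR denominator differs. A short sketch of that arithmetic, using only the numbers given in the text:

```python
import math

SR_HAT = 0.458  # observed Sharpe ratio in the zero-skill example
cases = {"normal": (0.0, 3.0), "non-normal": (-2.448, 10.164)}  # (skewness, kurtosis)

denominators = {}
for name, (skew, kurt) in cases.items():
    # Square-root term in the denominator of the PSR formula
    denominators[name] = math.sqrt(1 - skew * SR_HAT + (kurt - 1) / 4 * SR_HAT**2)
    print(f"{name:>10}: denominator = {denominators[name]:.4f}")

# (SR_hat - SR*) * sqrt(n - 1) is shared, so for any n the z-scores scale by this ratio
ratio = denominators["normal"] / denominators["non-normal"]
print(f"non-normal z-score / normal z-score = {ratio:.3f}")
```

The non-normal denominator is about 1.61 versus about 1.05 for normal returns, so for any `n` the non-normal z-score is roughly 65% of the normal one, which is what pulls the PSR down from 98.9% to 93.4%.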

Further Examination of Non-Normality

Accounting for the distributional characteristics of returns has a significant impact on the Probabilistic Sharpe Ratio. All else being equal, when the observed Sharpe ratio is positive and above the benchmark, assuming normality for negatively skewed, fat-tailed returns (kurtosis above 3) inflates the confidence level of the observed Sharpe ratio. Since returns are generally non-normally distributed, it is essential to account for these attributes to obtain a more realistic assessment of the statistical significance of the observed Sharpe ratio.

Given an arbitrary benchmark Sharpe ratio of 1.24, we can examine how varying levels of non-normality affect the significance of a hypothetical observed Sharpe ratio of 1.45. We chose an arbitrary case in which the sample returns have a skewness of -2 and a kurtosis of 8, which is representative of the general characteristics of strategy returns: negatively skewed with fat tails (large kurtosis).
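To see how much the normality assumption matters in this scenario, the sketch below computes `PSR(1.24)` for the observed Sharpe ratio of 1.45, once assuming normal returns and once with skewness -2 and kurtosis 8. The track lengths are hypothetical, since the post does not state the sample length behind its plots, and the helper name `psr` is illustrative.

```python
import math
from statistics import NormalDist


def psr(sr_hat, sr_star, skew, kurt, n_obs):
    """PSR(SR*) from summary statistics; `kurt` is raw kurtosis (normal = 3)."""
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat**2)
    return NormalDist().cdf((sr_hat - sr_star) * math.sqrt(n_obs - 1) / denom)


SR_HAT, SR_STAR = 1.45, 1.24  # observed and benchmark SR from the text
for n_obs in (12, 24, 60, 120):  # hypothetical track lengths
    normal = psr(SR_HAT, SR_STAR, skew=0.0, kurt=3.0, n_obs=n_obs)
    non_normal = psr(SR_HAT, SR_STAR, skew=-2.0, kurt=8.0, n_obs=n_obs)
    print(f"n = {n_obs:>3}: PSR normal = {normal:.3f}, PSR skew -2 / kurt 8 = {non_normal:.3f}")
```

At every track length the normal-returns PSR is the higher of the two, illustrating how ignoring negative skew and fat tails overstates confidence.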

# Test the same observed SR with a varying length of the returns sample (i.e. length of track record)
varying_sample_length()
[Figure: PSR with varying sample length]
# Test the same observed SR with varying returns skewness
varying_skewness()
[Figure: PSR with varying returns skewness]
# Test the same observed SR with varying returns kurtosis
varying_kurtosis()
[Figure: PSR with varying returns kurtosis]
# Test the same observed SR against varying benchmarks
plot_varying_benchmarks()
[Figure: PSR with varying benchmark Sharpe ratios]

One of the most significant results from the examples above, and from the PSR formula itself, is that sample length matters, a lot! When the observed Sharpe ratio is positive and above the benchmark, the PSR increases with a larger observed Sharpe ratio, a longer sample, or more positively skewed returns, and decreases with higher kurtosis. Note also that the PSR exceeds 50% exactly when the observed Sharpe ratio exceeds the benchmark, as in the first three plots above (1.45 vs. 1.24); how far above 50% it lies, and whether it clears a confidence level such as 95%, depends on the sample length, skewness, and kurtosis. So just comparing the observed Sharpe ratios is insufficient to determine whether the strategy is likely to outperform the benchmark going forward.
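The directional effects described above can be checked numerically with a small sweep of the PSR formula, varying one input at a time around the base case of this section (observed SR 1.45, benchmark 1.24, skewness -2, kurtosis 8). The base track length of 60 observations and the grid values are assumptions chosen for illustration; kurtosis values are kept at or above skewness² + 1, the lower bound any distribution must satisfy.

```python
import math
from statistics import NormalDist


def psr(sr_hat, sr_star, skew, kurt, n_obs):
    """PSR(SR*) from summary statistics; `kurt` is raw kurtosis (normal = 3)."""
    denom = math.sqrt(1 - skew * sr_hat + (kurt - 1) / 4 * sr_hat**2)
    return NormalDist().cdf((sr_hat - sr_star) * math.sqrt(n_obs - 1) / denom)


# Base case from the text; n_obs = 60 is a hypothetical track length
BASE = dict(sr_hat=1.45, sr_star=1.24, skew=-2.0, kurt=8.0, n_obs=60)


def sweep(param, values):
    """PSR as one input varies while the others stay at BASE."""
    return [psr(**{**BASE, param: v}) for v in values]


sweeps = {
    "n_obs": (12, 24, 60, 120, 240),
    "skew": (-2.5, -1.0, 0.0, 1.0, 2.5),
    "kurt": (5.0, 8.0, 12.0, 20.0, 30.0),
    "sr_star": (0.8, 1.0, 1.24, 1.45, 1.6),
}
for param, values in sweeps.items():
    print(f"{param:>8}: " + ", ".join(f"{p:.3f}" for p in sweep(param, values)))
```

The `sr_star` row also shows that the PSR equals exactly 50% when the benchmark equals the observed Sharpe ratio and falls below 50% once the benchmark is higher.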

Summary

The Sharpe ratio is probably the most widely used investment performance metric. Unfortunately, it is often misused and can misrepresent the long-term performance of a strategy. We have seen that the value of the Sharpe ratio calculated on a set of returns depends only on their mean and standard deviation and is unaffected by the skewness and kurtosis of their distribution, yet those characteristics play a significant role in how confident we can be that the observed Sharpe ratio is statistically significant. No matter how high a strategy's Sharpe ratio is, we want to be highly confident in the estimate before concluding that the strategy outperforms a benchmark.

We propose using the non-annualized Sharpe ratio of the S&P 500 as the benchmark for calculating the Probabilistic Sharpe Ratio of equity strategies. This benchmark is already widely understood and used, and by adopting it we can give investors a more nuanced understanding of the performance of individual strategies relative to a naive market strategy. Similarly, other popular benchmarks could be selected for other asset classes.

References

  1. Bailey, David H. and López de Prado, Marcos. The Sharpe Ratio Efficient Frontier (April 2012). SSRN Electronic Journal. 10.2139/ssrn.1821643.
  2. López de Prado, Marcos and Lewis, Michael J. Detection of False Investment Strategies Using Unsupervised Learning Methods (August 18, 2018). SSRN Electronic Journal. 10.2139/ssrn.3167017.
  3. Mertens, Elmar. Comments on variance of the IID estimator in Lo (2002).