Abstract

In recent years, factor investing has gained significant popularity among global institutional investors. In this tutorial, we first develop a factor selection model to test whether factors can differentiate potential winners from losers in the stock market. We then use the preselected factors to implement a factor ranking stock selection algorithm based on Factor Based Stock Selection Model for Turkish Equities, 2015, Ayhan Yüksel.

Factor Selection

QuantConnect provides Morningstar fundamentals data for US Equities. Valuation ratios are daily data. Other data, such as operation ratios and financial statements, are available for multiple periods depending on the property. To view the fundamental factors that are available, see Data Point Attributes.

The algorithm is designed to test the significance of one factor at a time.

    def initialize(self):
        # assumes `import pandas as pd` at module level
        self.set_start_date(2005, 1, 1)  # set start date
        self.set_end_date(2012, 3, 1)  # set end date
        self.set_cash(50000)  # set strategy cash
        self.universe_settings.resolution = Resolution.DAILY
        self.add_universe(self.coarse_selection_function, self.fine_selection_function)
        self.add_equity("SPY")  # add benchmark
        self.num_of_course_symbols = 200
        self.num_of_portfolio = 5
        self._changes = None
        self.flag1 = 1  # controls the monthly rebalance of the coarse and fine selection functions
        self.flag2 = 0  # controls the monthly rebalance of the OnData function
        self.flag3 = 0  # records the number of rebalancing times
        # store the monthly returns of the different portfolios in a dataframe
        self.df_return = pd.DataFrame(index=range(self.num_of_portfolio + 1))
        # schedule an event to fire on the first trading day of each month for SPY
        self.schedule.on(self.date_rules.month_start("SPY"), self.time_rules.after_market_open("SPY"), self.rebalancing)

Step 1: Ranking the stocks by factor values

First, we sort the stocks by daily dollar volume and take the stocks with the highest dollar volumes as our candidates. The universe selection API makes this convenient. Universes are refreshed every day by default, so we use the Scheduled Events API to trigger code on the first trading day of each month and use three flag variables to control the rebalancing of the coarse selection, fine selection, and OnData functions.

Coarse universe selection is the built-in universe data provided by QuantConnect, which lets you narrow a universe of over 16,000 symbols with a rough filter. Because the coarse selection function receives all Equities, including ETFs that have no fundamental data, we use the property x.has_fundamental_data to exclude them from our pool of candidate stocks.

    # sort the data by daily dollar volume and take the top entries
    def coarse_selection_function(self, coarse):
        if self.flag1:
            coarse_with_fundamental = [x for x in coarse if x.has_fundamental_data]
            sorted_by_volume = sorted(coarse_with_fundamental, key=lambda x: x.dollar_volume, reverse=True)
            top = sorted_by_volume[:self.num_of_course_symbols]
            return [i.symbol for i in top]
        else:
            return []
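
The filter-then-sort logic can be tried outside the platform. The following is a minimal sketch, assuming a hypothetical CoarseStock record that carries only the three fields used above; it is a stand-in for illustration, not the object QuantConnect passes to the callback.

```python
from collections import namedtuple

# Hypothetical stand-in for the coarse objects passed to the selection
# callback; it carries only the fields this sketch needs.
CoarseStock = namedtuple("CoarseStock", ["symbol", "dollar_volume", "has_fundamental_data"])


def top_by_dollar_volume(coarse, count):
    """Keep stocks with fundamental data, then take the `count` most liquid."""
    with_fundamentals = [x for x in coarse if x.has_fundamental_data]
    ranked = sorted(with_fundamentals, key=lambda x: x.dollar_volume, reverse=True)
    return [x.symbol for x in ranked[:count]]


sample = [
    CoarseStock("AAA", 5.0e8, True),
    CoarseStock("ETF1", 9.0e8, False),  # most liquid, but no fundamental data
    CoarseStock("BBB", 7.0e8, True),
    CoarseStock("CCC", 1.0e8, True),
]
selected = top_by_dollar_volume(sample, 2)
print(selected)
```

The ETF is dropped before sorting, so it never takes a slot in the top list even though it has the highest dollar volume.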

We extract the factor values of the candidate stocks at the beginning of each month and sort the stocks in descending order of their factor values. Here we use the twelve-month total risk-based capital data

    x.financial_statements.total_risk_based_capital.twelve_months

as an example. It is the sum of Tier 1 and Tier 2 Capital. x.symbol.value gives the string symbol of the selected stock x. We then save the sorted symbols as self.symbol.

    def fine_selection_function(self, fine):
        if self.flag1:
            self.flag1 = 0
            self.flag2 = 1
            # filter the fine data by deleting equities with zero factor value
            filtered_fine = [x for x in fine if x.financial_statements.total_risk_based_capital.twelve_months != 0]
            # sort the fine data in descending order of factor value
            sorted_fine = sorted(filtered_fine, key=lambda x: x.financial_statements.total_risk_based_capital.twelve_months, reverse=True)
            self.symbol = [str(x.symbol.value) for x in sorted_fine]
            # factor_value = [x.valuation_ratios.pe_ratio for x in sorted_fine]
            self.flag3 = self.flag3 + 1
            return []
        else:
            return []

Step 2: Compute the monthly return of portfolios

At the end of each month, we extract one month of historical close prices for each stock and compute the monthly returns.

    sorted_symbol = self.symbol
    self.add_equity("SPY")  # add benchmark
    for x in sorted_symbol:
        self.add_equity(x)
    # request the last 20 daily bars for all subscribed securities
    history = self.history(20, Resolution.DAILY)
    monthly_return = []
    new_symbol_list = []
    for symbol in sorted_symbol:
        try:
            daily_price = []
            for bar_slice in history:
                bar = bar_slice[symbol]
                daily_price.append(float(bar.close))
            monthly_return.append(daily_price[-1] / daily_price[0] - 1)
            # record the symbol once, only after its return was computed
            new_symbol_list.append(symbol)
        except Exception:
            self.log("No history data for " + str(symbol))
    # the length of monthly_return should be divisible by the number of portfolios
    usable_length = len(monthly_return) // self.num_of_portfolio * self.num_of_portfolio
    monthly_return = monthly_return[:usable_length]

We divide the stocks into 5 portfolios and compute the average monthly return of each portfolio. Then we add the monthly return of the benchmark "SPY" as the last row of the data frame df_return.

    # integer division: np.reshape requires integer dimensions
    reshape_return = np.reshape(monthly_return, (self.num_of_portfolio, len(monthly_return) // self.num_of_portfolio))
    # calculate the average return of the different portfolios
    port_avg_return = np.mean(reshape_return, axis=1).tolist()
    # add the return of "SPY" as the benchmark to the end of the return list
    benchmark_syl = self.add_equity("SPY").symbol
    history_benchmark = self.history(20, Resolution.DAILY)
    benchmark_daily_price = [float(bar_slice[benchmark_syl].close) for bar_slice in history_benchmark]
    benchmark_monthly_return = (benchmark_daily_price[-1] / benchmark_daily_price[0]) - 1
    port_avg_return.append(benchmark_monthly_return)
    self.df_return[str(self.flag3)] = port_avg_return
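
To make the truncate-and-average step concrete, here is a small pure-Python sketch of the same idea: it drops the returns that do not fit into equal buckets, then averages consecutive buckets, which is what the row-wise reshape and mean do. The function name and the sample returns are invented for illustration.

```python
def average_bucket_returns(returns, num_buckets):
    """Split an ordered list of returns into equal buckets and average each."""
    bucket_size = len(returns) // num_buckets
    if bucket_size == 0:
        raise ValueError("need at least one return per bucket")
    usable = returns[:bucket_size * num_buckets]  # drop the remainder
    return [
        sum(usable[i * bucket_size:(i + 1) * bucket_size]) / bucket_size
        for i in range(num_buckets)
    ]


# Eleven made-up monthly returns ordered by factor rank.
# Only ten fit into five equal buckets, so the last one is dropped.
returns = [0.05, 0.03, 0.04, 0.02, 0.01, 0.03, 0.00, 0.02, -0.01, -0.03, 0.10]
averages = average_bucket_returns(returns, 5)
print([round(a, 4) for a in averages])
```

Because the input is ordered by factor value, the first average belongs to the portfolio with the highest factor values and the last to the portfolio with the lowest.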

Step 3: Generate the metrics to test the factor significance

After getting the monthly returns of portfolios and the benchmark, we compute the average annual return and excess return over benchmark of each portfolio across the whole backtesting period. Then we generate three metrics to judge the significance of each factor.

  • The first metric is the correlation between the portfolios' returns and their ranks. The absolute value of the correlation coefficient should be larger than 0.8.
  • If the return of the first-ranked portfolio is larger than that of the portfolio at the bottom of the rankings, we call the first the win portfolio and the last the loss portfolio, and vice versa. The win probability is the probability that the win portfolio's return outperforms the benchmark return. The loss probability is the probability that the loss portfolio's return underperforms the benchmark return. If the factor is significant, both the loss and win probabilities should be greater than 0.4.
  • The excess return of the win portfolio should be greater than 0.25, while the excess return of the loss portfolio should be lower than 0.05.

    def calculate_criteria(self, df_port_return):
        total_return = (df_port_return + 1).T.cumprod().iloc[-1, :] - 1
        # annualize, using the 6-year horizon hard-coded in the original code
        annual_return = (total_return + 1) ** (1. / 6) - 1
        excess_return = annual_return - np.array(annual_return)[-1]
        correlation = annual_return[0:5].corr(pd.Series([5, 4, 3, 2, 1], index=annual_return[0:5].index))
        # higher factor with higher return
        if np.array(total_return)[0] > np.array(total_return)[-2]:
            loss_excess = df_port_return.iloc[-2, :] - df_port_return.iloc[-1, :]
            win_excess = df_port_return.iloc[0, :] - df_port_return.iloc[-1, :]
            loss_prob = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
            win_prob = win_excess[win_excess > 0].count() / float(len(win_excess))
            win_port_excess_return = np.array(excess_return)[0]
            loss_port_excess_return = np.array(excess_return)[-2]
        # higher factor with lower return
        else:
            loss_excess = df_port_return.iloc[0, :] - df_port_return.iloc[-1, :]
            win_excess = df_port_return.iloc[-2, :] - df_port_return.iloc[-1, :]
            loss_prob = loss_excess[loss_excess < 0].count() / float(len(loss_excess))
            win_prob = win_excess[win_excess > 0].count() / float(len(win_excess))
            win_port_excess_return = np.array(excess_return)[-2]
            loss_port_excess_return = np.array(excess_return)[0]
        test_result = {}
        test_result["correlation"] = correlation
        test_result["win probability"] = win_prob
        test_result["loss probability"] = loss_prob
        test_result["win portfolio excess return"] = win_port_excess_return
        test_result["loss portfolio excess return"] = loss_port_excess_return
        return test_result
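
As a rough illustration of the first two metrics, the sketch below computes the rank correlation and the win and loss probabilities in plain Python. The helper names and all numbers are made up for the example and are not results from the backtest.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def share_of_months(portfolio, benchmark, beats):
    """Fraction of months in which the portfolio beats (or trails) the benchmark."""
    if beats:
        hits = sum(1 for p, b in zip(portfolio, benchmark) if p > b)
    else:
        hits = sum(1 for p, b in zip(portfolio, benchmark) if p < b)
    return hits / len(benchmark)


# Made-up annual returns of five portfolios, highest factor values first,
# paired with the ranks 5..1 used in calculate_criteria.
annual_returns = [0.12, 0.10, 0.07, 0.05, 0.02]
correlation = pearson(annual_returns, [5, 4, 3, 2, 1])

# Made-up monthly returns for the win portfolio, loss portfolio and benchmark.
win_returns = [0.04, 0.01, 0.03, -0.01]
loss_returns = [0.00, 0.02, -0.02, -0.03]
benchmark_returns = [0.02, 0.015, 0.01, 0.00]
win_prob = share_of_months(win_returns, benchmark_returns, beats=True)
loss_prob = share_of_months(loss_returns, benchmark_returns, beats=False)
print(round(correlation, 3), win_prob, loss_prob)
```

With these inputs the correlation is close to 1 because the returns fall steadily with the rank, and both probabilities exceed the 0.4 threshold.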

The following table shows the factor significance test results:

Factor                FCFYield  BuyBackYield  PriceChange1M  TrailingDividendYield  EVToEBITDA  RevenueGrowth  BookValuePerShare
Correlation           -0.936    -0.987        0.918          -0.981                 0.939       0.89           -0.92
Win Probability       0.630     0.639         1              0.667                  0.722       0.69           0.69
Loss Probability      0.426     0.472         1              0.518                  0.472       0.42           0.40
Excess Return (Win)   0.324     0.212         0.303          0.225                  0.414       0.23           0.27
Excess Return (Loss)  0.060     0.037         -1.67          0.043                  0.042       0.07           0.06

We choose four factors: FCFYield, PriceChange1M, BookValuePerShare, and RevenueGrowth.

Stock Selection

Next we will select the stocks.

Step 1: Rank the stocks by factor values

First, we remove the stocks that have no fundamental data or have a zero factor value. For each preselected factor, we rank the stocks by their factor values. The order is descending if the factor correlation is negative and ascending if the factor correlation is positive.
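
A minimal sketch of this ranking rule follows. It assumes the factor values arrive as a plain dictionary keyed by symbol, with None standing for missing fundamental data; both the function name and the data are hypothetical.

```python
def rank_by_factor(factor_values, correlation):
    """Order symbols by factor value.

    The order is descending when the factor correlation is negative and
    ascending when it is positive. Symbols with a missing (None) or zero
    factor value are removed first.
    """
    valid = {s: v for s, v in factor_values.items() if v is not None and v != 0}
    return sorted(valid, key=lambda s: valid[s], reverse=correlation < 0)


# Made-up factor values; BBB has a zero value and DDD has no data.
values = {"AAA": 3.0, "BBB": 0.0, "CCC": 1.0, "DDD": None, "EEE": 2.0}
ascending = rank_by_factor(values, correlation=0.9)
descending = rank_by_factor(values, correlation=-0.9)
print(ascending, descending)
```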

Step 2: Calculate equally weighted composite factor scores

The second step is to use the selected factor variables to calculate an equally weighted composite factor score for each stock.

  • First, according to the factor order, we place our universe of stocks into 5 distinct quintile portfolios, named P1, P2, P3, P4 and P5. The ranking of the portfolios sets out the preference of the factor model, i.e. the first portfolio (P1) corresponds to the “most preferred” stocks, while the fifth (P5) corresponds to the “least preferred” stocks. Suppose there are n stocks in total. Then the stocks that fall into the first-ranked portfolio get score p, the stocks that fall into the second-ranked portfolio get score p1, and so on, so that every stock receives a score. We do the same calculation for each factor.
  • Second, we calculate a “Composite Factor Score” by combining the scores of the selected factors using an equal weighting scheme. This gives a composite factor score for each stock.
  • Third, we rank the stocks in our universe according to their Composite Factor Scores and choose the 20 highest-ranked stocks to construct our portfolio at the beginning of each month.
  • At the end of each month, we repeat the above steps to construct the new portfolio and adjust the holding stocks.
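
The scoring steps above can be sketched as follows. The exact score values are not recoverable from the text, so this example assumes that P1 scores 5 and P5 scores 1; that convention, the function names, and the sample symbols are all illustrative assumptions rather than the tutorial's actual implementation.

```python
def quintile_scores(ordered_symbols, num_portfolios=5):
    """Score each stock by the portfolio it falls into.

    `ordered_symbols` lists the most preferred stock first.
    ASSUMPTION: P1 scores `num_portfolios` and the last portfolio scores 1.
    """
    size = len(ordered_symbols) // num_portfolios
    if size == 0:
        raise ValueError("need at least one stock per portfolio")
    usable = ordered_symbols[:size * num_portfolios]
    return {s: num_portfolios - i // size for i, s in enumerate(usable)}


def composite_scores(orderings):
    """Equally weighted average of the per-factor scores."""
    per_factor = [quintile_scores(order) for order in orderings]
    common = set(per_factor[0]).intersection(*per_factor[1:])
    return {s: sum(f[s] for f in per_factor) / len(per_factor) for s in common}


# Two made-up factor orderings over ten symbols, most preferred first.
factor_1 = ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
factor_2 = ["C", "A", "E", "B", "G", "D", "I", "F", "J", "H"]
composite = composite_scores([factor_1, factor_2])

# Pick the highest composite scores; ties are broken alphabetically.
top = sorted(composite, key=lambda s: (-composite[s], s))[:3]
print(top)
```

The example keeps 3 stocks instead of 20 only to keep the data small; the selection count is just the slice length in the last step.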


Reference

  1. Factor Based Stock Selection Model for Turkish Equities, 2015, Ayhan Yüksel Online Copy

Author

Jing Wu

June 2018