Abstract

In recent years, factor investing has gained significant popularity among global institutional investors. In this tutorial, we first develop a factor selection model to test whether factors can differentiate potential winners from losers in the stock market. Then we use the preselected factors to implement a factor-ranking stock selection algorithm based on Factor Based Stock Selection Model for Turkish Equities (Ayhan Yüksel, 2015).

Factor Selection

QuantConnect provides Morningstar fundamental data for US Equities. Valuation ratios are updated daily. Other data, such as operation ratios and financial statements, are available for multiple periods depending on the property. To view the fundamental factors that are available, see Data Point Attributes.

The algorithm is designed to test the significance of one factor at a time.

def Initialize(self):
    self.SetStartDate(2005, 1, 1)  # Set Start Date
    self.SetEndDate(2012, 3, 1)    # Set End Date
    self.SetCash(50000)            #Set Strategy Cash
    self.UniverseSettings.Resolution = Resolution.Daily
    self.AddUniverse(self.CoarseSelectionFunction, self.FineSelectionFunction)
    self.AddEquity("SPY") # add benchmark
    self.numOfCourseSymbols = 200
    self.numOfPortfolio = 5
    self._changes = None
    self.flag1 = 1  # variable to control the monthly rebalance of coarse and fine selection function
    self.flag2 = 0  # variable to control the monthly rebalance of OnData function
    self.flag3 = 0  # variable to record the number of rebalancing times
    # store the monthly returns of different portfolios in a dataframe
    self.df_return = pd.DataFrame(index = range(self.numOfPortfolio+1))
    # schedule an event to fire at the first trading day of SPY
    self.Schedule.On(self.DateRules.MonthStart("SPY"), self.TimeRules.AfterMarketOpen("SPY"), Action(self.Rebalancing))

Step 1: Ranking the stocks by factor values

First, we sort the stocks by daily dollar volume and take the stocks with the highest dollar volume as our candidates. The universe selection API makes this convenient. Universes are refreshed every day by default, so we use the Scheduled Events API to trigger code on the first trading day of each month, and three flag variables to control the monthly rebalancing in the CoarseSelectionFunction, FineSelectionFunction and OnData functions.
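
The flag mechanism can be illustrated outside QuantConnect with a small simulation. This is only a sketch: the class MonthlyGate and its method names are hypothetical, and the behaviour of the scheduled callback and of OnData (re-arming flag1 at month start, clearing flag2 after acting) is an assumption, since that code is not shown in this tutorial.

```python
class MonthlyGate:
    """Hypothetical stand-in for the flag1/flag2/flag3 bookkeeping."""

    def __init__(self):
        self.flag1 = 1  # universe selection is allowed to run
        self.flag2 = 0  # OnData has work pending
        self.flag3 = 0  # number of rebalances so far

    def on_month_start(self):
        # Assumed behaviour of the scheduled callback: re-arm selection.
        self.flag1 = 1

    def on_universe_refresh(self):
        # Mirrors the `if self.flag1:` guard in the selection functions.
        if not self.flag1:
            return False
        self.flag1 = 0
        self.flag2 = 1
        self.flag3 += 1
        return True

    def on_data(self):
        # Assumed guard: act once per month, then wait for the next selection.
        if not self.flag2:
            return False
        self.flag2 = 0
        return True


gate = MonthlyGate()
selections = trades = 0
for month in range(3):          # three simulated months
    gate.on_month_start()
    for day in range(20):       # twenty simulated trading days each
        selections += gate.on_universe_refresh()
        trades += gate.on_data()
print(selections, trades, gate.flag3)
```

Although the universe refresh and the data handler are called every simulated day, each does real work only once per month.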

Coarse universe selection uses the built-in universe data provided by QuantConnect, which lets you perform a rough filter over a universe of more than 16,000 symbols before your algorithm processes them. Because the coarse selection function receives all equities, including ETFs that have no fundamental data, we use the property x.HasFundamentalData to exclude them from the candidate pool.

# sort the data by daily dollar volume and take the top entries
def CoarseSelectionFunction(self, coarse):
    if self.flag1:
        CoarseWithFundamental = [x for x in coarse if x.HasFundamentalData]
        sortedByVolume = sorted(CoarseWithFundamental, key=lambda x: x.DollarVolume, reverse=True)
        top = sortedByVolume[:self.numOfCourseSymbols]
        return [i.Symbol for i in top]
    else:
        return []

We extract the factor values of the candidate stocks at the beginning of each month and sort the stocks in descending order of their factor values. Here we use the twelve-month total risk-based capital

x.FinancialStatements.TotalRiskBasedCapital.TwelveMonths

as an example. It is the sum of Tier 1 and Tier 2 capital. x.Symbol.Value gives the ticker string of the selected stock x. We then save the sorted symbols as self.symbol.

def FineSelectionFunction(self, fine):
    if self.flag1:
        self.flag1 = 0
        self.flag2 = 1
        # filter the fine universe by dropping equities with a zero factor value
        filtered_fine = [x for x in fine if x.FinancialStatements.TotalRiskBasedCapital.TwelveMonths != 0 ]
        # sort the fine universe in descending order of factor value
        sorted_fine = sorted(filtered_fine, key=lambda x: x.FinancialStatements.TotalRiskBasedCapital.TwelveMonths, reverse=True)
        self.symbol = [str(x.Symbol.Value) for x in sorted_fine]
        # factor_value = [x.ValuationRatios.PERatio for x in sorted_fine]
        self.flag3 = self.flag3 + 1
        return []
    else:
        return []

Step 2: Compute the monthly return of portfolios

At the end of each month, we request one month of historical close prices for each stock and compute the monthly returns.

sorted_symbol = self.symbol
self.AddEquity("SPY") # add benchmark
for x in sorted_symbol:
    self.AddEquity(x)
history = self.History(20,Resolution.Daily)
monthly_return =[]
new_symbol_list =[]
for j in range(len(sorted_symbol)):
    try:
        daily_price = []
        for slice in history:
            bar = slice[sorted_symbol[j]]
            daily_price.append(float(bar.Close))
        new_symbol_list.append(sorted_symbol[j])
        monthly_return.append(daily_price[-1] / daily_price[0] - 1)
    except:
        self.Log("No history data for " + str(sorted_symbol[j]))
        del daily_price
# the length of monthly_return list should be divisible by the number of portfolios
monthly_return = monthly_return[:int(math.floor(len(monthly_return) / self.numOfPortfolio) * self.numOfPortfolio)]
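
As a standalone illustration of the return calculation above, here is a minimal sketch that uses plain pandas instead of the QuantConnect History API. The price table, the tickers and the helper name monthly_returns are made up for the example; dropping columns with missing prices plays the role of the try/except skip in the original code.

```python
import pandas as pd


def monthly_returns(prices, num_portfolios):
    """Return last/first - 1 per symbol, truncated to a multiple of num_portfolios.

    prices: DataFrame with one column per symbol, columns already in factor order.
    """
    # Skip symbols with any missing bar.
    clean = prices.dropna(axis=1)
    returns = clean.iloc[-1] / clean.iloc[0] - 1
    # Keep only as many symbols as split evenly across the portfolios.
    usable = len(returns) // num_portfolios * num_portfolios
    return returns.iloc[:usable]


prices = pd.DataFrame({
    "AAA": [10.0, 10.5, 11.0],
    "BBB": [20.0, 19.0, 18.0],
    "CCC": [5.0, None, 5.5],   # missing bar, so this symbol is skipped
    "DDD": [8.0, 8.0, 8.8],
    "EEE": [50.0, 51.0, 52.0],
})
result = monthly_returns(prices, num_portfolios=3)
print(result)
```

With four usable symbols and three portfolios, the last symbol is dropped so that the list divides evenly.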

We divide the stocks into 5 portfolios and compute the average monthly return of each portfolio. Then we append the monthly return of the benchmark "SPY" as the last row of the data frame df_return.

reshape_return = np.reshape(monthly_return, (self.numOfPortfolio, len(monthly_return) // self.numOfPortfolio))
# calculate the average return of different portfolios
port_avg_return = np.mean(reshape_return,axis=1).tolist()
# add return of "SPY" as the benchmark  to the end of the return list
benchmark_syl = self.AddEquity("SPY").Symbol
history_benchmark = self.History(20,Resolution.Daily)
benchmark_daily_price = [float(slice[benchmark_syl].Close) for slice in history_benchmark]
benchmark_monthly_return = (benchmark_daily_price[-1]/benchmark_daily_price[0]) - 1
port_avg_return.append(benchmark_monthly_return)
self.df_return[str(self.flag3)] = port_avg_return
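
The bucketing step can be checked on toy numbers. The sketch below uses only NumPy; the ten returns and the benchmark return are invented values, not backtest output.

```python
import numpy as np

num_portfolios = 5
# Ten hypothetical monthly returns, already ordered by factor value.
monthly_return = [0.05, 0.04, 0.03, 0.03, 0.02, 0.01, 0.00, -0.01, -0.02, -0.03]
benchmark_return = 0.01  # made-up SPY return for the same month

# Integer division: reshape requires integer dimensions.
per_portfolio = len(monthly_return) // num_portfolios
buckets = np.reshape(monthly_return, (num_portfolios, per_portfolio))

# One average per portfolio, then the benchmark as the final entry.
port_avg_return = buckets.mean(axis=1).tolist()
port_avg_return.append(benchmark_return)
print(port_avg_return)
```

Each row of buckets holds the stocks of one portfolio, so averaging along axis 1 gives one return per portfolio.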

Step 3: Generate the metrics to test the factor significance

After getting the monthly returns of portfolios and the benchmark, we compute the average annual return and excess return over benchmark of each portfolio across the whole backtesting period. Then we generate three metrics to judge the significance of each factor.

  • The first metric is the correlation between the portfolios' returns and their ranks. The absolute value of the correlation coefficient should be larger than 0.8.
  • If the top-ranked portfolio returns more than the bottom-ranked portfolio, we call the top-ranked portfolio the win portfolio and the bottom-ranked portfolio the loss portfolio; otherwise the roles are reversed. The win probability is the fraction of months in which the win portfolio outperforms the benchmark. The loss probability is the fraction of months in which the loss portfolio underperforms the benchmark. If the factor is significant, both the win and the loss probability should be greater than 0.4.
  • The excess return of the win portfolio should be greater than 0.25, while the excess return of the loss portfolio should be lower than 0.05.

def calculate_criteria(self,df_port_return):
    total_return = (df_port_return + 1).T.cumprod().iloc[-1,:] - 1
    annual_return = (total_return+1)**(1./6)-1
    excess_return = annual_return - np.array(annual_return)[-1]
    correlation = annual_return[0:5].corr(pd.Series([5,4,3,2,1],index = annual_return[0:5].index))
    # higher factor with higher return
    if np.array(total_return)[0] > np.array(total_return)[-2]:
        loss_excess = df_port_return.iloc[-2,:] - df_port_return.iloc[-1,:]
        win_excess = df_port_return.iloc[0,:] - df_port_return.iloc[-1,:]
        loss_prob = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_prob = win_excess[win_excess>0].count()/float(len(win_excess))
        win_port_excess_return = np.array(excess_return)[0]
        loss_port_excess_return = np.array(excess_return)[-2]
    # higher factor with lower return
    else:
        loss_excess = df_port_return.iloc[0,:] - df_port_return.iloc[-1,:]
        win_excess = df_port_return.iloc[-2,:] - df_port_return.iloc[-1,:]
        loss_prob = loss_excess[loss_excess<0].count()/float(len(loss_excess))
        win_prob = win_excess[win_excess>0].count()/float(len(win_excess))
        win_port_excess_return = np.array(excess_return)[-2]
        loss_port_excess_return = np.array(excess_return)[0]
    test_result = {}
    test_result["correlation"]=correlation
    test_result["win probability"]=win_prob
    test_result["loss probability"]=loss_prob
    test_result["win portfolio excess return"]=win_port_excess_return
    test_result["loss portfolio excess return"]=loss_port_excess_return

    return test_result
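
The three criteria can be combined into a single pass/fail check. The following is a sketch rather than part of the original algorithm: the function name is_significant is hypothetical, it takes the five metrics as plain arguments, and the sample numbers are made up.

```python
def is_significant(correlation, win_prob, loss_prob, win_excess, loss_excess):
    """Apply the thresholds stated in the text to one factor's metrics."""
    return (
        abs(correlation) > 0.8      # rank/return correlation
        and win_prob > 0.4          # win portfolio beats the benchmark often enough
        and loss_prob > 0.4         # loss portfolio trails the benchmark often enough
        and win_excess > 0.25       # excess return of the win portfolio
        and loss_excess < 0.05      # excess return of the loss portfolio
    )


# Made-up metrics for two hypothetical factors.
strong = is_significant(-0.95, 0.65, 0.45, 0.30, 0.02)
weak = is_significant(0.55, 0.65, 0.45, 0.30, 0.02)
print(strong, weak)
```

The second factor fails only because its correlation is too weak, which shows that all conditions must hold together.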

The following table shows the factor significance test results:

Factor                FCFYield  BuyBackYield  PriceChange1M  TrailingDividendYield  EVToEBITDA  RevenueGrowth  BookValuePerShare
Correlation           -0.936    -0.987        0.918          -0.981                 0.939       0.89           -0.92
Win Probability       0.630     0.639         1              0.667                  0.722       0.69           0.69
Loss Probability      0.426     0.472         1              0.518                  0.472       0.42           0.40
Excess Return (Win)   0.324     0.212         0.303          0.225                  0.414       0.23           0.27
Excess Return (Loss)  0.060     0.037         -1.67          0.043                  0.042       0.07           0.06

We choose 4 factors: FCFYield, PriceChange1M, BookValuePerShare and RevenueGrowth.

Stock Selection

Next we will select the stocks.

Step 1: Rank the stocks by factor values

First, we remove the stocks that have no fundamental data or have a zero factor value. For each preselected factor, we rank the stocks by their factor values. The order is descending if the factor correlation is negative and ascending if the factor correlation is positive.
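
A minimal sketch of this ranking rule in pandas follows. The helper name order_by_factor, the tickers and the factor values are hypothetical; missing values stand in for stocks without fundamental data.

```python
import pandas as pd


def order_by_factor(factor_values, correlation):
    """Order symbols by factor value in the direction prescribed by the text."""
    # Drop stocks with missing or zero factor values.
    values = factor_values.dropna()
    values = values[values != 0]
    # Negative correlation: descending order. Positive correlation: ascending order.
    return values.sort_values(ascending=correlation > 0).index.tolist()


factor_values = pd.Series(
    {"AAA": 3.0, "BBB": 0.0, "CCC": 1.0, "DDD": None, "EEE": 2.0}
)
descending = order_by_factor(factor_values, correlation=-0.9)
ascending = order_by_factor(factor_values, correlation=0.9)
print(descending, ascending)
```

BBB (zero value) and DDD (missing value) are removed before sorting, so only three symbols remain in either ordering.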

Step 2: Calculate equally weighted composite factor scores

The second step uses the selected factors to calculate an equally weighted composite factor score for each stock.

  • First, according to the factor order, we place our universe of stocks into 5 distinct quintile portfolios, named P1, P2, P3, P4 and P5. The ranking of portfolios sets out the preference of the factor model, i.e. the first portfolio (P1) corresponds to the “most preferred” stocks, while the fifth (P5) corresponds to the “least preferred” stocks. Suppose there are \(n\) stocks in total. Stocks that fall into the first-ranked portfolio receive the highest score, \(p\); stocks that fall into the second-ranked portfolio receive \(p-1\), and so on, so that every stock gets a score. We repeat the same calculation for each factor.
  • Second, we calculate a “Composite Factor Score” by combining the per-factor scores with an equal weighting scheme. This gives a composite factor score for each stock.
  • Third, we rank the stocks in our universe according to their Composite Factor Scores and choose the 20 highest-ranked stocks to construct our portfolio at the beginning of each month.
  • At the end of each month, we repeat the above steps to construct the new portfolio and adjust the holding stocks.
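
The scoring steps above can be sketched as follows. This is an illustration under stated assumptions, not the tutorial's implementation: the helper names quintile_scores and composite_top are hypothetical, the top score \(p\) is assumed to equal the number of portfolios (5), and the tickers and orderings are made up. The example picks the top 2 stocks instead of 20 to keep the data small.

```python
import numpy as np
import pandas as pd


def quintile_scores(ordered_symbols, num_portfolios=5):
    """Score symbols by bucket: the first bucket gets num_portfolios, the last gets 1."""
    n = len(ordered_symbols)
    # Bucket index 0..num_portfolios-1 for each position in the ordering.
    buckets = np.arange(n) * num_portfolios // n
    return pd.Series(num_portfolios - buckets, index=ordered_symbols)


def composite_top(orderings, top_n):
    """Average the per-factor scores with equal weights and return the top_n symbols."""
    scores = pd.DataFrame(
        {name: quintile_scores(order) for name, order in orderings.items()}
    )
    composite = scores.mean(axis=1)
    return composite.sort_values(ascending=False).head(top_n).index.tolist()


# Hypothetical per-factor orderings, most preferred stock first.
orderings = {
    "factor_one": ["AAA", "BBB", "CCC", "DDD", "EEE"],
    "factor_two": ["AAA", "BBB", "DDD", "CCC", "EEE"],
}
top = composite_top(orderings, top_n=2)
print(top)
```

Averaging the columns of the score table is what makes the weighting equal; a weighted scheme would replace the mean with a weighted sum.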


Reference

  1. Ayhan Yüksel, Factor Based Stock Selection Model for Turkish Equities, 2015.
