Abstract

Several studies have found that press releases and other media can impact the perspective of investors. In this tutorial, we implement an intraday strategy to capitalize on the upward drift in the stock prices of drug manufacturers following positive news releases. Our findings show that when combining the effect with the day-of-the-week anomaly documented by Berument & Kiymaz (2001), there is enough directional accuracy for the trading system to remain profitable throughout the 2020 stock market crash. However, the algorithm underperforms the S&P 500 market index ETF, SPY, over the same time period. The algorithm we design here is inspired by the work of Isah, Shah, & Zulkernine (2018).

Background

The use of alternative data sets to forecast stock prices has increased in recent years as fundamental and technical analysis have become more competitive. Using Natural Language Processing (NLP) techniques to analyze the sentiment of news releases and other text related to publicly traded companies has caught the interest of many quant researchers. Such online information is released frequently and can be interpreted in a virtually unlimited number of ways, leading to a novel approach to determining the "societal mood" (Isah et al., 2018, p. 2) towards a company.

There are several ways to implement an NLP system. In this tutorial, we use a dictionary to quantify the sentiment of news releases. The dictionary provided herein was sourced from Isah et al. (2018), where its use achieved 70% accuracy when targeting several hand-picked stocks in India's pharmaceutical industry.

Method

Universe Selection

We implement a universe selection model that provides the trading system with companies classified by Morningstar as being in the drug manufacturing industry group. We narrow the universe in two stages: coarse selection keeps the companies with the greatest dollar volume, and fine selection keeps the drug manufacturers among them with the greatest PE ratios.

def select_coarse(self, algorithm, coarse):
    has_fundamentals = [c for c in coarse if c.has_fundamental_data]
    sorted_by_dollar_volume = sorted(has_fundamentals, key=lambda c: c.dollar_volume, reverse=True)
    return [ x.symbol for x in sorted_by_dollar_volume[:self.coarse_size] ]

def select_fine(self, algorithm, fine):
    drug_manufacturers = [f for f in fine if f.asset_classification.morningstar_industry_group_code == MorningstarIndustryGroupCode.DRUG_MANUFACTURERS]
    sorted_by_pe = sorted(drug_manufacturers, key=lambda f: f.valuation_ratios.pe_ratio, reverse=True)
    return [ x.symbol for x in sorted_by_pe[:self.fine_size] ]
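
To make the two-stage filter easier to follow outside of LEAN, here is a minimal sketch in plain Python. The Candidate class, its fields, the tickers, and the numbers are hypothetical stand-ins for the coarse and fine objects that LEAN supplies; only the order of the filtering steps mirrors the methods above.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    # Hypothetical stand-in for the coarse/fine fundamental objects
    ticker: str
    dollar_volume: float
    pe_ratio: float
    is_drug_manufacturer: bool

def select_universe(candidates, coarse_size, fine_size):
    # Stage 1 (coarse): keep the most liquid names by dollar volume
    by_volume = sorted(candidates, key=lambda c: c.dollar_volume, reverse=True)
    liquid = by_volume[:coarse_size]
    # Stage 2 (fine): keep drug manufacturers, ranked by PE ratio
    drug_manufacturers = [c for c in liquid if c.is_drug_manufacturer]
    by_pe = sorted(drug_manufacturers, key=lambda c: c.pe_ratio, reverse=True)
    return [c.ticker for c in by_pe[:fine_size]]

candidates = [
    Candidate("AAA", 9e8, 35.0, True),
    Candidate("BBB", 8e8, 12.0, False),  # liquid, but not a drug manufacturer
    Candidate("CCC", 7e8, 50.0, True),
    Candidate("DDD", 1e6, 90.0, True),   # high PE, but removed by the liquidity filter
    Candidate("EEE", 6e8, 20.0, True),
]
selected = select_universe(candidates, coarse_size=4, fine_size=2)
print(selected)
```

Because the liquidity filter runs first, a company with a very high PE ratio but little trading volume never reaches the PE ranking.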

Alpha Construction

The DrugNewsSentimentAlphaModel emits insights to take long intraday positions for securities that have positive news sentiment. During construction of the model, we:

  • Create a dictionary to store SymbolData for each symbol
  • Gather the sentiment dictionary provided by Isah et al. (2018)
  • Determine the maximum number of words in any dictionary phrase, which is the largest n-gram we need when analyzing news articles
  • Define a method to determine the sign of sentiment
  • Specify the value of bars_before_insight

The bars_before_insight parameter determines how many bars the Alpha model should observe after the market opens before emitting insights. Isah et al. (2018) batch the news released by each company into 30-minute intervals before analyzing the sentiment of the batch. In this tutorial, we follow a similar procedure by setting bars_before_insight to 30.

class DrugNewsSentimentAlphaModel(AlphaModel):
    symbol_data_by_symbol = {}
    sentiment_by_phrase = SentimentByPhrase.dictionary
    max_phrase_words = max([len(phrase.split()) for phrase in sentiment_by_phrase.keys()])
    sign = lambda _, x: int(x and (1, -1)[x < 0])

    def __init__(self, bars_before_insight=30):
        self.bars_before_insight = bars_before_insight
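
The sign lambda and the max_phrase_words expression are compact, so the following standalone sketch shows what they evaluate to. The example dictionary is hypothetical and is not the dictionary from Isah et al. (2018); the sign function uses the same expression as the class attribute, without the unused first argument.

```python
def sign(x):
    # `x and ...` short-circuits to x itself when x is zero, so zero maps to 0.
    # Otherwise the tuple is indexed by the boolean (x < 0): False -> 1, True -> -1.
    return int(x and (1, -1)[x < 0])

# Hypothetical phrase-to-score mapping, for illustration only
example_sentiment_by_phrase = {
    "recall": -1,
    "fda approval": 1,
    "clinical trial success": 1,
}

# Longest phrase length in words; this bounds the n-gram size we need to scan
max_phrase_words = max(len(phrase.split()) for phrase in example_sentiment_by_phrase)

print(sign(3), sign(-2.5), sign(0), max_phrase_words)
```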

Alpha Securities Management

When a new security is added to the universe, we create a SymbolData object to store information unique to that security. The management of the SymbolData objects occurs in the Alpha model's on_securities_changed method.

def on_securities_changed(self, algorithm, changes):
    for security in changes.added_securities:
        self.symbol_data_by_symbol[security.symbol] = SymbolData(security, algorithm)

    for security in changes.removed_securities:
        symbol_data = self.symbol_data_by_symbol.pop(security.symbol, None)
        if symbol_data:
            algorithm.remove_security(symbol_data.tiingo_symbol)

The definition of the SymbolData class is shown below. We add properties to it to track the cumulative sentiment of news releases over time and the number of bars the Alpha model has received for each security since the market open. In the constructor, we save a reference to the security's exchange so we can access the market hours of the exchange when generating insights. This is also where we initialize the Tiingo news feed for each security.

class SymbolData:
    cumulative_sentiment = 0
    bars_seen_today = 0

    def __init__(self, security, algorithm):
        self.exchange = security.exchange
        self.tiingo_symbol = algorithm.add_data(TiingoNews, security.symbol).symbol

Alpha Update

As new TiingoNews objects are provided to the Alpha model's update method, we update the cumulative sentiment for each security. The cumulative sentiment counter is reset at each market close. Therefore, when we emit insights 30 minutes after the open, we are considering the sentiment of the news articles released from the previous close to the current time. We employ the findings of Berument & Kiymaz (2001), restricting the Alpha model's trading to Wednesday, the most profitable day of the week. Positions are entered 30 minutes after the open and exited at the close.

def update(self, algorithm, data):
    insights = []

    for symbol, symbol_data in self.symbol_data_by_symbol.items():

        # If it's after-hours or within 30 minutes of the open, update
        # cumulative sentiment for each symbol
        if symbol_data.bars_seen_today < self.bars_before_insight:
            tiingo_symbol = symbol_data.tiingo_symbol
            if data.contains_key(tiingo_symbol) and data[tiingo_symbol] is not None:
                article = data[tiingo_symbol]
                symbol_data.cumulative_sentiment += self.calculate_sentiment(article)

        if data.contains_key(symbol) and data[symbol] is not None:
            symbol_data.bars_seen_today += 1

            # 30 minutes after the open, emit insights in the direction of the cumulative sentiment.
            # Only emit insights on Wednesdays to capture the anomaly documented by Berument and
            # Kiymaz (2001).
            if symbol_data.bars_seen_today == self.bars_before_insight and data.time.weekday() == 2:
                next_close_time = symbol_data.exchange.hours.get_next_market_close(data.time, False)
                direction = self.sign(symbol_data.cumulative_sentiment)
                if direction == 0:
                    continue
                insight = Insight.price(symbol,
                                        next_close_time - timedelta(minutes=2),
                                        direction)
                insights.append(insight)

            # At the close, reset the cumulative sentiment
            if not symbol_data.exchange.date_time_is_open(data.time):
                symbol_data.cumulative_sentiment = 0
                symbol_data.bars_seen_today = 0

    return insights
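
The timing rules in update can be hard to see through the bar counting, so here is a simplified, LEAN-free sketch of the same decision. It assumes one-minute bars, so that 30 bars correspond to 30 minutes, and it treats an article as counted when its timestamp falls between the previous close and the insight time. The function name, its arguments, and the sample articles are hypothetical.

```python
from datetime import datetime, timedelta

def sign(x):
    # Returns -1, 0, or 1 depending on the sign of x
    return (x > 0) - (x < 0)

def insight_direction(prev_close, market_open, articles, bars_before_insight=30):
    """articles is a list of (timestamp, sentiment_score) pairs."""
    # Assumes one-minute bars: the insight is emitted this many minutes after the open
    insight_time = market_open + timedelta(minutes=bars_before_insight)
    # Trade only on Wednesdays (Monday is weekday 0)
    if insight_time.weekday() != 2:
        return 0
    # Sum the sentiment of articles released since the previous close
    total = sum(score for stamp, score in articles
                if prev_close <= stamp <= insight_time)
    return sign(total)

articles = [
    (datetime(2020, 3, 10, 18, 0), 2),   # overnight article, counted
    (datetime(2020, 3, 11, 9, 45), -1),  # 15 minutes after the open, counted
    (datetime(2020, 3, 11, 11, 0), -5),  # after the insight time, ignored
]
wednesday_direction = insight_direction(
    datetime(2020, 3, 10, 16, 0), datetime(2020, 3, 11, 9, 30), articles)

# The same articles shifted by one day fall on a Thursday, so no trade is signaled
one_day = timedelta(days=1)
thursday_direction = insight_direction(
    datetime(2020, 3, 11, 16, 0), datetime(2020, 3, 12, 9, 30),
    [(stamp + one_day, score) for stamp, score in articles])
print(wednesday_direction, thursday_direction)
```

In this sketch the strongly negative article at 11:00 has no effect, because the position is already open by then and the counter is only read once per day.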

Calculating Sentiment

We define the following helper method to return the sentiment of a Tiingo news article by analyzing the article's title and description. The sentiment_by_phrase dictionary was retrieved from queensbamlab's NewsSentiment GitHub repository. Although we have converted the dictionary to lowercase and removed some redundancies, this is the same dictionary used by Isah et al. (2018). "The dictionary was created by leveraging author's domain expertise and thorough analysis of news articles over the years" (p. 3).

def calculate_sentiment(self, article):
    sentiment = 0
    for content in (article.title, article.description):
        words = content.lower().split()
        for num_words in range(1, self.max_phrase_words + 1):
            for gram in ngrams(words, num_words):
                phrase = ' '.join(gram)
                if phrase in self.sentiment_by_phrase:
                    sentiment += self.sentiment_by_phrase[phrase]
    return sentiment
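
As a worked example of the phrase matching, the sketch below scores a headline against a small dictionary. The dictionary entries and the article text are hypothetical, and the ngrams helper is a minimal stand-in written here so the example runs without extra packages.

```python
def ngrams(words, n):
    # Yield every run of n consecutive words as a tuple
    return zip(*(words[i:] for i in range(n)))

# Hypothetical phrase scores, not the dictionary from Isah et al. (2018)
example_sentiment_by_phrase = {
    "approval": 1,
    "fda approval": 2,
    "phase 3 trial": 1,
    "recall": -2,
}
max_phrase_words = max(len(p.split()) for p in example_sentiment_by_phrase)

def score_text(text):
    words = text.lower().split()
    sentiment = 0
    # Check every n-gram from single words up to the longest dictionary phrase
    for num_words in range(1, max_phrase_words + 1):
        for gram in ngrams(words, num_words):
            sentiment += example_sentiment_by_phrase.get(" ".join(gram), 0)
    return sentiment

title = "Company wins FDA approval"
description = "Phase 3 trial data supported the decision"
total = score_text(title) + score_text(description)
print(total)
```

Overlapping phrases each contribute to the score: the title above matches both "approval" and "fda approval". The helper method behaves the same way, which is one reason why removing redundancies from the dictionary matters.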

Portfolio Construction & Trade Execution

We utilize the EqualWeightingPortfolioConstructionModel and the ImmediateExecutionModel.
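
The idea behind equal weighting can be sketched in a few lines. This is an illustration of the concept under our own assumptions, not the source code of the LEAN model: every symbol with an active, non-flat insight receives the same share of the portfolio, signed by the insight direction. The function name and tickers are hypothetical.

```python
def equal_weights(direction_by_symbol):
    # Ignore flat (zero) directions; they receive no allocation
    active = {s: d for s, d in direction_by_symbol.items() if d != 0}
    if not active:
        return {}
    weight = 1.0 / len(active)
    # Positive weights are long positions, negative weights are short positions
    return {symbol: direction * weight for symbol, direction in active.items()}

weights = equal_weights({"AAA": 1, "BBB": 1, "CCC": 0})
print(weights)
```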

Conclusion

We conclude that deploying the sentiment analysis strategy on the US drug manufacturing industry does not produce results as accurate as those reported by Isah et al. (2018). Only after restricting trading to the most profitable day of the week (Berument & Kiymaz, 2001) does the strategy achieve profitability over our testing period. Overall, the strategy produces a Sharpe ratio of -1.619, while the SPY benchmark produces a -0.579 Sharpe ratio during the same period. We attribute the decrease in performance to the commissions and spread costs simulated by LEAN.

To continue the development of this strategy, future areas of research include:

  • Adding a threshold that the cumulative sentiment counter must pass before a trade is signaled
  • Only analyzing the sentiment of articles that contain keywords like "US", "USA", "Q1", and others in their titles
  • Stemming and removing punctuation from articles before calculating their sentiment
  • Adding phrases to the sentiment dictionary or adjusting the sentiment values
  • Adding other datasets besides the Tiingo News Feed
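
Two of these ideas, the sentiment threshold and punctuation removal, could be prototyped along the following lines. This is an untested starting point rather than part of the strategy above; the function names and the default threshold of 2 are hypothetical choices.

```python
import string

# Translation table that deletes ASCII punctuation characters
_STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)

def normalize(text):
    # Lowercase, remove punctuation, and split into words so that
    # "Recall." and "recall" match the same dictionary phrase
    return text.lower().translate(_STRIP_PUNCTUATION).split()

def thresholded_direction(cumulative_sentiment, threshold=2):
    # Signal a trade only when sentiment is at least `threshold` away from zero
    if cumulative_sentiment >= threshold:
        return 1
    if cumulative_sentiment <= -threshold:
        return -1
    return 0

print(normalize("Recall announced."), thresholded_direction(1), thresholded_direction(3))
```

Note that stripping punctuation also changes tokens such as "U.S." to "us", so the dictionary phrases would need the same normalization.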


References

  1. Shah, Dev, Haruna Isah, and Farhana Zulkernine. “Predicting the Effects of News Sentiments on the Stock Market.” 2018 IEEE International Conference on Big Data (Big Data) (2018).
  2. Berument, Hakan, and Halil Kiymaz. “The Day of the Week Effect on Stock Market Volatility.” Journal of Economics and Finance 25, no. 2 (2001): 181-193.