Abstract
Several studies have found that press releases and other media can influence investors' perceptions. In this tutorial, we implement an intraday strategy to capitalize on the upward drift in the stock prices of drug manufacturers following positive news releases. Our findings show that when the effect is combined with the day-of-the-week anomaly documented by Berument & Kiymaz (2001), the trading system has enough directional accuracy to remain profitable throughout the 2020 stock market crash. However, the algorithm underperforms SPY, the S&P 500 index ETF, over the same period. The algorithm we design here is inspired by the work of Isah, Shah, & Zulkernine (2018).
Background
The use of alternative datasets to forecast stock prices has increased in recent years as fundamental and technical analysis have become more competitive. Using Natural Language Processing (NLP) techniques to analyze the sentiment of news releases and other text related to publicly traded companies has caught the interest of many quant researchers. Such online information is released frequently and can be interpreted in a virtually unlimited number of ways, offering a novel approach to gauging the "societal mood" (Isah et al., 2018, p. 2) towards a company.
There are several ways to implement an NLP system. In this tutorial, we use a dictionary to quantify the sentiment of news releases. The dictionary provided herein was sourced from Isah et al (2018), where its use achieved 70% accuracy when targeting several hand-picked stocks in India's pharmaceutical industry.
Method
Universe Selection
We implement a universe selection model that supplies the trading system with companies that Morningstar classifies in the drug manufacturers industry group. The coarse selection step keeps the coarse_size securities with the greatest dollar volume, and the fine selection step narrows them to the fine_size drug manufacturers with the greatest P/E ratios.
def select_coarse(self, algorithm, coarse):
    # Keep only securities that have Morningstar fundamental data
    has_fundamentals = [c for c in coarse if c.has_fundamental_data]
    # Rank by dollar volume and keep the coarse_size most liquid securities
    sorted_by_dollar_volume = sorted(has_fundamentals, key=lambda c: c.dollar_volume, reverse=True)
    return [x.symbol for x in sorted_by_dollar_volume[:self.coarse_size]]

def select_fine(self, algorithm, fine):
    # Keep only companies in Morningstar's drug manufacturers industry group
    drug_manufacturers = [f for f in fine
                          if f.asset_classification.morningstar_industry_group_code == MorningstarIndustryGroupCode.DRUG_MANUFACTURERS]
    # Rank by P/E ratio and keep the fine_size highest
    sorted_by_pe = sorted(drug_manufacturers, key=lambda f: f.valuation_ratios.pe_ratio, reverse=True)
    return [x.symbol for x in sorted_by_pe[:self.fine_size]]
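The two selection steps can be prototyped outside LEAN. The sketch below applies the same funnel to plain Python objects: it takes the top coarse_size securities by dollar volume, then the fine_size drug manufacturers with the highest P/E ratios. The Stock dataclass, its fields, the "DRUG_MANUFACTURERS" label, and the sample data are hypothetical stand-ins for LEAN's fundamental objects. Skipping missing or non-positive P/E ratios is an extra assumption that the model above does not make.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Stock:
    # Hypothetical stand-in for LEAN's coarse and fine fundamental objects
    symbol: str
    dollar_volume: float
    has_fundamental_data: bool
    industry_group: str
    pe_ratio: Optional[float]


def select_universe(stocks: List[Stock], coarse_size: int, fine_size: int) -> List[str]:
    # Coarse stage: the most liquid securities that have fundamental data
    liquid = sorted((s for s in stocks if s.has_fundamental_data),
                    key=lambda s: s.dollar_volume, reverse=True)[:coarse_size]
    # Fine stage: drug manufacturers only; assumption: also skip missing or non-positive P/E
    drug = [s for s in liquid
            if s.industry_group == "DRUG_MANUFACTURERS" and s.pe_ratio and s.pe_ratio > 0]
    # Highest P/E ratios first
    drug.sort(key=lambda s: s.pe_ratio, reverse=True)
    return [s.symbol for s in drug[:fine_size]]


stocks = [
    Stock("AAA", 5e9, True, "DRUG_MANUFACTURERS", 25.0),
    Stock("BBB", 4e9, True, "DRUG_MANUFACTURERS", 40.0),
    Stock("CCC", 3e9, True, "BANKS", 12.0),
    Stock("DDD", 2e9, True, "DRUG_MANUFACTURERS", None),
    Stock("EEE", 1e9, False, "DRUG_MANUFACTURERS", 90.0),
]
print(select_universe(stocks, coarse_size=4, fine_size=2))  # ['BBB', 'AAA']
```

EEE never reaches the fine stage because it lacks fundamental data, even though it has the highest P/E ratio. This is the same ordering effect the coarse/fine split has in the model.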
Alpha Construction
The DrugNewsSentimentAlphaModel emits intraday insights in the direction of each security's cumulative news sentiment, so securities with positive news sentiment receive long positions. During construction of the model, we:
- Create a dictionary to store SymbolData for each symbol
- Gather the sentiment dictionary provided by Isah et al (2018)
- Determine the maximum number of words in a dictionary phrase, which sets the longest n-gram we need to analyze in news articles
- Define a method to determine the sign of the sentiment
- Specify the value of bars_before_insight
The bars_before_insight parameter determines how many bars the Alpha model should observe after the market opens before emitting insights. Isah et al (2018) batch the news released by each company into 30-minute intervals before analyzing the sentiment of the batch. In this tutorial, we follow a similar procedure by setting bars_before_insight to 30.
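As a quick check of the timing, the sketch below assumes minute-resolution bars and a 9:30 ET open, which is consistent with the 30-minute delay described in this tutorial. The date is an arbitrary Wednesday. Python's weekday() numbers Monday as 0, so Wednesday is 2, the value the Alpha model checks for.

```python
from datetime import datetime, timedelta

market_open = datetime(2020, 3, 18, 9, 30)  # an arbitrary Wednesday, US/Eastern wall-clock time
bars_before_insight = 30
bar_length = timedelta(minutes=1)           # assumption: minute-resolution bars

# The Nth bar ends N bar lengths after the open
insight_time = market_open + bars_before_insight * bar_length
print(insight_time.strftime("%H:%M"), insight_time.weekday() == 2)  # 10:00 True
```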
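from AlgorithmImports import *
from nltk.util import ngrams  # used by calculate_sentiment

class DrugNewsSentimentAlphaModel(AlphaModel):
    # Phrase -> sentiment score dictionary from Isah et al (2018), defined in a separate project file
    sentiment_by_phrase = SentimentByPhrase.dictionary
    max_phrase_words = max(len(phrase.split()) for phrase in sentiment_by_phrase.keys())  # longest phrase, in words
    sign = lambda _, x: int(x and (1, -1)[x < 0])  # 1, -1 or 0; `_` receives self

    def __init__(self, bars_before_insight=30):
        # Per-instance dict (a class-level dict would be shared by every model instance)
        self.symbol_data_by_symbol = {}
        self.bars_before_insight = bars_before_insight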
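The class attributes above can be checked in isolation. The sketch below uses a tiny hypothetical phrase dictionary, not the one from Isah et al (2018), to show how max_phrase_words is derived and how the sign expression maps a cumulative score to an insight direction. Here sign is a plain one-argument lambda because it is not attached to a class.

```python
# Hypothetical sample dictionary; the model loads the real one from SentimentByPhrase.dictionary
sentiment_by_phrase = {"approval": 1, "fda approval": 2, "recall": -1, "phase 3 trial failed": -2}

# Number of words in the longest phrase = the largest n-gram worth checking
max_phrase_words = max(len(phrase.split()) for phrase in sentiment_by_phrase)

# Same expression as the model's sign: 0 stays 0, positives map to 1, negatives to -1
sign = lambda x: int(x and (1, -1)[x < 0])

print(max_phrase_words)               # 4
print([sign(v) for v in (5, 0, -3)])  # [1, 0, -1]
```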
Alpha Securities Management
When a new security is added to the universe, we create a SymbolData object for it to store information unique to that security. The SymbolData objects are managed in the Alpha model's on_securities_changed method, which also removes the Tiingo news subscription of any security that leaves the universe.
def on_securities_changed(self, algorithm, changes):
    # Create a SymbolData object (and Tiingo news subscription) for each new security
    for security in changes.added_securities:
        self.symbol_data_by_symbol[security.symbol] = SymbolData(security, algorithm)
    # Discard the SymbolData of removed securities and unsubscribe from their news feed
    for security in changes.removed_securities:
        symbol_data = self.symbol_data_by_symbol.pop(security.symbol, None)
        if symbol_data:
            algorithm.remove_security(symbol_data.tiingo_symbol)
The definition of the SymbolData class is shown below. We add properties to it to track the cumulative sentiment of news releases over time and the number of bars the Alpha model has received for each security since the market open. In the constructor, we save a reference to the security's exchange so we can access the market hours of the exchange when generating insights. This is also where we initialize the Tiingo news feed for each security.
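class SymbolData:
    def __init__(self, security, algorithm):
        self.cumulative_sentiment = 0  # sentiment of the news released since the last close
        self.bars_seen_today = 0       # bars received since today's open
        # Keep the exchange so the Alpha model can look up its market hours
        self.exchange = security.exchange
        # Subscribe to the Tiingo news feed for this security
        self.tiingo_symbol = algorithm.add_data(TiingoNews, security.symbol).symbol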
Alpha Update
As new TiingoNews objects are passed to the Alpha model's update method, we update the cumulative sentiment of each security. The cumulative sentiment counter is reset at each market close. Therefore, when we emit insights 30 minutes after the open, we consider the sentiment of the news articles released between the previous close and the current time. We apply the findings of Berument & Kiymaz (2001) by restricting the Alpha model's trading to Wednesday, the most profitable day of the week. Positions are entered 30 minutes after the open and exited shortly before the close, because the insights expire two minutes before the market closes.
def update(self, algorithm, data):
    insights = []
    for symbol, symbol_data in self.symbol_data_by_symbol.items():
        # If it's after-hours or within 30 minutes of the open, update
        # the cumulative sentiment for each symbol
        if symbol_data.bars_seen_today < self.bars_before_insight:
            tiingo_symbol = symbol_data.tiingo_symbol
            if data.contains_key(tiingo_symbol) and data[tiingo_symbol] is not None:
                article = data[tiingo_symbol]
                symbol_data.cumulative_sentiment += self.calculate_sentiment(article)
            if data.contains_key(symbol) and data[symbol] is not None:
                symbol_data.bars_seen_today += 1
            # 30 minutes after the open, emit insights in the direction of the cumulative sentiment.
            # Only emit insights on Wednesdays to capture the anomaly documented by Berument and
            # Kiymaz (2001). Nested here, this check passes only on the bar that reaches the threshold.
            if symbol_data.bars_seen_today == self.bars_before_insight and data.time.weekday() == 2:
                next_close_time = symbol_data.exchange.hours.get_next_market_close(data.time, False)
                direction = self.sign(symbol_data.cumulative_sentiment)
                if direction == 0:
                    continue
                # The insight expires 2 minutes before the close, ending the intraday position
                insight = Insight.price(symbol,
                                        next_close_time - timedelta(minutes=2),
                                        direction)
                insights.append(insight)
        # At the close, reset the counters. The bars_seen_today > 0 check makes the reset happen
        # once per session, so news released after hours still counts toward the next insight.
        if not symbol_data.exchange.date_time_is_open(data.time) and symbol_data.bars_seen_today > 0:
            symbol_data.cumulative_sentiment = 0
            symbol_data.bars_seen_today = 0
    return insights
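The counter logic in update can be hard to follow inside LEAN's event loop. The standalone sketch below replays a hypothetical sequence of timestamped events for one symbol. Each event is either a news score or a market bar. The sketch follows the accumulate, emit, and reset steps described above: it accumulates news until the 30th bar, emits a signal only on Wednesdays, and resets once on the first event after the close, so after-hours news carries over to the next session. The is_open and session_bars helpers are hypothetical simplifications. is_open uses a fixed 9:30 to 16:00 weekday session instead of LEAN's exchange hours, and the sketch returns (time, direction) tuples instead of Insight objects.

```python
from datetime import datetime, time, timedelta


def is_open(t: datetime) -> bool:
    # Hypothetical fixed 9:30-16:00 weekday session; LEAN uses the exchange's market hours
    return t.weekday() < 5 and time(9, 30) <= t.time() < time(16, 0)


def replay(events, bars_before_insight=30):
    """events: (timestamp, kind, value) tuples; kind is "news" (value = score) or "bar"."""
    sign = lambda x: int(x and (1, -1)[x < 0])
    cumulative, bars_seen, signals = 0, 0, []
    for t, kind, value in events:
        if bars_seen < bars_before_insight:
            if kind == "news":
                cumulative += value
            elif kind == "bar":
                bars_seen += 1
            # True only for the bar that brings bars_seen up to the threshold
            if bars_seen == bars_before_insight and t.weekday() == 2:
                direction = sign(cumulative)
                if direction != 0:
                    signals.append((t, direction))
        # Reset once per session: on the first event after the close
        if not is_open(t) and bars_seen > 0:
            cumulative, bars_seen = 0, 0
    return signals


def session_bars(day, n=30):
    # n one-minute bars whose end times run from 9:31 onwards (hypothetical helper)
    start = day.replace(hour=9, minute=30)
    return [(start + timedelta(minutes=i + 1), "bar", None) for i in range(n)]


events = [
    (datetime(2020, 3, 17, 15, 0), "news", -5),   # Tuesday intraday article
    (datetime(2020, 3, 17, 16, 0), "bar", None),  # Tuesday closing bar: counters reset
    (datetime(2020, 3, 17, 18, 0), "news", 2),    # Tuesday after-hours article: kept
] + session_bars(datetime(2020, 3, 18))           # Wednesday morning bars

print(replay(events))  # [(datetime.datetime(2020, 3, 18, 10, 0), 1)]
```

The Tuesday intraday article is discarded at the close, while the after-hours article determines Wednesday's direction.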
Calculating Sentiment
We define the following helper method to return the sentiment of a Tiingo news article by analyzing the article's title and description. The method lowercases and splits each text, looks up every n-gram from single words up to the longest phrase in the dictionary, and sums the scores of the matches. The sentiment_by_phrase dictionary was retrieved from queensbamlab's NewsSentiment GitHub repository. Although we have converted the dictionary to lowercase and removed some redundancies, this is the same dictionary used by Isah et al (2018). "The dictionary was created by leveraging author's domain expertise and thorough analysis of news articles over the years" (p. 3).
def calculate_sentiment(self, article):
    sentiment = 0
    # Score both the title and the description of the article
    for content in (article.title, article.description):
        if not content:  # skip missing fields
            continue
        words = content.lower().split()
        # Check every n-gram, from single words up to the longest dictionary phrase
        for num_words in range(1, self.max_phrase_words + 1):
            for gram in ngrams(words, num_words):
                phrase = ' '.join(gram)
                if phrase in self.sentiment_by_phrase:
                    sentiment += self.sentiment_by_phrase[phrase]
    return sentiment
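For experimenting outside LEAN, the standalone sketch below reproduces the same n-gram lookup with a few substitutions. It uses a zip-based ngrams helper instead of NLTK's, a hypothetical Article namedtuple instead of a TiingoNews object, and a small made-up phrase dictionary. As in the method above, overlapping matches are all counted: "fda approval" contributes its own score and the score of "approval".

```python
from collections import namedtuple

Article = namedtuple("Article", ["title", "description"])  # hypothetical stand-in for TiingoNews

# Made-up sample scores, not the dictionary from Isah et al (2018)
SENTIMENT_BY_PHRASE = {"approval": 1, "fda approval": 2, "recall": -2, "beats estimates": 1}
MAX_PHRASE_WORDS = max(len(p.split()) for p in SENTIMENT_BY_PHRASE)


def ngrams(words, n):
    # Consecutive n-word windows, like nltk.util.ngrams(words, n) for a list
    return zip(*(words[i:] for i in range(n)))


def calculate_sentiment(article):
    sentiment = 0
    for content in (article.title, article.description):
        if not content:
            continue
        words = content.lower().split()
        for num_words in range(1, MAX_PHRASE_WORDS + 1):
            for gram in ngrams(words, num_words):
                sentiment += SENTIMENT_BY_PHRASE.get(" ".join(gram), 0)
    return sentiment


article = Article("Drugmaker wins FDA approval", "Shares rise after the FDA approval news")
print(calculate_sentiment(article))  # 6
```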
Portfolio Construction & Trade Execution
We utilize the EqualWeightingPortfolioConstructionModel and the ImmediateExecutionModel.
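As a rough sketch of what equal weighting means here, each symbol with an active, non-flat insight receives the same fraction of portfolio value, signed by the insight's direction, and flat insights receive a zero target. This is an illustrative approximation, not LEAN's source. The function name and the symbol-to-direction input format are hypothetical.

```python
from typing import Dict


def equal_weight_targets(directions: Dict[str, int]) -> Dict[str, float]:
    # directions: symbol -> 1 (up), -1 (down) or 0 (flat) for each active insight
    active = [s for s, d in directions.items() if d != 0]
    weight = 1 / len(active) if active else 0.0
    # Flat insights get a 0 target; the rest split the portfolio equally
    return {s: d * weight for s, d in directions.items()}


print(equal_weight_targets({"AAA": 1, "BBB": -1, "CCC": 0}))
# {'AAA': 0.5, 'BBB': -0.5, 'CCC': 0.0}
```

Because the Alpha model only emits insights on Wednesdays, the number of active insights, and therefore each position's size, depends on how many drug manufacturers have non-zero sentiment that morning.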
Conclusion
We conclude that deploying the sentiment analysis strategy on the US drug manufacturing industry does not produce results as accurate as those found by Isah et al (2018). Only after restricting trading to the most profitable day of the week (Berument & Kiymaz, 2001) does the strategy become profitable over our testing period. Overall, the strategy produces a Sharpe ratio of -1.619, while the SPY benchmark produces a Sharpe ratio of -0.579 over the same period. We attribute the decrease in performance to the commissions and spread costs simulated by LEAN.
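For readers who want to compute a comparable figure from their own return series, the sketch below implements a common textbook definition of the Sharpe ratio: the mean excess daily return divided by its sample standard deviation, annualized by the square root of 252. The daily frequency, the 252-day year, and the constant risk-free rate are assumptions, and LEAN's own statistic may use different conventions. Under this definition, a strategy with positive returns can still have a negative Sharpe ratio if it earns less than the risk-free rate.

```python
import math
from statistics import mean, stdev


def annualized_sharpe(daily_returns, risk_free_daily=0.0, periods_per_year=252):
    # Excess return of each period over an assumed constant risk-free rate
    excess = [r - risk_free_daily for r in daily_returns]
    # Sample standard deviation (requires at least two observations)
    return mean(excess) / stdev(excess) * math.sqrt(periods_per_year)


# Positive average return, but below the assumed risk-free rate: the ratio is negative
returns = [0.0010, -0.0005, 0.0008, 0.0002]
print(annualized_sharpe(returns, risk_free_daily=0.0005) < 0)  # True
```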
To continue the development of this strategy, future areas of research include:
- Adding a threshold that the cumulative sentiment must exceed before a trade is signaled
- Only analyzing the sentiment of articles whose titles contain keywords like "US", "USA", "Q1", and others
- Stemming and removing punctuation from articles before calculating their sentiment
- Adding phrases to the sentiment dictionary or adjusting the sentiment values
- Adding datasets other than the Tiingo News Feed
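Two of these ideas are easy to prototype. The sketch below strips punctuation before tokenizing, so "approval." matches "approval". It then signals a direction only when the cumulative sentiment lies strictly beyond a threshold. The threshold values, the function names, and the sample dictionary are hypothetical. Stemming is left out because it would require an external library such as NLTK.

```python
import string

SENTIMENT_BY_PHRASE = {"approval": 1, "fda approval": 2, "recall": -2}  # made-up sample scores
MAX_PHRASE_WORDS = 2

# Translation table that deletes ASCII punctuation characters
STRIP_PUNCTUATION = str.maketrans("", "", string.punctuation)


def tokenize(text):
    return text.lower().translate(STRIP_PUNCTUATION).split()


def score(text):
    words = tokenize(text)
    total = 0
    for n in range(1, MAX_PHRASE_WORDS + 1):
        for i in range(len(words) - n + 1):
            total += SENTIMENT_BY_PHRASE.get(" ".join(words[i:i + n]), 0)
    return total


def direction(cumulative, threshold):
    # Signal only when the cumulative score lies strictly beyond the threshold
    if cumulative > threshold:
        return 1
    if cumulative < -threshold:
        return -1
    return 0


s = score("Drugmaker wins FDA approval.")
print(s, direction(s, threshold=2), direction(s, threshold=5))  # 3 1 0
```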
References
- Shah, D., Isah, H., & Zulkernine, F. (2018). Predicting the Effects of News Sentiments on the Stock Market. 2018 IEEE International Conference on Big Data (Big Data).
- Berument, H., & Kiymaz, H. (2001). The Day of the Week Effect on Stock Market Volatility. Journal of Economics and Finance, 25(2), 181-193.
Derek Melchin
See the attached backtest for an updated version of the algorithm in PEP8 style.
The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.