Abstract

In this tutorial, we build upon the natural language processing (NLP) approach from the previous strategy. In this iteration, we monitor the Tiingo News Feed and try to determine the intraday news sentiment of the largest constituents in the Nasdaq-100 index while avoiding look-ahead bias. The results show that this version of the strategy achieved lower risk-adjusted returns than the QQQ exchange-traded fund (ETF) over the two-year backtest period from January 2021 to January 2023.

Background

NLP is a subfield of artificial intelligence that strives to process unstructured text and understand its meaning. In most NLP trading strategies, the developer provides a set of pre-selected phrases and their sentiment scores, which usually introduces look-ahead bias into the strategy. In this algorithm, we avoid this bias by assigning sentiment scores to words on the fly, based on how they impact the future returns of the security.

Method

Let’s review how we can implement this strategy as a framework algorithm with the LEAN trading engine.

Universe Selection

To get the largest constituents of the QQQ ETF, we add a custom ETF Constituents Universe Selection model and define the filter function to provide the 10 securities with the largest weight in the QQQ ETF.

def ETFConstituentsFilter(self, constituents: List[ETFConstituentData]) -> List[Symbol]:
    # Keep constituents that report a weight, then take the 10 largest by weight.
    selected = sorted([c for c in constituents if c.Weight],
        key=lambda c: c.Weight, reverse=True)[:10]
    return [c.Symbol for c in selected]

Requesting News Articles

Every time a security enters our ETF universe, we subscribe to its Tiingo News Feed.

self.dataset_symbol = algorithm.AddData(TiingoNews, symbol).Symbol

Training the NLP Models

To ensure the models are fit using the most recent news releases, we train a model for each security when it enters the universe, and we schedule training sessions to re-fit the models at the beginning of every month.

algorithm.Train(algorithm.DateRules.MonthStart(), algorithm.TimeRules.At(7, 0), self.train_models)

During the training sessions, we use the following procedure for each security in the universe:

  1. Make a history request to gather the news releases and trading prices of the security over the last 30 days.
  2. Tokenize the news article text, drop the punctuation, and drop filler words like “the”, “a”, and “an”.
  3. Create a dictionary that maps each word to the expected future return of the security over the following 30 minutes.
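
The three steps above can be sketched in plain Python, outside of LEAN. This is only an illustration under stated assumptions: the names `STOP_WORDS`, `tokenize`, `build_word_scores`, and `predict` are hypothetical, and averaging the 30-minute future return over the articles that contain a word is one plausible way to estimate a word's expected return, not necessarily what the algorithm does.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical filler-word list; the tutorial names only "the", "a", and "an".
STOP_WORDS = {"the", "a", "an"}

def tokenize(text: str) -> List[str]:
    """Lowercase the text, drop punctuation, and drop filler words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]

def build_word_scores(samples: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Map each word to the mean future return of the articles containing it.

    Each sample is (article_text, future_return), where future_return is
    assumed to be the security's return over the 30 minutes after release.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for text, future_return in samples:
        for word in set(tokenize(text)):  # count each word once per article
            totals[word] += future_return
            counts[word] += 1
    return {word: totals[word] / counts[word] for word in totals}

def predict(text: str, word_scores: Dict[str, float]) -> float:
    """Average the scores of the known words in a new article (0.0 if none)."""
    scores = [word_scores[w] for w in tokenize(text) if w in word_scores]
    return sum(scores) / len(scores) if scores else 0.0
```

Because the scores come only from articles and returns observed before the training time, a dictionary built this way does not rely on hand-picked phrases chosen with hindsight.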

Detecting Significant News

The NLP models transform the text of news releases into a prediction of the future returns of the respective security. Instead of trading in response to every news release, we only trade when an NLP model provides a prediction that’s at least \(n\) standard deviations away from the mean of the last 30 predictions. A larger value of \(n\) translates to fewer trades, but the trades are in response to news that carries more significance.
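
As a rough illustration of this filter, the following sketch keeps a rolling window of predictions and flags a new one when it is far from the window mean. The class name `SignificanceFilter`, the default `n = 2.0`, the use of the sample standard deviation, and the choice to wait for a full window and to compare against only the earlier predictions are all assumptions made for the example.

```python
from collections import deque
from statistics import mean, stdev

class SignificanceFilter:
    """Flag predictions that sit n standard deviations from the recent mean."""

    def __init__(self, n: float = 2.0, window: int = 30):
        self.n = n
        self.history = deque(maxlen=window)  # the last `window` predictions

    def is_significant(self, prediction: float) -> bool:
        significant = False
        # Assumption: only judge a prediction once the window is full.
        if len(self.history) == self.history.maxlen:
            mu = mean(self.history)
            sigma = stdev(self.history)  # sample standard deviation
            significant = sigma > 0 and abs(prediction - mu) >= self.n * sigma
        # Store the prediction afterwards so it is compared to earlier ones only.
        self.history.append(prediction)
        return significant
```

Raising `n` makes the threshold `n * sigma` wider, so fewer predictions pass the filter.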

Emitting Insights

When an NLP model detects some significant news, the Alpha model emits an Insight with a duration of 30 minutes and a direction that matches the sentiment of the news release. That is, if the model determines the news release is positive, the insight has InsightDirection.Up. Otherwise, it has InsightDirection.Down.

direction = InsightDirection.Up if expected_return > 0 else InsightDirection.Down
insights.append(Insight.Price(asset_symbol, self.PREDICTION_INTERVAL, direction))

Portfolio Construction

The Tiingo News Feed delivers each article in the second it is released. In this strategy, the goal is to trade immediately in response to news articles and hold the position for 30 minutes. The position size should only change during the 30 minutes if another significant news article is released for the same security and its sentiment is in the opposite direction of our trade. To achieve this, we create a custom Portfolio Construction model (PCM), called the PartitionedPortfolioConstructionModel.

This PCM works by slicing the portfolio into \(p\) independent partitions. When the PCM receives an Insight for the first security, it allocates \(\frac{1}{p}\) of the portfolio capital to the security. The security price fluctuates over time, so its weight in the portfolio won’t stay fixed at \(\frac{1}{p}\) if \(p > 1\). When the model receives an Insight for another security, it calculates the number of vacant partitions \(v\) and then allocates \(\frac{1}{v}\) of the portfolio cash to the new security. The benefit of this design is that the PCM maintains the size of every trade until the Insight expires or the Alpha model emits a new insight in the other direction. The drawback of this design is that the portfolio can only hold up to \(p\) securities at any one time.
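
The allocation arithmetic can be sketched with a toy model that is independent of LEAN. The class `PartitionedAllocator` and its `open` and `close` methods are hypothetical names for this example. It tracks only cash and partition values, and it ignores trade direction, fees, and order sizing, which the real PCM has to handle.

```python
from typing import Dict

class PartitionedAllocator:
    """Toy model of splitting a portfolio into p independent partitions."""

    def __init__(self, cash: float, partitions: int):
        self.cash = cash
        self.partitions = partitions
        self.holdings: Dict[str, float] = {}  # symbol -> capital committed

    def open(self, symbol: str) -> float:
        """Allocate 1/v of the free cash, where v is the vacant partition count."""
        if symbol in self.holdings:
            return self.holdings[symbol]  # the symbol already holds a partition
        vacant = self.partitions - len(self.holdings)
        if vacant == 0:
            raise RuntimeError("all partitions are occupied")
        allocation = self.cash / vacant
        self.cash -= allocation
        self.holdings[symbol] = allocation
        return allocation

    def close(self, symbol: str, value: float) -> None:
        """Return a partition's current value to cash when its Insight expires."""
        del self.holdings[symbol]
        self.cash += value
```

With 100,000 in cash and \(p = 4\), the first two positions each receive 25,000. If the first position is later closed at 26,000, the next new position receives a third of the 76,000 in free cash, since three partitions are vacant at that point.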

Results

We backtested the strategy from January 1, 2021 to January 1, 2023, and the algorithm achieved a Sharpe ratio of -0.659. For comparison, the following table shows the results of some benchmarks over the same period:

Benchmark | Sharpe Ratio
Buy-and-hold with the QQQ | -0.14
An equal-weighted portfolio of the same universe as the strategy | -0.11

In conclusion, the strategy underperforms the two preceding benchmarks in terms of risk-adjusted returns and it needs further development before live trading.


