Popular Models

FinBERT

Introduction

This page explains how to use FinBERT in LEAN trading algorithms. The model repository provides the following description:

FinBERT is a pre-trained NLP model to analyze sentiment of financial text. It is built by further training the BERT language model in the finance domain, using a large financial corpus and thereby fine-tuning it for financial sentiment classification. Financial PhraseBank by Malo et al. (2014) is used for fine-tuning. For more details, please see the paper FinBERT: Financial Sentiment Analysis with Pre-trained Language Models and our related blog post on Medium.

The model will give softmax outputs for three labels: positive, negative or neutral.

Use Cases

The FinBERT model is a sentiment analysis model. The following use cases explain how you might utilize it in trading algorithms:

  • Analyze the sentiment of the latest news articles for specific companies, then form a long-short portfolio with assets that have the most positive and most negative news sentiment.
  • Monitor the sentiment of regulatory alerts in a risk management model and liquidate holdings when sentiment is extremely negative.
  • Generate sentences based on information from other datasets and then feed them into the model to determine the sentiment. For example, you could use the US Government Contracts dataset to create the string "The Department of State grants AAPL a contract for the purchase of mobile phones" (see the sketch after this list).
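The following sketch shows one way to assemble such a sentence. The helper and its field names are hypothetical stand-ins, not the US Government Contracts dataset's actual schema:

    # Hypothetical helper: agency, ticker, and description are illustrative
    # placeholders, not the dataset's actual column names.
    def contract_to_sentence(agency: str, ticker: str, description: str) -> str:
        return f"The {agency} grants {ticker} a contract for {description}"

    sentence = contract_to_sentence(
        "Department of State", "AAPL", "the purchase of mobile phones")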

Load Pre-Trained Model

Follow these steps to load the pre-trained FinBERT model:

  1. Import the model and tokenizer classes.

    from transformers import TFBertForSequenceClassification, BertTokenizer
  2. Define the path where the model is stored. In QuantConnect Cloud, the path is ProsusAI/finbert.

    model_path = "ProsusAI/finbert"
  3. Create a TFBertForSequenceClassification model.

    self._model = TFBertForSequenceClassification.from_pretrained(model_path, local_files_only=True)
  4. Create a BertTokenizer object.

    self._tokenizer = BertTokenizer.from_pretrained(model_path, local_files_only=True)
  5. (Optional) Set the seed to enable reproducibility.

    from transformers import set_seed
    set_seed(1, True)
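Putting the steps together, a minimal sketch of loading the model inside a LEAN algorithm's initialize method, assuming LEAN's PEP8-style Python API and that the model files are already cached locally, as they are in QuantConnect Cloud:

    from AlgorithmImports import *
    from transformers import TFBertForSequenceClassification, BertTokenizer, set_seed

    class FinbertLoadingAlgorithm(QCAlgorithm):
        def initialize(self):
            set_seed(1, True)  # optional: enable reproducibility
            model_path = "ProsusAI/finbert"
            # local_files_only=True loads the locally cached copy of the model.
            self._model = TFBertForSequenceClassification.from_pretrained(
                model_path, local_files_only=True)
            self._tokenizer = BertTokenizer.from_pretrained(
                model_path, local_files_only=True)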

Fine-Tune Model

The FinBERT model is pre-trained, so you don't need to fine-tune it. Fine-tuning the model just tailors it to your specific use case. Follow these steps to fine-tune it:

  1. Import the Dataset class.

    from datasets import Dataset
  2. Load the pre-trained model.
  3. Compile the model with an optimizer and loss function.

    import tensorflow as tf

    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    )

    You only need to compile the model once. TensorFlow offers many other optimizer and loss classes you can use.

  4. Create a DataFrame that contains your training samples. The DataFrame should have two columns, named "text" and "label". The label must be an integer that represents the sentiment class. By default, the FinBERT model has three classes. Class zero represents negative sentiment, class one represents neutral sentiment, and class two represents positive sentiment.

    import pandas as pd

    samples = pd.DataFrame(columns=['text', 'label'])
    # Add rows to the DataFrame...
  5. Convert the samples DataFrame to a Dataset object.

    dataset = Dataset.from_pandas(samples)
  6. Tokenize the text in each training sample.

    dataset = dataset.map(
        lambda sample: self._tokenizer(sample['text'], padding='max_length', truncation=True)
    )
  7. Call the model's prepare_tf_dataset method to convert the Dataset to a tf.data.Dataset.

    dataset = model.prepare_tf_dataset(dataset, shuffle=True, tokenizer=self._tokenizer)
  8. Call the model's fit method.

    model.fit(dataset, epochs=2)
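Assembled into a single routine, the fine-tuning flow looks roughly like this. The two training rows are made-up placeholders for your own labeled samples:

    import pandas as pd
    import tensorflow as tf
    from datasets import Dataset

    def fine_tune(model, tokenizer):
        # Compile once with an optimizer and loss function.
        model.compile(
            optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5),
            loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
        # Placeholder samples: 0=negative, 1=neutral, 2=positive.
        samples = pd.DataFrame({
            'text': ["Quarterly revenue beat expectations.",
                     "The company filed for bankruptcy."],
            'label': [2, 0]})
        dataset = Dataset.from_pandas(samples)
        # Tokenize every sample.
        dataset = dataset.map(
            lambda s: tokenizer(s['text'], padding='max_length', truncation=True))
        # Convert to a tf.data.Dataset the model can consume, then train.
        dataset = model.prepare_tf_dataset(dataset, shuffle=True, tokenizer=tokenizer)
        model.fit(dataset, epochs=2)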

Analyze Sentiment

Follow these steps to analyze the sentiment of some text with FinBERT:

  1. Load the model.
  2. (Optional) Fine-tune the model.
  3. Get the text you want the model to analyze. The model can analyze a single sentence or a list of sentences.

    content = "AAPL stock price spikes after record-breaking sales figures."
  4. Tokenize the text(s).

    inputs = self._tokenizer(content, padding=True, truncation=True, return_tensors='tf')

    For more information about how to tokenize, see the PreTrainedTokenizer.__call__ reference on the Hugging Face website.

  5. Perform dictionary unpacking on the preceding result and pass it to the model as input.

    outputs = self._model(**inputs)
  6. Apply softmax to the outputs to get the probability of each sentiment class.

    scores = tf.nn.softmax(outputs.logits, axis=-1).numpy()

    The result of the preceding operation is a two-dimensional numpy array. Each element of the array is a list that contains the probability that the sentiment of the corresponding sentence is negative, neutral, or positive, respectively. For example, you may get the following result if you use a single sentence as input. The result shows that the input is more positive than negative, but is likely neutral.

    array([[0.21346861, 0.46771246, 0.318819]])
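Combined, a short sketch that scores a list of sentences and maps each row to the class order described above. The two sentences are made up, and self.log writes to the LEAN algorithm's log:

    import tensorflow as tf

    content = [
        "AAPL stock price spikes after record-breaking sales figures.",
        "Regulators open an investigation into the company."]
    inputs = self._tokenizer(content, padding=True, truncation=True, return_tensors='tf')
    outputs = self._model(**inputs)
    scores = tf.nn.softmax(outputs.logits, axis=-1).numpy()
    labels = ['negative', 'neutral', 'positive']  # class order described above
    for sentence, row in zip(content, scores):
        self.log(f"{sentence} -> {labels[row.argmax()]} ({row.max():.2f})")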

Examples

The following algorithm selects a volatile asset at the beginning of each month. It gets the Tiingo News articles that were released for the asset over the previous 10 days and then feeds them into the pre-trained FinBERT model. It then aggregates the sentiment scores of all the news releases. If the aggregated sentiment score is positive, it enters a long position for the month. If it's negative, it enters a short position for the month.
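A condensed sketch of this algorithm's shape follows. It simplifies the volatility-based universe selection to one hard-coded asset and assumes the article text lives in the Tiingo News history DataFrame's description column:

    from AlgorithmImports import *
    from transformers import TFBertForSequenceClassification, BertTokenizer
    import tensorflow as tf

    class FinbertSentimentAlgorithm(QCAlgorithm):
        def initialize(self):
            self.set_start_date(2022, 1, 1)
            self.set_cash(100000)
            # One hard-coded asset stands in for the example's
            # volatility-based monthly universe selection.
            self._symbol = self.add_equity("AAPL", Resolution.DAILY).symbol
            self._dataset_symbol = self.add_data(TiingoNews, self._symbol).symbol
            model_path = "ProsusAI/finbert"
            self._model = TFBertForSequenceClassification.from_pretrained(
                model_path, local_files_only=True)
            self._tokenizer = BertTokenizer.from_pretrained(
                model_path, local_files_only=True)
            self.schedule.on(
                self.date_rules.month_start(self._symbol),
                self.time_rules.after_market_open(self._symbol, 30),
                self._rebalance)

        def _rebalance(self):
            # Fetch the asset's news articles from the previous 10 days.
            articles = self.history(self._dataset_symbol, timedelta(days=10))
            if articles.empty or 'description' not in articles:
                return
            texts = list(articles['description'].dropna())
            if not texts:
                return
            # Score every article and aggregate positive minus negative
            # probability (class order as described above).
            inputs = self._tokenizer(texts, padding=True, truncation=True, return_tensors='tf')
            scores = tf.nn.softmax(self._model(**inputs).logits, axis=-1).numpy()
            aggregate = (scores[:, 2] - scores[:, 0]).sum()
            # Long if aggregate sentiment is positive, short if negative.
            self.set_holdings(self._symbol, 1 if aggregate > 0 else -1)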


The following algorithm selects a volatile asset at the beginning of each month. It gets the Tiingo News articles that were released for the asset over the previous 30 days to generate the training set. The label is the market return that occurs from the current news release to the next news release. The algorithm then fine-tunes the model, calculates the sentiment, and rebalances the portfolio.
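Since the fine-tuning labels must be integer classes, the market returns this example describes need to be bucketed. One hypothetical way to build such a training set, where the +/-0.1% cutoffs that separate the three classes are assumptions:

    import pandas as pd

    # Assumed cutoffs: returns within +/-0.1% count as neutral.
    def label_from_return(ret: float) -> int:
        if ret < -0.001:
            return 0  # negative class
        if ret > 0.001:
            return 2  # positive class
        return 1      # neutral class

    def build_samples(articles: pd.DataFrame, prices: pd.Series) -> pd.DataFrame:
        # 'articles' is assumed to have a datetime index and a 'text' column;
        # 'prices' is the asset's price series over the same window.
        rows = []
        times = articles.index
        for start, end, text in zip(times[:-1], times[1:], articles['text'][:-1]):
            window = prices.loc[start:end]
            if len(window) < 2:
                continue
            # Market return from this news release to the next one.
            ret = window.iloc[-1] / window.iloc[0] - 1
            rows.append({'text': text, 'label': label_from_return(ret)})
        return pd.DataFrame(rows)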

These algorithms require a GPU node.
