
Historical Data

Alternative Data

Introduction

Alternative datasets provide signals to inform trading decisions. To view all the alternative datasets available on QuantConnect, see the Dataset Market. This page explains how to get historical data for alternative datasets.

Data Points

To get historical alternative data points, call the history method with the dataset Symbol. This method returns a DataFrame that contains the data point attributes.

class AlternativeDataHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 20)
        # Get the Symbol of a dataset.
        dataset_symbol = self.add_data(Fred, 'RVXCLS').symbol
        # Get the trailing 5 days of Fred data in DataFrame format.
        history = self.history(dataset_symbol, 5, Resolution.DAILY)
                           value
symbol      time
RVXCLS.Fred 2024-12-17     23.02
            2024-12-18     24.01
            2024-12-19     32.76
            2024-12-20     29.90
# Calculate the dataset's rate of change.
roc = history.pct_change().iloc[1:]
                              value
symbol      time
RVXCLS.Fred 2024-12-18     0.043006
            2024-12-19     0.364431
            2024-12-20    -0.087302
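The same rate-of-change calculation can be reproduced with plain pandas outside LEAN. The series below is a hand-built stand-in for the history DataFrame above, with values copied from the example output:

```python
import pandas as pd

# Sample series shaped like the RVXCLS history above
# (values copied from the example output).
history = pd.Series(
    [23.02, 24.01, 32.76, 29.90],
    index=pd.to_datetime(["2024-12-17", "2024-12-18", "2024-12-19", "2024-12-20"]),
    name="value",
)

# pct_change computes (x[t] - x[t-1]) / x[t-1]; the first row is NaN, so drop it.
roc = history.pct_change().iloc[1:]
print(roc.round(6))
```

The `iloc[1:]` slice mirrors the snippet above: it discards the first row, which has no prior value to compare against.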

If you intend to use the data in the DataFrame to create objects of the dataset class, have the history request return that data type directly. Otherwise, LEAN consumes unnecessary computational resources populating the DataFrame. To get a list of dataset objects instead of a DataFrame, call the history[alternativeDataClass] method.

# Get the trailing 5 days of Fred data for an asset in Fred format. 
history = self.history[Fred](dataset_symbol, 5, Resolution.DAILY)
# Iterate through the historical data points.
for data_point in history:
    t = data_point.end_time
    value = data_point.value
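To show what that iteration yields, here is a self-contained sketch that collects the attributes into a pandas Series. The `DataPoint` class is a hypothetical minimal stand-in for a LEAN data object, since the real Fred type isn't available outside LEAN:

```python
from dataclasses import dataclass
from datetime import datetime

import pandas as pd


@dataclass
class DataPoint:
    # Hypothetical stand-in for a LEAN data object, with only the
    # two attributes used in the loop above.
    end_time: datetime
    value: float


# Assumed history result, mirroring the RVXCLS values from the DataFrame example.
history = [
    DataPoint(datetime(2024, 12, 17), 23.02),
    DataPoint(datetime(2024, 12, 18), 24.01),
    DataPoint(datetime(2024, 12, 19), 32.76),
    DataPoint(datetime(2024, 12, 20), 29.90),
]

# Collect only the attributes you need, keyed by the end time of each sample.
series = pd.Series({p.end_time: p.value for p in history}, name="value")
print(series)
```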

Some alternative datasets provide multiple entries per asset per time step. For example, the US Regulatory Alerts dataset can provide multiple alerts per day. In this case, to organize the data into a DataFrame, set the flatten argument to True.

class RegalyticsHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 20)      
        # Get all the Regalytics articles that were published over the last day, organized in a DataFrame.
        dataset_symbol = self.add_data(RegalyticsRegulatoryArticles, "REG").symbol
        history = self.history(dataset_symbol, 1, Resolution.DAILY, flatten=True)
                                                                           agencies  alerttype  ...        time                                                  title
time       symbol
2024-12-20 REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21  Complaint Filed: Experian Information Solutions, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21  Complaint Filed: Transunion Intermediate Holdings, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21  Complaint Filed: Transunion Intermediate Holdings, Inc.
...                                                                             ...        ...  ...         ...                                                    ...
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
           REG.RegalyticsRegulatoryArticles  [Consumer Financial Protection Bureau]  Complaint  ...  2024-12-21                         Complaint Filed: Equifax, Inc.
# Get all the unique alert types from the Regalytics articles.
alert_types = history.alerttype.unique()
array(['Complaint', 'Press release', 'Event', 'Litigation Release',
       'Grant Information', 'Media Release', 'News', 'Announcement',
       'Transcript', 'Decree', 'Decision', 'Regulation',
       'Executive Order', 'Media Advisory', 'Disaster Press Release',
       'Notice', 'Procurement', 'Meeting', 'News release', 'Contract',
       'Publication', 'Blog', 'Tabled Document', 'Resolution', 'Bill',
       'Concurrent Resolution', 'Opinions and Adjudicatory Orders',
       'Proposed rule', 'Technical Notice', 'Sanction', 'Order',
       'Statement', 'Rule', 'enforcement action', 'Report',
       'Statement|Release',
       'AWCs (Letters of Acceptance, Waiver, and Consent)'], dtype=object)
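Once the articles are flattened into a DataFrame, ordinary pandas filtering applies. The toy DataFrame below assumes the same column names as the Regalytics output above, with illustrative values:

```python
import pandas as pd

# Toy flattened history, shaped like the Regalytics DataFrame above
# (column names match the example; the rows are illustrative).
history = pd.DataFrame({
    "alerttype": ["Complaint", "Press release", "Complaint", "Rule"],
    "title": [
        "Complaint Filed: Equifax, Inc.",
        "Quarterly update",
        "Complaint Filed: Transunion Intermediate Holdings, Inc.",
        "Final rule on reporting",
    ],
})

# unique() returns the distinct alert types in order of first appearance.
alert_types = history.alerttype.unique()

# Filter the articles down to a single alert type before further processing.
complaints = history[history.alerttype == "Complaint"]
print(alert_types, len(complaints))
```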

Universes

To get historical data for an alternative data universe, call the history method with the Universe object. Set the flatten argument to True to get a DataFrame that has columns for the data point attributes.

class AltDataUniverseHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 23)    
        # Add a universe of US Equities based on an alternative dataset.
        universe = self.add_universe(BrainStockRankingUniverse)
        # Get 5 days of history for the universe.
        history = self.history(universe, timedelta(5), flatten=True)
                              rank10days  rank21days  rank2days  rank3days  rank5days     value
time       symbol
2024-12-18 A RPTMYV3VC57P      -0.001895    0.005938  -0.007858  -0.006320   0.001771  -0.007858
           AAL VM9RIYHM8ACL    -0.003977    0.006520  -0.005671  -0.003738  -0.004715  -0.005671
           AAPL R735QTJ8XC9X    0.027450    0.037339   0.006018   0.001489   0.010102   0.006018
           ABBV VCY032R250MD   -0.002814    0.012297  -0.001717  -0.000679  -0.007454  -0.001717
           ABNB XK8H247DY6W5    0.020533    0.046173  -0.002303   0.007350   0.011252  -0.002303
...                                  ...         ...        ...        ...        ...        ...
2024-12-23 ZI XF2DKG2HHLK5     -0.038118   -0.037527  -0.035904  -0.041195  -0.041388  -0.035904
           ZM X3RPXTZRW09X      0.006784    0.020690  -0.005674  -0.008556  -0.002120  -0.005674
           ZS WSVU0MELFQED      0.010422    0.019619  -0.003743  -0.002079   0.002778  -0.003743
           ZTO WF2L9EOCSQCL    -0.021455   -0.015939  -0.018053  -0.020263  -0.024979  -0.018053
           ZTS VDRJHVQ4FNFP     0.030589    0.041400   0.024744   0.019891   0.031048   0.024744
# Select the asset with the greatest value each day.
daily_winner = history.groupby('time').apply(lambda x: x.nlargest(1, 'value')).reset_index(level=1, drop=True).value
time        symbol          
2024-12-18  FIC R735QTJ8XC9X    0.054204
2024-12-19  FIC R735QTJ8XC9X    0.073250
2024-12-20  FIC R735QTJ8XC9X    0.065142
2024-12-21  FIC R735QTJ8XC9X    0.065142
2024-12-22  FIC R735QTJ8XC9X    0.065142
2024-12-23  FIC R735QTJ8XC9X    0.065142
Name: value, dtype: float64
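The groupby-nlargest pattern works on any DataFrame with a (time, symbol) MultiIndex. The sketch below rebuilds a tiny version of that structure with illustrative values so the chain can be run on its own:

```python
import pandas as pd

# Toy universe history with a (time, symbol) MultiIndex, mirroring the
# structure of the DataFrame above (values are illustrative).
index = pd.MultiIndex.from_tuples(
    [
        ("2024-12-18", "AAPL"), ("2024-12-18", "ZTS"),
        ("2024-12-19", "AAPL"), ("2024-12-19", "ZTS"),
    ],
    names=["time", "symbol"],
)
history = pd.DataFrame({"value": [0.006018, 0.024744, 0.010000, 0.003000]}, index=index)

# For each day, keep the single row with the largest value; reset_index drops
# the duplicate time level that groupby-apply introduces.
daily_winner = (
    history.groupby("time")
    .apply(lambda x: x.nlargest(1, "value"))
    .reset_index(level=1, drop=True)
    .value
)
print(daily_winner)
```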

To get the data in the format of the objects that you receive in your universe filter function instead of a DataFrame, use flatten=False.

# Get the historical universe data over the last 5 days in a Series where
# the values in the series are lists of the universe selection objects.
history = self.history(universe, timedelta(5), flatten=False)
# Select the asset with the greatest value each day.
for (universe_symbol, time), data in history.items():
    leader = sorted(data, key=lambda x: x.value)[-1]

Slices

To request Slice objects of historical data, call the history method without providing any Symbol objects. It returns data for all the data subscriptions in your notebook, so the result may include more than just alternative data.

class SliceHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 23)  
        # Add an asset and an alternative dataset.
        symbol = self.add_crypto('BTCUSD', Resolution.DAILY, Market.BITFINEX).symbol
        dataset_symbol = self.add_data(BitcoinMetadata, symbol).symbol
        # Get the latest 3 data points of all the securities/datasets in the notebook, packaged into Slice objects.
        history = self.history(3)
        # Iterate through each Slice and get the synchronized data points at each moment in time.
        for slice_ in history:
            t = slice_.time
            if symbol in slice_:
                price = slice_[symbol].price
            if dataset_symbol in slice_:
                hash_rate = slice_[dataset_symbol].hash_rate

When your history request returns Slice objects, the time properties of these objects are based on the algorithm time zone, but the end_time properties of the individual data objects are based on the data time zone. The end_time is the end of the sampling period and when the data is actually available.
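When comparing the two clocks, convert one into the other's zone. The sketch below uses Python's zoneinfo module; UTC as the data time zone and New York as the algorithm time zone are assumptions for illustration:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed zones for illustration: the data is stamped in UTC and the
# algorithm runs in New York time.
data_zone = ZoneInfo("UTC")
algorithm_zone = ZoneInfo("America/New_York")

# A data point's end_time, expressed in the data time zone.
end_time = datetime(2024, 12, 23, 0, 0, tzinfo=data_zone)

# Converting to the algorithm time zone makes it directly comparable
# with the Slice time property.
end_time_in_algorithm_zone = end_time.astimezone(algorithm_zone)
print(end_time_in_algorithm_zone)
```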

Sparse Datasets

A sparse dataset is a dataset that doesn't have data for every time step of its resolution. For example, the US Energy Information Administration (EIA) datasets have a daily resolution but the data for the "U.S. Ending Stocks of Finished Motor Gasoline in Thousand Barrels (Mbbl)" series only updates once a week. So when you request the trailing 30 days of historical data for it, you only get a few data points.

class SparseDatasetHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 23)      
        # Add a sparse dataset. In this case, the default resolution is daily.
        symbol = self.add_data(USEnergy, 'PET.WGFSTUS1.W').symbol
        # Get 30 days of history for the dataset.
        history = self.history(symbol, 30)
                                      value
symbol                  time
PET.WGFSTUS1.W.USEnergy 2024-11-29  14445.0
                        2024-12-06  16528.0
                        2024-12-13  16452.0
                        2024-12-20  16833.0

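If you need to align a sparse series like this with daily data, one common approach is to resample it onto a daily grid and forward-fill, so each day carries the last published value. A minimal sketch with the weekly values from the output above:

```python
import pandas as pd

# Weekly observations, shaped like the sparse history above
# (values copied from the example output).
history = pd.Series(
    [14445.0, 16528.0, 16452.0, 16833.0],
    index=pd.to_datetime(["2024-11-29", "2024-12-06", "2024-12-13", "2024-12-20"]),
    name="value",
)

# Resample to daily frequency and forward-fill: every day between releases
# repeats the most recent weekly value.
daily = history.resample("D").ffill()
print(len(daily))
```

Whether forward-filling is appropriate depends on the dataset; for release-based series like this one, it reflects the information actually available on each day.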
Most alternative datasets have only one resolution, which is usually daily. To check if a dataset is sparse and to view its resolution(s), see the documentation in the Dataset Market.
