Historical Data
Alternative Data
Introduction
Alternative datasets provide signals to inform trading decisions. To view all the alternative datasets available on QuantConnect, see the Dataset Market. This page explains how to get historical data for alternative datasets.
Data Points
To get historical alternative data points, call the `history` method with the dataset `Symbol`.
This method returns a DataFrame that contains the data point attributes.
class AlternativeDataHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 20)
        # Get the Symbol of a dataset.
        dataset_symbol = self.add_data(Fred, 'RVXCLS').symbol
        # Get the trailing 5 days of Fred data in DataFrame format.
        history = self.history(dataset_symbol, 5, Resolution.DAILY)
| symbol | time | value |
| --- | --- | --- |
| RVXCLS.Fred | 2024-12-17 | 23.02 |
| | 2024-12-18 | 24.01 |
| | 2024-12-19 | 32.76 |
| | 2024-12-20 | 29.90 |
# Calculate the dataset's rate of change.
roc = history.pct_change().iloc[1:]
| symbol | time | value |
| --- | --- | --- |
| RVXCLS.Fred | 2024-12-18 | 0.043006 |
| | 2024-12-19 | 0.364431 |
| | 2024-12-20 | -0.087302 |
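The rate-of-change values above can be sanity-checked outside of LEAN with plain pandas. This is a minimal sketch that assumes only the RVXCLS values from the first table:

```python
import pandas as pd

# RVXCLS values from the preceding table, indexed by date.
values = pd.Series(
    [23.02, 24.01, 32.76, 29.90],
    index=pd.to_datetime(['2024-12-17', '2024-12-18', '2024-12-19', '2024-12-20']),
    name='value',
)

# pct_change computes (v[t] - v[t-1]) / v[t-1]; the first row is NaN, so drop it.
roc = values.pct_change().iloc[1:]
```

The resulting values match the rate-of-change table above.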
If you intend to use the data in the DataFrame to create `alternativeDataClass` objects, request that the history request returns the data type you need directly. Otherwise, LEAN consumes unnecessary computational resources populating the DataFrame.
To get a list of dataset objects instead of a DataFrame, call the `history[alternativeDataClass]` method.
# Get the trailing 5 days of Fred data for an asset in Fred format.
history = self.history[Fred](dataset_symbol, 5, Resolution.DAILY)

# Iterate through the historical data points.
for data_point in history:
    t = data_point.end_time
    value = data_point.value
Some alternative datasets provide multiple entries per asset per time step.
For example, the US Regulatory Alerts dataset can provide multiple alerts per day.
In this case, to organize the data into a DataFrame, set the `flatten` argument to `True`.
class RegalyticsHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 20)
        # Get all the Regalytics articles that were published over the last
        # day, organized in a DataFrame.
        dataset_symbol = self.add_data(RegalyticsRegulatoryArticles, "REG").symbol
        history = self.history(dataset_symbol, 1, Resolution.DAILY, flatten=True)
| time | symbol | agencies | alerttype | ... | time | title |
| --- | --- | --- | --- | --- | --- | --- |
| 2024-12-20 | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Experian Information Solutions, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Transunion Intermediate Holdings, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Transunion Intermediate Holdings, Inc. |
| | ... | ... | ... | ... | ... | ... |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
| | REG.RegalyticsRegulatoryArticles | [Consumer Financial Protection Bureau] | Complaint | ... | 2024-12-21 | Complaint Filed: Equifax, Inc. |
# Get all the unique alert types from the Regalytics articles.
alert_types = history.alerttype.unique()
array(['Complaint', 'Press release', 'Event', 'Litigation Release', 'Grant Information', 'Media Release', 'News', 'Announcement', 'Transcript', 'Decree', 'Decision', 'Regulation', 'Executive Order', 'Media Advisory', 'Disaster Press Release', 'Notice', 'Procurement', 'Meeting', 'News release', 'Contract', 'Publication', 'Blog', 'Tabled Document', 'Resolution', 'Bill', 'Concurrent Resolution', 'Opinions and Adjudicatory Orders', 'Proposed rule', 'Technical Notice', 'Sanction', 'Order', 'Statement', 'Rule', 'enforcement action', 'Report', 'Statement|Release', 'AWCs (Letters of Acceptance, Waiver, and Consent)'], dtype=object)
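Once the articles are flattened into a DataFrame, ordinary pandas operations apply. This sketch counts articles per alert type and filters down to one type; the rows here are hypothetical stand-ins for the real `history` DataFrame returned by `self.history(..., flatten=True)`:

```python
import pandas as pd

# A small stand-in for the flattened Regalytics history DataFrame
# (hypothetical rows and titles, for illustration only).
history = pd.DataFrame({
    'alerttype': ['Complaint', 'Complaint', 'Press release', 'Rule', 'Complaint'],
    'title': ['Alert A', 'Alert B', 'Alert C', 'Alert D', 'Alert E'],
})

# Count how many articles of each alert type were published.
counts = history['alerttype'].value_counts()

# Keep only the complaints for further analysis.
complaints = history[history['alerttype'] == 'Complaint']
```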
Universes
To get historical data for an alternative data universe, call the `history` method with the `Universe` object.
Set the `flatten` argument to `True` to get a DataFrame that has columns for the data point attributes.
class AltDataUniverseHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 23)
        # Add a universe of US Equities based on an alternative dataset.
        universe = self.add_universe(BrainStockRankingUniverse)
        # Get 5 days of history for the universe.
        history = self.history(universe, timedelta(5), flatten=True)
| time | symbol | rank10days | rank21days | rank2days | rank3days | rank5days | value |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 2024-12-18 | A RPTMYV3VC57P | -0.001895 | 0.005938 | -0.007858 | -0.006320 | 0.001771 | -0.007858 |
| | AAL VM9RIYHM8ACL | -0.003977 | 0.006520 | -0.005671 | -0.003738 | -0.004715 | -0.005671 |
| | AAPL R735QTJ8XC9X | 0.027450 | 0.037339 | 0.006018 | 0.001489 | 0.010102 | 0.006018 |
| | ABBV VCY032R250MD | -0.002814 | 0.012297 | -0.001717 | -0.000679 | -0.007454 | -0.001717 |
| | ABNB XK8H247DY6W5 | 0.020533 | 0.046173 | -0.002303 | 0.007350 | 0.011252 | -0.002303 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 2024-12-23 | ZI XF2DKG2HHLK5 | -0.038118 | -0.037527 | -0.035904 | -0.041195 | -0.041388 | -0.035904 |
| | ZM X3RPXTZRW09X | 0.006784 | 0.020690 | -0.005674 | -0.008556 | -0.002120 | -0.005674 |
| | ZS WSVU0MELFQED | 0.010422 | 0.019619 | -0.003743 | -0.002079 | 0.002778 | -0.003743 |
| | ZTO WF2L9EOCSQCL | -0.021455 | -0.015939 | -0.018053 | -0.020263 | -0.024979 | -0.018053 |
| | ZTS VDRJHVQ4FNFP | 0.030589 | 0.041400 | 0.024744 | 0.019891 | 0.031048 | 0.024744 |
# Select the asset with the greatest value each day.
daily_winner = history.groupby('time').apply(
    lambda x: x.nlargest(1, 'value')
).reset_index(level=1, drop=True).value
time        symbol
2024-12-18  FIC R735QTJ8XC9X    0.054204
2024-12-19  FIC R735QTJ8XC9X    0.073250
2024-12-20  FIC R735QTJ8XC9X    0.065142
2024-12-21  FIC R735QTJ8XC9X    0.065142
2024-12-22  FIC R735QTJ8XC9X    0.065142
2024-12-23  FIC R735QTJ8XC9X    0.065142
Name: value, dtype: float64
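A follow-up question for a Series like the one above is how often each asset led the universe. This is a minimal sketch with a synthetic `(time, symbol)` MultiIndex Series standing in for `daily_winner` (the symbols and values here are hypothetical):

```python
import pandas as pd

# Stand-in for the daily_winner Series: winning value per day,
# indexed by (time, symbol). Symbols and values are made up.
daily_winner = pd.Series(
    [0.054, 0.073, 0.065, 0.031],
    index=pd.MultiIndex.from_tuples(
        [('2024-12-18', 'AAPL'), ('2024-12-19', 'AAPL'),
         ('2024-12-20', 'AAPL'), ('2024-12-23', 'ZTS')],
        names=['time', 'symbol'],
    ),
    name='value',
)

# Count how many days each symbol led the universe.
leader_counts = daily_winner.groupby(level='symbol').size()
```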
To get the data in the format of the objects that you receive in your universe filter function instead of a DataFrame, use `flatten=False`.
# Get the historical universe data over the last 5 days in a Series where
# the values in the series are lists of the universe selection objects.
history = self.history(universe, timedelta(5), flatten=False)

# Select the asset with the greatest value each day.
for (universe_symbol, time), data in history.items():
    leader = sorted(data, key=lambda x: x.value)[-1]
Slices
To request `Slice` objects of historical data, call the `history` method without providing any `Symbol` objects.
It returns data for all the data subscriptions in your notebook, so the result may include more than just alternative data.
class SliceHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 23)
        # Add an asset and an alternative dataset.
        symbol = self.add_crypto('BTCUSD', Resolution.DAILY, Market.BITFINEX).symbol
        dataset_symbol = self.add_data(BitcoinMetadata, symbol).symbol
        # Get the latest 3 data points of all the securities/datasets in the
        # notebook, packaged into Slice objects.
        history = self.history(3)
        # Iterate through each Slice and get the synchronized data points at
        # each moment in time.
        for slice_ in history:
            t = slice_.time
            if symbol in slice_:
                price = slice_[symbol].price
            if dataset_symbol in slice_:
                hash_rate = slice_[dataset_symbol].hash_rate
When your history request returns `Slice` objects, the `time` properties of these objects are based on the algorithm time zone, but the `end_time` properties of the individual data objects are based on the data time zone. The `end_time` is the end of the sampling period and when the data is actually available.
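The distinction between the two time zones can be illustrated with the standard-library `zoneinfo` module. In this sketch, the data time zone is assumed to be UTC and the algorithm time zone New York; each dataset documents its own time zone, so treat both choices as assumptions:

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Assumed zones for illustration: data in UTC, algorithm in New York.
data_tz = ZoneInfo('UTC')
algorithm_tz = ZoneInfo('America/New_York')

# A data point whose end_time is midnight in the data time zone.
end_time = datetime(2024, 12, 20, 0, 0, tzinfo=data_tz)

# The same instant in the algorithm time zone falls on the previous evening,
# so comparing the two naively can look like an off-by-one-day error.
end_time_in_algo_tz = end_time.astimezone(algorithm_tz)
```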
Sparse Datasets
A sparse dataset is a dataset that doesn't have data for every time step of its resolution. For example, the US Energy Information Administration (EIA) datasets have a daily resolution but the data for the "U.S. Ending Stocks of Finished Motor Gasoline in Thousand Barrels (Mbbl)" series only updates once a week. So when you request the trailing 30 days of historical data for it, you only get a few data points.
class SparseDatasetHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2024, 12, 23)
        # Add a sparse dataset. In this case, the default resolution is daily.
        symbol = self.add_data(USEnergy, 'PET.WGFSTUS1.W').symbol
        # Get 30 days of history for the dataset.
        history = self.history(symbol, 30)
| symbol | time | value |
| --- | --- | --- |
| PET.WGFSTUS1.W.USEnergy | 2024-11-29 | 14445.0 |
| | 2024-12-06 | 16528.0 |
| | 2024-12-13 | 16452.0 |
| | 2024-12-20 | 16833.0 |
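If you need one observation per calendar day from a sparse series like this, you can forward-fill it with pandas. This is a sketch using the values from the preceding table; whether forward-filling is appropriate depends on your strategy:

```python
import pandas as pd

# Weekly gasoline-stock values from the preceding table.
weekly = pd.Series(
    [14445.0, 16528.0, 16452.0, 16833.0],
    index=pd.to_datetime(['2024-11-29', '2024-12-06', '2024-12-13', '2024-12-20']),
    name='value',
)

# Resample to daily resolution, repeating the last known value on days
# without a new report.
daily = weekly.resample('D').ffill()
```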
Most alternative datasets have only one resolution, which is usually daily. To check if a dataset is sparse and to view its resolution(s), see the documentation in the Dataset Market.