Historical Data
Custom Data
Introduction
You can import external datasets into your algorithm to use alongside other datasets from the Dataset Market.
This page explains how to get historical data for custom datasets.
Before you can get historical data for the dataset, define the get_source and reader methods of the custom data class.
For examples of custom dataset implementations, see Key Concepts.
Slices
To request Slice objects of historical data, call the history method without providing any Symbol objects.
It returns data for all the data subscriptions in your algorithm, so the result may include more than just custom data.
# Get the latest 3 data points of all the securities/datasets in the algorithm, packaged into Slice objects.
history = self.history(3)
When your history request returns Slice objects, the time properties of these objects are based on the algorithm time zone, but the end_time properties of the individual data objects are based on the data time zone.
The end_time is the end of the sampling period and when the data is actually available.
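The difference between the two time zones is easy to overlook, so here is a minimal, standalone sketch of it. It doesn't use LEAN: plain datetime objects stand in for the time and end_time properties, and the time zones (a dataset in UTC, an algorithm in New York during daylight saving time) and the timestamp are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Assumed time zones for illustration: the dataset is in UTC and the
# algorithm is in New York (UTC-4 during daylight saving time).
data_time_zone = timezone.utc
algorithm_time_zone = timezone(timedelta(hours=-4), "EDT")

# Hypothetical end_time of a daily data point, in the data time zone.
end_time = datetime(2014, 7, 10, 0, 0, tzinfo=data_time_zone)

# The same instant expressed in the algorithm time zone.
end_time_in_algorithm_tz = end_time.astimezone(algorithm_time_zone)

print(end_time.isoformat())
print(end_time_in_algorithm_tz.isoformat())
```

Both values describe the same instant, but the calendar date differs between the two zones, which matters when you compare a Slice time against the end_time of a data object inside it.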
Data Points
To get historical data points for a custom dataset, call the history method with the dataset Symbol.
This method returns a DataFrame that contains the data point attributes of the dataset class.
For an example definition of a custom data class, see the CSV Format Example.
from AlgorithmImports import *

class CustomSecurityHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2014, 7, 10)
        # Add a custom dataset and save a reference to its Symbol.
        dataset_symbol = self.add_data(MyCustomDataType, "MyCustomDataType", Resolution.DAILY).symbol
        # Get the trailing 5 days of MyCustomDataType data in DataFrame format.
        history = self.history(dataset_symbol, 5, Resolution.DAILY)
symbol | time | close | high | low | open | value |
---|---|---|---|---|---|---|
MYCUSTOMDATATYPE.MyCustomDataType | 2014-07-08 | 7787.15 | 7792.00 | 7755.10 | 7780.40 | 7787.15 |
MYCUSTOMDATATYPE.MyCustomDataType | 2014-07-09 | 7623.20 | 7808.85 | 7595.90 | 7804.05 | 7623.20 |
MYCUSTOMDATATYPE.MyCustomDataType | 2014-07-10 | 7585.00 | 7650.10 | 7551.65 | 7637.95 | 7585.00 |
If you intend to use the data in the DataFrame to create customDatasetClass objects, request that data type directly from the history request instead.
Otherwise, LEAN consumes unnecessary computational resources populating the DataFrame.
To get a list of dataset objects instead of a DataFrame, call the history[customDatasetClass] method.
# Get the trailing 5 days of MyCustomDataType data for an asset in MyCustomDataType format.
history = self.history[MyCustomDataType](dataset_symbol, 5, Resolution.DAILY)
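Working with objects rather than a DataFrame lets you read attributes directly. The sketch below is standalone and doesn't call LEAN: CustomDataPoint is a hypothetical stand-in for a custom data class, and the list stands in for the result of a typed history request. The timestamps and values are copied from the time and value columns of the table above.

```python
from dataclasses import dataclass
from datetime import datetime
from statistics import mean


@dataclass
class CustomDataPoint:
    """Hypothetical stand-in for a custom data class (not a LEAN type)."""
    end_time: datetime
    value: float


# Stand-in for the objects a typed history request yields.
history = [
    CustomDataPoint(datetime(2014, 7, 8), 7787.15),
    CustomDataPoint(datetime(2014, 7, 9), 7623.20),
    CustomDataPoint(datetime(2014, 7, 10), 7585.00),
]

# Read attributes directly; no DataFrame is involved.
average_value = mean(point.value for point in history)
latest = max(history, key=lambda point: point.end_time)

print(round(average_value, 2), latest.value)
```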
If the dataset provides multiple entries per time step, in the get_source method of your custom data class, return a SubscriptionDataSource that uses FileFormat.UNFOLDING_COLLECTION.
To get the historical data of this custom data type in a DataFrame, set the flatten argument to True.
history = self.history(dataset_symbol, 1, Resolution.DAILY, flatten=True)
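To make the effect of flattening concrete, the standalone sketch below turns a collection of entries per time step into one row per entry. It is a conceptual illustration in plain Python, not LEAN's implementation, and the tickers and weights are made up.

```python
from datetime import date

# Hypothetical unfolded data: each time step carries several entries.
entries_by_time = {
    date(2024, 1, 2): [
        {"ticker": "AAA", "weight": 0.6},
        {"ticker": "BBB", "weight": 0.4},
    ],
    date(2024, 1, 3): [
        {"ticker": "AAA", "weight": 0.5},
        {"ticker": "CCC", "weight": 0.5},
    ],
}

# Flatten: emit one row per (time, entry) pair instead of one row per time step.
flat_rows = [
    {"time": time, **entry}
    for time, entries in entries_by_time.items()
    for entry in entries
]

for row in flat_rows:
    print(row["time"], row["ticker"], row["weight"])
```

Each time step with two entries becomes two rows, so every attribute of an entry ends up in its own column instead of being nested inside a collection.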
Universes
To get historical data for a custom data universe, call the history method with the Universe object.
For an example definition of a custom data universe class, see the CSV Format Example.
from AlgorithmImports import *

class CustomDataUniverseHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2017, 7, 9)
        # Add a universe from a custom data source and save a reference to it.
        universe = self.add_universe(
            StockDataSource, "my-stock-data-source", Resolution.DAILY, lambda data: [x.symbol for x in data]
        )
        # Get the historical universe data over the last 5 days in DataFrame format.
        history = self.history(universe, timedelta(5))
time | symbols |
---|---|
2017-07-05 | [SPY, QQQ, FB, AAPL, IWM] |
2017-07-06 | [SPY, QQQ, FB, AAPL, IWM] |
2017-07-07 | [QQQ, AAPL, IWM, FB, GOOGL] |
2017-07-08 | [IWM, AAPL, FB, BAC, GOOGL] |
2017-07-09 | [AAPL, FB, GOOGL, GOOG, BAC] |
# Count the number of assets in the universe each day.
universe_size_by_day = history.apply(lambda row: len(row['symbols']), axis=1)
time
2017-07-05    5
2017-07-06    5
2017-07-07    5
2017-07-08    5
2017-07-09    5
Name: symbols, dtype: int64
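The same history can answer other questions, such as how many days each asset spent in the universe. The sketch below is standalone: it copies the selections from the table above into plain strings, whereas in an algorithm the symbols column holds Symbol objects.

```python
from collections import Counter

# Universe selections copied from the table above, with tickers as plain strings.
universe_by_day = {
    "2017-07-05": ["SPY", "QQQ", "FB", "AAPL", "IWM"],
    "2017-07-06": ["SPY", "QQQ", "FB", "AAPL", "IWM"],
    "2017-07-07": ["QQQ", "AAPL", "IWM", "FB", "GOOGL"],
    "2017-07-08": ["IWM", "AAPL", "FB", "BAC", "GOOGL"],
    "2017-07-09": ["AAPL", "FB", "GOOGL", "GOOG", "BAC"],
}

# Count the number of days each ticker appears in the universe.
days_in_universe = Counter(
    ticker for tickers in universe_by_day.values() for ticker in tickers
)

for ticker, days in days_in_universe.most_common():
    print(ticker, days)
```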
Missing Data Points
History requests for a trailing number of data samples return data based on the market hours of assets. By default, custom securities are always open. Therefore, if the dataset has no data for part of that time (for example, on weekends), a history request for a trailing number of data samples may return fewer samples than you expect. To set the market hours of the dataset, see Market Hours.
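As a rough model of this effect (not LEAN's actual logic), the standalone sketch below treats a trailing 5-sample daily request on an always-open market as a window of 5 calendar days, then counts how many of those days a weekday-only dataset covers. The end date and the weekday-only assumption are illustrative.

```python
from datetime import date, timedelta

end_date = date(2014, 7, 14)  # a Monday
requested_samples = 5

# With always-open market hours, every calendar day counts toward the window.
window = [end_date - timedelta(days=offset) for offset in range(requested_samples)]

# Assumed dataset: it only has entries on weekdays (Monday=0 ... Friday=4).
available = sorted(day for day in window if day.weekday() < 5)

print(len(available), "of", requested_samples, "requested samples have data")
```

In this model the window spans a weekend, so only three of the five requested samples exist.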