Historical Data

Custom Data

Introduction

You can import external datasets into your algorithm to use alongside other datasets from the Dataset Market. This page explains how to get historical data for custom datasets. Before you can get historical data for the dataset, define the get_source and reader methods of the custom data class. For examples of custom dataset implementations, see Key Concepts.
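
As a rough, LEAN-independent sketch of the parsing work a reader method usually does, the snippet below turns one CSV line into a data point. The column layout (date, open, high, low, close), the parse_line helper, and the DataPoint container are all hypothetical stand-ins; a real custom data class would populate its own properties instead, and the sample line is illustrative only.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DataPoint:
    """Hypothetical stand-in for a custom data object."""
    time: datetime      # start of the sampling period
    end_time: datetime  # end of the sampling period, when the data is available
    open: float
    high: float
    low: float
    close: float
    value: float        # headline value; assumed here to equal the close


def parse_line(line: str) -> Optional[DataPoint]:
    """Parse one 'date,open,high,low,close' CSV line; return None if unusable."""
    line = line.strip()
    # Skip blank lines and header rows, which do not start with a digit.
    if not line or not line[0].isdigit():
        return None
    fields = line.split(",")
    time = datetime.strptime(fields[0], "%Y-%m-%d")
    open_, high, low, close = (float(x) for x in fields[1:5])
    # Assumption: daily data, so the sampling period ends one day after it starts.
    return DataPoint(time, time + timedelta(days=1), open_, high, low, close, close)


point = parse_line("2014-07-08,7780.40,7792.00,7755.10,7787.15")
header = parse_line("Date,Open,High,Low,Close")
print(point)
print(header)
```

Returning None for lines that cannot be parsed keeps header rows and blank lines from turning into bogus data points.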

Slices

To request Slice objects of historical data, call the history method without providing any Symbol objects. It returns data for all the data subscriptions in your algorithm, so the result may include more than just custom data.

# Get the latest 3 data points of all the securities/datasets in the algorithm, packaged into Slice objects.
history = self.history(3)

When your history request returns Slice objects, the time properties of these objects are based on the algorithm time zone, but the end_time properties of the individual data objects are based on the data time zone. The end_time marks the end of the sampling period, which is when the data actually becomes available.
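
To compare a data object's end_time with a Slice time, both timestamps need to be expressed in the same time zone. The sketch below shows that conversion with the standard library only. The two fixed-offset zones and the to_algorithm_time helper are assumptions made for illustration; they are not part of LEAN, which tracks the real time zones for you.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed-offset zones, purely for illustration.
DATA_TZ = timezone.utc                        # hypothetical data time zone
ALGORITHM_TZ = timezone(timedelta(hours=-5))  # hypothetical algorithm time zone


def to_algorithm_time(end_time: datetime) -> datetime:
    """Treat a naive end_time as data-zone time and convert it to the algorithm zone."""
    return end_time.replace(tzinfo=DATA_TZ).astimezone(ALGORITHM_TZ)


end_time = datetime(2014, 7, 9, 0, 0)  # naive timestamp in the data time zone
converted = to_algorithm_time(end_time)
print(converted.isoformat())
```

With these assumed offsets, midnight in the data time zone falls on the previous evening in the algorithm time zone, which is why the two timestamps can show different calendar dates.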

Data Points

To get historical data points for a custom dataset, call the history method with the dataset Symbol. This method returns a DataFrame that contains the data point attributes of the dataset class. For an example definition of a custom data class, see the CSV Format Example.

from AlgorithmImports import *

class CustomSecurityHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2014, 7, 10)
        # Add a custom dataset and save a reference to its Symbol.
        dataset_symbol = self.add_data(MyCustomDataType, "MyCustomDataType", Resolution.DAILY).symbol
        # Get the trailing 5 days of MyCustomDataType data in DataFrame format.
        history = self.history(dataset_symbol, 5, Resolution.DAILY)
                                                close     high      low     open    value
symbol                            time
MYCUSTOMDATATYPE.MyCustomDataType 2014-07-08  7787.15  7792.00  7755.10  7780.40  7787.15
                                  2014-07-09  7623.20  7808.85  7595.90  7804.05  7623.20
                                  2014-07-10  7585.00  7650.10  7551.65  7637.95  7585.00

If you intend to use the data in the DataFrame to create customDatasetClass objects, request that data type directly in the history call. Otherwise, LEAN spends computational resources populating a DataFrame that you don't need. To get a list of dataset objects instead of a DataFrame, call the history[customDatasetClass] method.

# Get the trailing 5 days of MyCustomDataType data for an asset in MyCustomDataType format. 
history = self.history[MyCustomDataType](dataset_symbol, 5, Resolution.DAILY)

If the dataset provides multiple entries per time step, return a SubscriptionDataSource that uses FileFormat.UNFOLDING_COLLECTION from the get_source method of your custom data class. To get the historical data of this custom data type in a DataFrame, set the flatten argument to True.

history = self.history(dataset_symbol, 1, Resolution.DAILY, flatten=True)
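
Conceptually, flattening turns one collection per time step into one row per entry. The sketch below imitates that reshaping with plain Python. The flatten_collections helper, the field names, and the sample values are invented for illustration, and the result does not reproduce the exact column layout that LEAN produces.

```python
from datetime import date

# Hypothetical unfolded history: each time step maps to a collection of entries.
history_by_time = {
    date(2024, 1, 2): [
        {"symbol": "AAA", "value": 1.0},
        {"symbol": "BBB", "value": 2.0},
    ],
    date(2024, 1, 3): [
        {"symbol": "AAA", "value": 1.5},
    ],
}


def flatten_collections(collections):
    """Expand each per-time collection into one row per entry, ordered by time."""
    rows = []
    for time, entries in sorted(collections.items()):
        for entry in entries:
            rows.append({"time": time, **entry})
    return rows


rows = flatten_collections(history_by_time)
for row in rows:
    print(row)
```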

Universes

To get historical data for a custom data universe, call the history method with the Universe object. For an example definition of a custom data universe class, see the CSV Format Example.

from AlgorithmImports import *

class CustomDataUniverseHistoryAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2017, 7, 9)
        # Add a universe from a custom data source and save a reference to it.
        universe = self.add_universe(
            StockDataSource, "my-stock-data-source", Resolution.DAILY, lambda data: [x.symbol for x in data]
        )
        # Get the historical universe data over the last 5 days in DataFrame format.
        history = self.history(universe, timedelta(5))
            symbols
time
2017-07-05  [SPY, QQQ, FB, AAPL, IWM]
2017-07-06  [SPY, QQQ, FB, AAPL, IWM]
2017-07-07  [QQQ, AAPL, IWM, FB, GOOGL]
2017-07-08  [IWM, AAPL, FB, BAC, GOOGL]
2017-07-09  [AAPL, FB, GOOGL, GOOG, BAC]
# Count the number of assets in the universe each day.
universe_size_by_day = history.apply(lambda row: len(row['symbols']), axis=1)
time
2017-07-05    5
2017-07-06    5
2017-07-07    5
2017-07-08    5
2017-07-09    5
Name: symbols, dtype: int64
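
Because the universe size stays at 5 every day, the count alone hides the turnover in the constituents. As a small sketch of one way to look at it, the snippet below compares consecutive days with plain Python sets. The constituent lists are copied from the history output above, and membership_changes is a hypothetical helper, not a LEAN method.

```python
# Daily universe constituents, copied from the history output above.
universe_by_day = {
    "2017-07-05": ["SPY", "QQQ", "FB", "AAPL", "IWM"],
    "2017-07-06": ["SPY", "QQQ", "FB", "AAPL", "IWM"],
    "2017-07-07": ["QQQ", "AAPL", "IWM", "FB", "GOOGL"],
    "2017-07-08": ["IWM", "AAPL", "FB", "BAC", "GOOGL"],
    "2017-07-09": ["AAPL", "FB", "GOOGL", "GOOG", "BAC"],
}


def membership_changes(constituents):
    """Return the symbols added and removed on each day relative to the previous day."""
    changes = {}
    days = sorted(constituents)
    for previous, current in zip(days, days[1:]):
        before = set(constituents[previous])
        after = set(constituents[current])
        changes[current] = {
            "added": sorted(after - before),
            "removed": sorted(before - after),
        }
    return changes


changes = membership_changes(universe_by_day)
for day, change in changes.items():
    print(day, change)
```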

Missing Data Points

History requests for a trailing number of data samples use the market hours of the asset to decide how far back to look. By default, custom securities are always open, so every period counts toward the request, including weekends and holidays. If your dataset has no entries for some of those periods, the request returns fewer samples than you expect. To set the market hours of the dataset, see Market Hours.
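
The effect can be illustrated with a deliberately simplified model. This is not LEAN's actual lookback logic. It assumes that an always-open calendar makes a request for N daily samples span the last N calendar days, and that the hypothetical dataset only publishes on weekdays. The trailing_samples helper and the dates are made up for the example.

```python
from datetime import date, timedelta


def trailing_samples(available, end, periods):
    """Return the available dates within the last `periods` calendar days up to `end`."""
    start = end - timedelta(days=periods)
    return [d for d in available if start < d <= end]


# Hypothetical dataset that publishes on weekdays only, over two weeks.
first_day = date(2014, 6, 30)  # a Monday
all_days = [first_day + timedelta(days=i) for i in range(14)]
weekday_dates = [d for d in all_days if d.weekday() < 5]

# Request 5 trailing daily samples ending on Thursday, 2014-07-10.
samples = trailing_samples(weekday_dates, end=date(2014, 7, 10), periods=5)
print(len(samples), [d.isoformat() for d in samples])
```

In this model, the five-day window covers a Sunday on which the dataset has no entry, so the request yields four samples instead of five.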
