Datasets
Custom Data
Define Custom Data
You must format the data file into chronological order before you define the custom data class.
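One way to put a CSV file into chronological order is to sort its rows by the date column before you upload it. The following is a minimal sketch, assuming the first row is a header and the first column holds dates in YYYY-MM-DD format. The function name sort_csv_chronologically and the sample rows are hypothetical and only serve as illustration.

```python
import csv
from datetime import datetime
from io import StringIO


def sort_csv_chronologically(text: str, date_format: str = "%Y-%m-%d") -> str:
    """Return the CSV text with its data rows sorted by the first column, oldest first."""
    rows = list(csv.reader(StringIO(text)))
    # Keep the header aside and drop empty rows before sorting.
    header, body = rows[0], [row for row in rows[1:] if row]
    body.sort(key=lambda row: datetime.strptime(row[0], date_format))

    out = StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(body)
    return out.getvalue()


# Hypothetical sample whose rows are in reverse chronological order.
raw = "Date,Open,High,Low,Close\n2011-09-14,2,3,1,2\n2011-09-13,1,2,0,1\n"
sorted_text = sort_csv_chronologically(raw)
print(sorted_text)
```

Sorting on the parsed date rather than the raw string keeps the approach valid for date formats that don't sort correctly as text.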
To define a custom data class, extend the PythonData class and override the get_source and reader methods.
class Nifty(PythonData):
    '''NIFTY Custom Data Class'''
    def get_source(self, config: SubscriptionDataConfig, date: datetime, is_live_mode: bool) -> SubscriptionDataSource:
        # Point LEAN at the remote CSV file that holds the data
        url = "http://cdn.quantconnect.com.s3.us-east-1.amazonaws.com/uploads/CNXNIFTY.csv"
        return SubscriptionDataSource(url, SubscriptionTransportMedium.REMOTE_FILE)
    def reader(self, config: SubscriptionDataConfig, line: str, date: datetime, is_live_mode: bool) -> BaseData:
        # Skip blank lines and any line that doesn't start with a digit (for example, the header row)
        if not (line.strip() and line[0].isdigit()):
            return None
        # New Nifty object
        index = Nifty()
        index.symbol = config.symbol
        try:
            # Example File Format:
            # Date, Open High Low Close Volume Turnover
            # 2011-09-13 7792.9 7799.9 7722.65 7748.7 116534670 6107.78
            data = line.split(',')
            index.time = datetime.strptime(data[0], "%Y-%m-%d")
            index.end_time = index.time + timedelta(days=1)
            # Store the close as a number, not as the raw string
            index.value = float(data[4])
            index["Open"] = float(data[1])
            index["High"] = float(data[2])
            index["Low"] = float(data[3])
            index["Close"] = float(data[4])
        except (ValueError, IndexError):
            # Ignore fields that fail to parse and return what was read so far
            pass
        return index
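To check the parsing logic without running LEAN, you can exercise the same steps on a single line of text. The following is a rough sketch, not part of the LEAN API: the helper parse_nifty_line is hypothetical, it returns a plain dictionary instead of a Nifty object, and the comma-separated sample line is an assumption based on the split(',') call in the reader method.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_nifty_line(line: str) -> Optional[dict]:
    """Hypothetical helper that mirrors the parsing steps of the reader method."""
    # Skip blank lines and lines that don't start with a digit, such as the header row.
    if not (line.strip() and line[0].isdigit()):
        return None
    data = line.split(',')
    try:
        time = datetime.strptime(data[0], "%Y-%m-%d")
        return {
            "time": time,
            "end_time": time + timedelta(days=1),  # a daily bar ends one day after it starts
            "open": float(data[1]),
            "high": float(data[2]),
            "low": float(data[3]),
            "close": float(data[4]),
        }
    except (ValueError, IndexError):
        # Unlike the class above, this sketch rejects malformed rows outright.
        return None


bar = parse_nifty_line("2011-09-13,7792.9,7799.9,7722.65,7748.7,116534670,6107.78")
header = parse_nifty_line("Date,Open,High,Low,Close,Volume,Turnover")
print(bar)
print(header)
```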
Create Subscriptions
You need to define a custom data class before you can subscribe to it.
Follow these steps to subscribe to a custom dataset:
- Create a QuantBook.
- Call the add_data method with a ticker and then save a reference to the data Symbol.
qb = QuantBook()
symbol = qb.add_data(Nifty, "NIFTY").symbol
Custom data has its own resolution, so you don't need to specify it.
Get Historical Data
You need a subscription before you can request historical data for a security. You can request an amount of historical data based on a trailing number of bars, a trailing period of time, or a defined period of time.
Before you request data, call the set_start_date method with a date to reduce the risk of look-ahead bias.
qb.set_start_date(2014, 7, 29)
If you call the set_start_date method, the date that you pass to the method is the latest date for which your history requests will return data.
Trailing Number of Bars
Call the history method with a symbol, an integer, and optionally a resolution to request historical data based on the given number of trailing bars. Because custom data has its own resolution, the following example omits the resolution argument.
history = qb.history(symbol, 10)
This method returns the most recent bars, excluding periods of time when the exchange was closed.
Trailing Period of Time
Call the history method with a symbol, a timedelta, and optionally a resolution to request historical data based on the given trailing period of time.
history = qb.history(symbol, timedelta(days=10))
This method returns the most recent bars, excluding periods of time when the exchange was closed.
Defined Period of Time
Call the history method with a symbol, a start datetime, an end datetime, and optionally a resolution to request historical data based on the defined period of time. The start and end times you provide are interpreted in the notebook time zone.
start_time = datetime(2013, 7, 29)
end_time = datetime(2014, 7, 29)
history = qb.history(symbol, start_time, end_time)
This method returns the bars that are timestamped within the defined period of time.
In all of the cases above, the history method returns a DataFrame with a MultiIndex.
![Dataframe of custom NIFTY data from history call](https://cdn.quantconnect.com/i/tu/research-history-custom-full.png)
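To make the difference between the three request styles concrete, the following sketch models them on a plain list of bars. It is only an illustration of the semantics described above, not how LEAN implements the history method. The bar values, the variable names, and the choice of inclusive or exclusive boundaries are all assumptions.

```python
from datetime import datetime, timedelta

# Hypothetical daily bars as (end_time, value) pairs, weekdays only.
bars = []
day = datetime(2014, 7, 14)
while day <= datetime(2014, 8, 1):
    if day.weekday() < 5:  # Monday to Friday
        bars.append((day, 100.0 + len(bars)))
    day += timedelta(days=1)

# Plays the role of the date passed to set_start_date:
# no request may return data stamped after it.
notebook_date = datetime(2014, 7, 29)
visible = [bar for bar in bars if bar[0] <= notebook_date]

# Trailing number of bars: the 10 most recent bars, however many calendar days they span.
trailing_bars = visible[-10:]

# Trailing period of time: every bar in the last 10 calendar days.
trailing_period = [bar for bar in visible if bar[0] > notebook_date - timedelta(days=10)]

# Defined period of time: every bar stamped between two dates (boundaries assumed inclusive).
start_time, end_time = datetime(2014, 7, 21), datetime(2014, 7, 25)
defined_period = [bar for bar in visible if start_time <= bar[0] <= end_time]

print(len(trailing_bars), len(trailing_period), len(defined_period))
```

In this model, a request for 10 bars yields 10 rows, while a request for 10 calendar days yields fewer rows because weekends contain no bars.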
Download Method
To download the data directly from the remote file location instead of using your custom data class, call the download method with the data URL.
content = qb.download("http://cdn.quantconnect.com.s3.us-east-1.amazonaws.com/uploads/CNXNIFTY.csv")
Follow these steps to convert the content to a DataFrame:
- Import StringIO from the io library.
- Create a StringIO.
- Call the read_csv method.
from io import StringIO
data = StringIO(content)
dataframe = pd.read_csv(data, index_col=0)
![Dataframe of custom NIFTY data from io read csv](https://cdn.quantconnect.com/i/tu/research-history-custom-download.png)
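The conversion steps can be tried without a network connection by substituting a short string for the downloaded content. This is a minimal sketch under stated assumptions: the content string stands in for the return value of the download method, its second data row is made up, and parse_dates=True is an optional addition that turns the index into dates.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the string that the download call returns.
content = (
    "Date,Open,High,Low,Close\n"
    "2011-09-13,7792.9,7799.9,7722.65,7748.7\n"
    "2011-09-14,7750.0,7800.0,7700.0,7780.0\n"
)

# Wrap the text in a file-like object so read_csv can consume it.
data = StringIO(content)

# Use the first column (Date) as the index and parse it into timestamps.
dataframe = pd.read_csv(data, index_col=0, parse_dates=True)
print(dataframe)
```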
Wrangle Data
You need some historical data to perform wrangling operations. To display pandas objects, run a cell in a notebook with the pandas object as the last line. To display other data formats, call the print method.
The DataFrame that the history method returns has the following index levels:
- Dataset Symbol
- The end_time of the data sample
The columns of the DataFrame are the data properties.
![Dataframe of custom NIFTY data](https://cdn.quantconnect.com/i/tu/custom-research-data-1.jpg)
To select the data of a single dataset, index the loc property of the DataFrame with the data Symbol.
history.loc[symbol]
![Dataframe of custom NIFTY data with symbol selected](https://cdn.quantconnect.com/i/tu/custom-research-data-2.jpg)
To select a column of the DataFrame, index it with the column name.
history.loc[symbol]['close']
![Close price series of custom NIFTY data](https://cdn.quantconnect.com/i/tu/custom-research-data-3.jpg)
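The selection steps above rely on standard pandas MultiIndex behavior, which you can try on a small hand-built DataFrame. In this sketch the level names (symbol, time), the second ticker OTHER, and all values are hypothetical. Plain strings stand in for the Symbol objects that a real history call uses.

```python
import pandas as pd

# Build a two-level index: dataset symbol first, then the end time of each sample.
index = pd.MultiIndex.from_product(
    [["NIFTY", "OTHER"], pd.to_datetime(["2014-07-28", "2014-07-29"])],
    names=["symbol", "time"],
)
history = pd.DataFrame({"close": [7748.7, 7791.7, 10.0, 11.0]}, index=index)

# Selecting one symbol drops the first index level and leaves a time-indexed frame.
single = history.loc["NIFTY"]

# Indexing with a column name returns a Series.
close = history.loc["NIFTY"]["close"]

print(single)
print(close)
```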
Plot Data
You need some historical custom data to produce plots. You can use many of the supported plotting libraries to visualize data in various formats. For example, you can plot candlestick and line charts.
Candlestick Chart
Follow these steps to plot candlestick charts:
- Get some historical data.
- Import the plotly library.
- Create a Candlestick.
- Create a Layout.
- Create a Figure.
- Show the Figure.
history = qb.history(symbol, datetime(2013, 7, 1), datetime(2014, 7, 31)).loc[symbol]
import plotly.graph_objects as go
candlestick = go.Candlestick(x=history.index, open=history['open'], high=history['high'], low=history['low'], close=history['close'])
layout = go.Layout(title=go.layout.Title(text=f'{symbol} OHLC'), xaxis_title='Date', yaxis_title='Price', xaxis_rangeslider_visible=False)
fig = go.Figure(data=[candlestick], layout=layout)
fig.show()
Candlestick charts display the open, high, low, and close prices of the security.
![Candlestick plot of custom NIFTY OHLC](https://cdn.quantconnect.com/i/tu/research-history-custom-candlestick.png)
Line Chart
Follow these steps to plot line charts using built-in methods:
- Select data to plot.
- Call the plot method on the pandas object.
- Show the plot.
values = history['value'].unstack(level=0)
values.plot(title="Value", figsize=(15, 10))
plt.show()
Line charts display the value of the property you selected in a time series.
![Line chart of custom NIFTY value](https://cdn.quantconnect.com/i/tu/research-history-custom-lineplot.png)
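The unstack(level=0) call in the selection step moves the symbol level of the index into the columns, so each dataset becomes its own column and plots as its own line. The following sketch shows that reshaping on a small hand-built DataFrame with a two-level index. The level names, the ticker OTHER, and the values are hypothetical, and the plot call itself is left out.

```python
import pandas as pd

# Hypothetical history result with two index levels: symbol, then time.
index = pd.MultiIndex.from_product(
    [["NIFTY", "OTHER"], pd.to_datetime(["2014-07-28", "2014-07-29"])],
    names=["symbol", "time"],
)
history = pd.DataFrame({"value": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Pivot the symbol level into columns: rows are times, columns are symbols.
values = history["value"].unstack(level=0)
print(values)
```

Calling plot on a DataFrame shaped like values draws one line per column against the time index.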