Importing Data
Bulk Downloads
Introduction
There are two techniques to import data into your algorithm: you can either manually import the entire file or stream the file line-by-line into your algorithm's OnData/on_data event. This page explores importing an entire file for manual use.
Instead of downloading the file from a remote file provider, you can upload the file to the Object Store (with the Algorithm Lab or with the CLI) for faster execution.
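For example, the following minimal sketch (assuming a hypothetical Object Store key, my_data.csv, that you uploaded ahead of time) reads the stored copy and only falls back to a network download when the key is missing:

# A minimal sketch, assuming the hypothetical Object Store key "my_data.csv".
if self.object_store.contains_key("my_data.csv"):
    # Read the previously uploaded copy from the Object Store.
    file = self.object_store.read("my_data.csv")
else:
    # Fall back to downloading the file from its remote location.
    file = self.download("<filePathOrURL>")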
Recommended Use Cases
The batch import technique is outside of LEAN's awareness and control, so LEAN can't enforce good practices. However, the batch import technique is well suited to loading the following datasets:
- Loading data into the Object Store
- Trained AI Models
- Well-defined historical price datasets
- Parameter and setting imports, such as Symbol lists (see the sketch after this list)
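As a brief illustration of the last use case, the sketch below downloads a ticker list and subscribes to each Symbol. The URL placeholder and the single-line, comma-separated layout are assumptions for illustration only:

# A minimal sketch, assuming "<tickerListFileURL>" serves a comma-separated list of tickers.
csv = self.download("<tickerListFileURL>")
tickers = [ticker.strip() for ticker in csv.split(",") if ticker.strip()]
# Subscribe to each ticker and keep the Symbol objects for later use.
symbols = [self.add_equity(ticker, Resolution.DAILY).symbol for ticker in tickers]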
Download Files
The Download/download method downloads the content served from a local file or URL and returns it as a string.
Basic Usage
var file = Download("<filePathOrURL>");
file = self.download("<filePathOrURL>")

# If your file is in CSV format, convert it to a DataFrame with the `read_csv` method.
from io import StringIO
import pandas as pd
df = pd.read_csv(StringIO(file))

# If your file is in JSON format, parse it with the `loads` method.
import json
data = json.loads(file)

# If your file is in XML format, parse it with the `fromstring` method.
import xml.etree.ElementTree as ET
root = ET.fromstring(file)
Download Method Arguments
The Download/download method can accept header settings, a username, and a password for authentication.
Argument | Data Type | Description | Default Value |
---|---|---|---|
address | string / str | A string containing the URI to download | |
headers | IEnumerable<KeyValuePair<string, string>> / Dict[str, str] | Defines header values to add to the request | Enumerable.Empty<KeyValuePair<string, string>>() / dict() |
userName / user_name | string / str | The user name associated with the credentials | null / None |
password | string / str | The password for the user name associated with the credentials | null / None |
Download Request Headers
var headers = new Dictionary<string, string> { { "1", "1" } };
Download(address, headers);
Download(address, headers, userName, password);

headers = { "1": "1" }
self.download(address, headers)
self.download(address, headers, user_name, password)
Transport Binary Data
Follow these steps to transport binary files:
- Add the following imports to your local program:

import pickle
import base64

- Serialize your object.

pickle_bytes = pickle.dumps(my_object)
base64_str = base64.b64encode(pickle_bytes).decode('ascii')

- Save the string representation of your object to one of the supported sources (see the sketch after these steps).

- Download the remote file into your project.

base64_str = self.download("<fileURL>")

- Restore the object.

base64_bytes = base64_str.encode('ascii')
model = base64.b64decode(base64_bytes)
restored_model = pickle.loads(model)
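The Object Store is one place to keep the serialized string between runs. The following minimal sketch, which assumes a hypothetical key named base64_model, saves the base64 string and reads it back in a later algorithm instance:

# A minimal sketch, assuming the hypothetical Object Store key "base64_model".
# Save the base64 string representation of the object...
self.object_store.save("base64_model", base64_str)

# ...and read it back later instead of downloading the remote file.
base64_str = self.object_store.read("base64_model")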
Examples
The following examples demonstrate common practices for bulk downloading data.
Example 1: Download Machine Learning Model
The following algorithm uses a Scikit-Learn machine learning model to predict SPY price changes and places orders according to the prediction. To obtain the model, it either retrieves it from the Object Store, if one exists, or downloads it from a Dropbox link with the download method.
from sklearn.svm import SVR
from sklearn.model_selection import GridSearchCV
import joblib
import numpy as np

class BulkDownloadExampleAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        self.set_start_date(2021, 1, 1)
        self.set_end_date(2021, 2, 1)
        self.set_cash(100000)

        # Request SPY data for model training, prediction, and trading.
        self.symbol = self.add_equity("SPY", Resolution.DAILY).symbol

        # 2-year data to train the model.
        training_length = 252*2
        self.training_data = RollingWindow[TradeBar](training_length)
        # Warm up the training dataset to train the model immediately.
        history = self.history[TradeBar](self.symbol, training_length, Resolution.DAILY)
        for trade_bar in history:
            self.training_data.add(trade_bar)

        # Retrieve the already trained model from the object store for immediate use.
        if self.object_store.contains_key("sklearn_model"):
            file = self.object_store.get_file_path("sklearn_model")
        # Otherwise, bulk-download the model from an external source (Dropbox in this example).
        else:
            file = self.download("https://www.dropbox.com/scl/fi/nhz2zxq3pr2bweia4av0o/sklearn_model?rlkey=loy09wbh69k9j6umlru9icsaj&st=6vdazyp4&dl=1")
        self.model = joblib.load(file)

        # Train the model to use the prediction right away.
        self.train(self.my_training_method)
        # Recalibrate the model weekly to ensure its accuracy on the updated domain.
        self.train(self.date_rules.every(DayOfWeek.SUNDAY), self.time_rules.at(8, 0), self.my_training_method)

    def get_features_and_labels(self, n_steps=5) -> tuple:
        # Train and predict the return data, which is more normalized and stationary.
        training_df = self.pandas_converter.get_data_frame[TradeBar](list(self.training_data)[::-1])
        daily_pct_change = training_df.pct_change().dropna()

        # Stack the data for 5-day OHLCV data per each sample to train with.
        features = []
        labels = []
        for i in range(len(daily_pct_change)-n_steps):
            features.append(daily_pct_change.iloc[i:i+n_steps].values.flatten())
            labels.append(daily_pct_change['close'].iloc[i+n_steps])
        features = np.array(features)
        labels = np.array(labels)
        return features, labels

    def my_training_method(self) -> None:
        # Prepare the processed training data.
        features, labels = self.get_features_and_labels()
        # Recalibrate the model based on updated data.
        if isinstance(self.model, GridSearchCV):
            self.model = self.model.fit(features, labels).best_estimator_
        else:
            self.model = self.model.fit(features, labels)

    def on_data(self, slice: Slice) -> None:
        if self.symbol in slice.bars:
            self.training_data.add(slice.bars[self.symbol])

        # Get predictions by the updated features.
        features, _ = self.get_features_and_labels()
        prediction = self.model.predict(features[-1].reshape(1, -1))
        prediction = float(prediction)

        # If the predicted direction is going upward, buy SPY.
        if prediction > 0:
            self.set_holdings(self.symbol, 1)
        # If the predicted direction is going downward, sell SPY.
        elif prediction < 0:
            self.set_holdings(self.symbol, -1)

    def on_end_of_algorithm(self) -> None:
        # Store the model in the object store to retrieve it in other instances if the algorithm stops.
        model_key = "sklearn_model"
        file_name = self.object_store.get_file_path(model_key)
        joblib.dump(self.model, file_name)
Other Examples
For more examples, see the following algorithms: