Popular Libraries
GPlearn
Get Historical Data
Get some historical market data to train and test the model. For example, to get data for the SPY ETF during 2020 and 2021, run:
qb = QuantBook()
symbol = qb.add_equity("SPY", Resolution.DAILY).symbol
history = qb.history(symbol, datetime(2020, 1, 1), datetime(2022, 1, 1)).loc[symbol]
Prepare Data
Before you train the model, transform the historical data into features and labels. In this example, use the following features and labels:
Data Category | Description |
---|---|
Features | Daily percent change of the close price of the SPY over the last 5 days |
Labels | Daily percent return of the SPY over the next day |
The following image shows the time difference between the features and labels:
Follow these steps to prepare the data:
- Call the `pct_change` method and then drop the first row.
- Loop through the `daily_returns` Series and collect the features and labels.
- Convert the lists of features and labels into `numpy` arrays.
- Split the data into training and testing periods.
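daily_returns = history['close'].pct_change()[1:]  # the first row is NaN, so drop it

n_steps = 5  # number of past daily returns in each feature vector
features = []
labels = []
for i in range(len(daily_returns)-n_steps):
    features.append(daily_returns.iloc[i:i+n_steps].values)
    labels.append(daily_returns.iloc[i+n_steps])

X = np.array(features)
y = np.array(labels)

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y)  # shuffles the samples by default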
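The windowing steps above can be checked without market data. The following is a minimal sketch that assumes a synthetic price series (the random values are made up and stand in for `history['close']`). It also passes `shuffle=False` to `train_test_split`, which is one way to keep the training samples strictly earlier than the testing samples; the original example uses the default, which shuffles.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for history['close']; these prices are illustrative only.
rng = np.random.default_rng(0)
close = pd.Series(100 * np.cumprod(1 + rng.normal(0, 0.01, 101)))

# 101 prices give 100 daily returns after dropping the leading NaN.
daily_returns = close.pct_change()[1:]

n_steps = 5
features, labels = [], []
for i in range(len(daily_returns) - n_steps):
    # Features: n_steps consecutive returns. Label: the return on the following day.
    features.append(daily_returns.iloc[i:i + n_steps].values)
    labels.append(daily_returns.iloc[i + n_steps])

X = np.array(features)
y = np.array(labels)
print(X.shape, y.shape)

# shuffle=False keeps the rows in chronological order, so the test set
# is the most recent block of samples.
X_train, X_test, y_train, y_test = train_test_split(X, y, shuffle=False)
print(len(X_train), len(X_test))
```

Because consecutive windows overlap by four days, a shuffled split places near-duplicate windows on both sides of the split. Whether that matters depends on how you intend to interpret the test score.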
Train Models
After you prepare the data, build and train the model. In this example, create a Symbolic Transformer to generate new non-linear features and then build a Symbolic Regressor model. Follow these steps to create the model:
- Declare a set of functions to use for feature engineering.
- Call the `SymbolicTransformer` constructor with the preceding set of functions.
- Call the `fit` method with the training features and labels.
- Call the `transform` method with the original features.
- Call the `hstack` method with the original features and the transformed features.
- Call the `SymbolicRegressor` constructor.
- Call the `fit` method with the engineered features and the original labels.
function_set = ['add', 'sub', 'mul', 'div', 'sqrt', 'log', 'abs', 'neg', 'inv', 'max', 'min']
from gplearn.genetic import SymbolicTransformer, SymbolicRegressor
gp_transformer = SymbolicTransformer(function_set=function_set, random_state=0, verbose=1)
gp_transformer.fit(X_train, y_train)
This method displays the following output:
gp_features_train = gp_transformer.transform(X_train)
new_X_train = np.hstack((X_train, gp_features_train))
gp_regressor = SymbolicRegressor(random_state=0, verbose=1)
gp_regressor.fit(new_X_train, y_train)
Test Models
After you train the model, test its performance on the out-of-sample data. Follow these steps to test the model:
- Feature engineer the testing set data.
- Call the `predict` method with the engineered testing set data.
- Plot the actual and predicted labels of the testing period.
- Calculate the R-squared value.
gp_features_test = gp_transformer.transform(X_test)
new_X_test = np.hstack((X_test, gp_features_test))

y_predict = gp_regressor.predict(new_X_test)

import matplotlib.pyplot as plt
df = pd.DataFrame({'Real': y_test.flatten(), 'Predicted': y_predict.flatten()})
df.plot(title='Model Performance: predicted vs actual daily return', figsize=(15, 10))
plt.show()

r2 = gp_regressor.score(new_X_test, y_test)  # coefficient of determination (R-squared)
print(f"The explained variance of the GP model: {r2*100:.2f}%")
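The `score` method of a scikit-learn style regressor reports the coefficient of determination. As a rough sketch of what that number measures, the following computes R-squared by hand with NumPy. The `r_squared` helper and the sample values are hypothetical and are not part of gplearn.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - (residual sum of squares / total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Illustrative daily returns.
y_true = np.array([0.01, -0.02, 0.005, 0.015])

perfect = r_squared(y_true, y_true)                                   # predictions match exactly
baseline = r_squared(y_true, np.full_like(y_true, y_true.mean()))     # always predict the mean
opposite = r_squared(y_true, -y_true)                                 # predictions with the wrong sign

print(perfect, baseline, opposite)
```

A model that always predicts the mean of the test labels scores zero, and a model that does worse than that scores below zero, so a negative value in the printed percentage is possible for noisy targets such as daily returns.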
Store Models
You can save and load GPlearn models using the Object Store.
Save Models
Follow these steps to save models in the Object Store:
- Set the key names of the models to be stored in the Object Store.
- Call the `get_file_path` method with the key names.
- Call the `dump` method with the models and file paths.
transformer_key = "transformer"
regressor_key = "regressor"

transformer_file = qb.object_store.get_file_path(transformer_key)
regressor_file = qb.object_store.get_file_path(regressor_key)
This method returns the file paths where the models will be stored.
import joblib
joblib.dump(gp_transformer, transformer_file)
joblib.dump(gp_regressor, regressor_file)
Once you dump the models with the `joblib` module, you can load them later without retraining them.
Load Models
You must save a model into the Object Store before you can load it. If you saved a model, follow these steps to load it:
- Call the `contains_key` method with the keys.
- Call the `get_file_path` method with the keys.
- Call the `load` method with the file paths.
qb.object_store.contains_key(transformer_key)
qb.object_store.contains_key(regressor_key)
This method returns a boolean that represents whether the key is in the Object Store. If the Object Store does not contain a key, save the model using that key before you proceed.
transformer_file = qb.object_store.get_file_path(transformer_key)
regressor_file = qb.object_store.get_file_path(regressor_key)
This method returns the path where the model is stored.
loaded_transformer = joblib.load(transformer_file)
loaded_regressor = joblib.load(regressor_file)
This method returns the saved models.
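The save and load steps can be rehearsed outside the research environment. The sketch below is a simplified stand-in: it assumes a temporary directory in place of the path returned by `get_file_path`, and a plain dictionary (`model_stand_in`, a hypothetical name) in place of a fitted GPlearn model. Only `joblib.dump` and `joblib.load` are the same calls used above.

```python
import os
import tempfile

import joblib
import numpy as np

# Hypothetical stand-in for a fitted model; any picklable object works the same way.
model_stand_in = {"coefficients": np.array([0.5, -1.0]), "n_steps": 5}

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-in for qb.object_store.get_file_path("regressor").
    file_path = os.path.join(tmp_dir, "regressor")

    # Save the object, confirm the file exists, and load it back.
    joblib.dump(model_stand_in, file_path)
    file_exists = os.path.exists(file_path)
    loaded = joblib.load(file_path)

print(file_exists)
print(loaded["n_steps"], loaded["coefficients"])
```

The existence check plays the role that `contains_key` plays for the Object Store: it guards the load step against a key that was never saved.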