Hello,
I'm starting to use the research environment to test investment ideas. While the tool itself is clear, I'm not sure what the correct process is for implementing an idea and testing its validity.
For instance, in the notebook code below I test a (very) simple SKLearn regressor that predicts a stock's price change in the next minute from the price changes of the last 60. This is what I did:
- Create the history dataset from 10 randomly picked S&P 500 stocks (survivorship bias still to be fixed)
- Calculate the price change for the stocks and prepare the features and target
- Train a simple MLPRegressor model
- Check actual vs. predicted results via the model score and a scatter plot, to see visually whether they correlate
Is this a correct way to proceed to validate an algorithm? In this case, should I just move on to a new algo given that this one has a negative score? Thank you in advance for any tips you may share!
Francesco
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
qb = QuantBook()
symbols = ["KO", "SYK", "SYF", "ILMN", "NBL", "CAH", "ISRG", "FCX", "LVS", "TFC"]
for s in symbols:
    qb.AddEquity(s)
lookback = 600
datapoints = 1000
history = qb.History(qb.Securities.Keys, datapoints+lookback+2, Resolution.Minute)
close = history["close"].unstack("time")
returns = (close/close.shift(1, axis=1)-1)
features, target = None, None
for i in range(datapoints):
    # Each row: the last `lookback` returns as features, the following return as the target
    data = returns.iloc[:, i:i+lookback+1].dropna().values
    features = data[:, :-1] if features is None else np.vstack((features, data[:, :-1]))
    target = data[:, -1:] if target is None else np.vstack((target, data[:, -1:]))
print(f"Features {len(features)}")
print(f"Target {len(target)}")
test_samples = int(len(features)*0.2)
x, x_test = features[:-test_samples], features[-test_samples:]
y, y_test = target[:-test_samples], target[-test_samples:]
model = MLPRegressor(hidden_layer_sizes=(1024, 1024), max_iter=1000)
print(f"Train points: {len(x)}\tTest points {len(x_test)}")
model.fit(x, y)
score = model.score(x_test, y_test)  # R^2 on the test set; negative means worse than predicting the mean
print(f"Score {score:.3f}")
y_pred = model.predict(x_test)
plt.scatter(y_test, y_pred)  # actual on the x-axis, predicted on the y-axis, to match the labels below
plt.title('Actual vs Predicted Return')
plt.xlabel("Actual Return")
plt.ylabel("Predicted Return")
plt.grid()
[Attached image: scatter plot of actual vs. predicted returns] https://drive.google.com/uc?export=download&id=1T4hejKQ85K-2OuG1FePJBOTvoS1Uwnnv
Jared Broad
Hi Francesco, I recommend checking out our "research to production" series, where we demonstrate the translation from notebooks to algorithms. If you do the research a specific way, you can actually copy and paste the code you need into an algorithm you're working on.
This is the closest example to what you're doing, using another feature of SKLearn.
Francesco Baldisserri
Hi Jared,
Thanks for the prompt reply. Sorry, my question is actually one step before moving an algorithm from research to production. My doubt is more about the process, measures, and checks used to determine whether an idea or hypothesis is "valuable" and deserves to be implemented in an algorithm.
What do you measure in a notebook (e.g. the model score?) and how do you decide whether the idea should be implemented in an algorithm?
Sorry if my original question was not clear. I hope this clarifies what I was looking for.
Thank you,
Francesco
Rahul Chowdhury
Hi Francesco,
The goal in research is to test our model's predictions against the actual data. We can use different metrics to determine how accurate a model's predictions are. One method is to use an error function, which quantifies "how wrong" a model is. We can then compare models against the same error function as a measure of progress in our predictions. There is no single rule for deciding whether a model is ready for implementation. Models which are very inaccurate have no merit for implementation, while models which appear too accurate are at risk of overfitting. Generally, if a model has a reasonable foundation in finance and also makes accurate predictions, then it is a good candidate for implementation.
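For example, here is a minimal sketch of such a comparison, assuming the model, x_test, and y_test from the notebook above; mean squared error and a naive "always predict a zero return" baseline are just one possible choice of error function and reference point, not a prescribed procedure:

from sklearn.metrics import mean_squared_error
import numpy as np

# Same error function for the trained model and for a naive baseline,
# both evaluated on the held-out test set from the notebook above.
y_pred = model.predict(x_test)
model_mse = mean_squared_error(y_test, y_pred)

# Naive baseline: always predict a zero return for the next minute.
baseline_mse = mean_squared_error(y_test, np.zeros_like(y_test))

print(f"Model MSE:    {model_mse:.8f}")
print(f"Baseline MSE: {baseline_mse:.8f}")

If the model cannot beat even this trivial baseline out of sample, the idea is probably not ready to be moved into an algorithm.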
Best
Rahul
Francesco Baldisserri
Hi Rahul,
Thank you very much for your response. I'll keep tinkering with my notebooks and see what can be backtested.
Thanks,
Francesco