Hello,
I'm starting to use the research environment to test investment ideas. While the tool itself is clear, I'm not sure what the correct process is for implementing an idea and testing its validity.
For instance, in the notebook code below I test a (very) simple SKLearn regressor that predicts a stock's price change in the next minute from the price changes of the last 60. This is what I did:
- Create the history dataset from 10 randomly picked S&P 500 stocks (survivorship bias still to be fixed)
- Calculate the price change for the stocks and prepare the features and target
- Train a simple MLPRegressor model
- Check actual vs. predicted results via the model score and a scatter plot, to see visually whether they correlate
Is this a correct way to proceed to validate an algorithm? In this case, should I just move on to a new algo given that this one has a negative score? Thank you in advance for any tips you may share!
Francesco
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
qb = QuantBook()
symbols = ["KO", "SYK", "SYF", "ILMN", "NBL", "CAH", "ISRG", "FCX", "LVS", "TFC"]
for s in symbols:
    qb.AddEquity(s)
lookback = 600
datapoints = 1000
history = qb.History(qb.Securities.Keys, datapoints+lookback+2, Resolution.Minute)
close = history["close"].unstack("time")
returns = (close/close.shift(1, axis=1)-1)
features, target = None, None
for i in range(datapoints):
    # Each row: the last `lookback` returns as features, the following return as the target
    data = returns.iloc[:, i:i+lookback+1].dropna().values
    features = data[:, :-1] if features is None else np.vstack((features, data[:, :-1]))
    target = data[:, -1:] if target is None else np.vstack((target, data[:, -1:]))
print(f"Features {len(features)}")
print(f"Target {len(target)}")
test_samples = int(len(features)*0.2)
x, x_test = features[:-test_samples], features[-test_samples:]
y, y_test = target[:-test_samples], target[-test_samples:]
model = MLPRegressor(hidden_layer_sizes=(1024, 1024), max_iter=1000)
print(f"Train points: {len(x)}\tTest points {len(x_test)}")
model.fit(x, y)
score = model.score(x_test, y_test)  # R^2 on the test set; negative means worse than predicting the mean
print(f"Score {score:.3f}")
y_pred = model.predict(x_test)
plt.scatter(y_test, y_pred)  # actual on the x-axis, predicted on the y-axis, to match the labels below
plt.title('Actual vs Predicted Return')
plt.xlabel("Actual Return")
plt.ylabel("Predicted Return")
plt.grid()
[Attached image: scatter plot of actual vs. predicted returns] https://drive.google.com/uc?export=download&id=1T4hejKQ85K-2OuG1FePJBOTvoS1Uwnnv
Jared Broad
Hi Francesco, I recommend checking out our "research to production" series, where we demonstrate the translation from notebooks to algorithms. If you do the research a specific way, you can actually copy and paste the code you need into an algorithm you're working on.
This is the closest example to what you're doing, using another feature of SKLearn.
Francesco Baldisserri
Hi Jared,
Thanks for the prompt reply. Sorry, my question is actually one step before moving an algorithm from research to production. My doubt is more about the process, measures, and checks used to determine whether an idea or hypothesis is "valuable" and deserves to be implemented in an algorithm.
What do you measure in a notebook (e.g. the model score?) and how do you decide whether the idea should be implemented in an algorithm?
Sorry if my original question was not clear. I hope this clarifies what I was looking for.
Thank you,
Francesco
Rahul Chowdhury
Hi Francesco,
The goal in research is to test our model's predictions against the actual data. We can use different metrics to determine how accurate a model's predictions are. One method is to use an error function, which quantifies "how wrong" a model is. We can then compare models against the same error function as a measure of progress in our predictions. There is no single rule for deciding whether a model is ready for implementation. Models which are very inaccurate have no merit for implementation, while models which appear too accurate are at risk of overfitting. Generally, if a model has a reasonable foundation in finance and also makes accurate predictions, then it is a good candidate for implementation.
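For example, here is a minimal sketch of such a comparison, assuming the model, x_test, and y_test from the notebook above; mean squared error and a naive "always predict a zero return" baseline are just one possible choice of error function and reference point, not a prescribed procedure:

from sklearn.metrics import mean_squared_error
import numpy as np

# Same error function for the trained model and for a naive baseline,
# both evaluated on the held-out test set from the notebook above.
y_pred = model.predict(x_test)
model_mse = mean_squared_error(y_test, y_pred)

# Naive baseline: always predict a zero return for the next minute.
baseline_mse = mean_squared_error(y_test, np.zeros_like(y_test))

print(f"Model MSE:    {model_mse:.8f}")
print(f"Baseline MSE: {baseline_mse:.8f}")

If the model cannot beat even this trivial baseline out of sample, the idea is probably not ready to be moved into an algorithm.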
Best
Rahul
Francesco Baldisserri
Hi Rahul,
Thank you very much for your response. I'll keep tinkering with my notebooks and see what can be backtested.
Thanks,
Francesco