Hi, I still don't understand whether it's possible to develop a machine learning strategy in Research mode with a non-OOP Python pipeline (sklearn, pandas, etc.), save the model (model.predict(X_test)), and then load it in the main strategy environment for scheduled training and backtest/live trading purposes. For example, here is a basic random forest strategy pipeline in Jupyter using historical/live FXCM data. I'm actually not interested in the results; it's just a test of an end-to-end implementation. I'm stuck on this point: regressor.predict(X_test)

import datetime as dt
from datetime import timedelta
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestRegressor
from sklearn import metrics

def strategy(forex, time, start_years):
    pair = forex
    period = time
    end = dt.datetime.now()
    years = timedelta(days=365)
    start = end - (years * start_years)
    df = con.get_candles(pair, period=period, start=start, end=end)
    df = df.drop(['bidopen', 'bidhigh', 'bidlow', 'askopen', 'askclose', 'askhigh', 'asklow', 'tickqty'], axis=1)
    df["bidclose"] = df["bidclose"].pct_change()
    df = df.dropna()
    df = df.rename(columns={'bidclose': 'returns'})
    df["y"] = df.iloc[:, 0].shift(-1).fillna(method='ffill')
    X = df.iloc[:, 0].values.reshape(-1, 1)
    y = df.iloc[:, 1]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=0)
    regressor = RandomForestRegressor(n_estimators=50, random_state=0)
    model = regressor.fit(X_train, y_train)
    y_pred = regressor.predict(X_test)
    print('Mean Absolute Error:', metrics.mean_absolute_error(y_test, y_pred))
    print('Mean Squared Error:', metrics.mean_squared_error(y_test, y_pred))
    print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test, y_pred)))
    return model
strategy('NZD/USD','H1',1)
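As a side check, the target construction in that pipeline (pct_change, then shift(-1)) can be verified on a toy series. This is a minimal sketch with made-up prices and no FXCM connection, confirming that y at each row is the next row's return:

```python
import pandas as pd

# Toy close prices standing in for the FXCM bidclose column
df = pd.DataFrame({"bidclose": [1.00, 1.01, 1.00, 1.02, 1.03]})

# Same transformations as in strategy()
df["bidclose"] = df["bidclose"].pct_change()
df = df.dropna()
df = df.rename(columns={"bidclose": "returns"})
df["y"] = df["returns"].shift(-1).ffill()

# y at each row is the *next* row's return; the last row is forward-filled
print(df)
```

If the alignment here looks wrong on your own data, the model ends up predicting the current return instead of the next one, which silently invalidates the backtest.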
From there to a fully functional strategy ready for backtesting and live trading: what are the fundamental steps? I copy/paste this code into the Research environment, adapting the data acquisition, and then?
Thank you very much
Frank Giardina
Federico
I have done some work with Python machine learning modules and would be very interested to see the answer to your question. It would be helpful if one of the QuantConnect staff could create a small template to show how all the pieces would work in the QuantConnect environment.
Frank
Adam W
Are you looking to save the predictions only or just the trained model? If the latter, look up `ObjectStore` in the docs.
i.e.
## Research
regressor = ...  # trained sklearn model

# Save to ObjectStore
import pickle
serialized_regressor = pickle.dumps(regressor)
qb.ObjectStore.SaveBytes('MyModel', serialized_regressor)

## Algorithm
def Initialize(self):
    self.model = pickle.loads(bytes(self.ObjectStore.ReadBytes('MyModel')))

def OnData(self, data):
    self.model.predict(...)  # features go here
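Outside of QuantConnect, the same round trip can be sanity-checked locally. This is a minimal sketch with toy data (no ObjectStore; `pickle.dumps`/`pickle.loads` on plain bytes is what `SaveBytes`/`ReadBytes` amount to), showing that a pickled RandomForest predicts identically after restoring:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train a small model on toy data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = X.ravel() * 0.5 + rng.normal(scale=0.1, size=100)
regressor = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Serialize to bytes (what would be handed to SaveBytes) and restore
serialized = pickle.dumps(regressor)
restored = pickle.loads(serialized)

# The restored model makes identical predictions
assert np.array_equal(regressor.predict(X), restored.predict(X))
```

One caveat worth knowing: sklearn pickles are only guaranteed to load under the same sklearn version they were saved with.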
If you want the Research notebook to be automatically run (e.g. redeployment if algo crashes), I don't think that's possible at the moment. If the ML model is not that computationally expensive (i.e. fits within the ML training time limits), you can just train it directly in main.py instead.
Adam W
Note that you can also do the same thing (serialize to bytes, save to ObjectStore, load it in main.py) to any other components in the ML modeling process - for instance, feature scalers like `sklearn.preprocessing.StandardScaler`.
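One convenient pattern (my own habit, not a QC-specific API) is to pickle the scaler and model together as a single dict, so the two pieces cannot get out of sync between Research and main.py. A minimal sketch with toy data:

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Toy training data: 3 features, linear-ish target
rng = np.random.default_rng(1)
X = rng.normal(loc=5.0, scale=2.0, size=(200, 3))
y = X @ np.array([0.2, -0.1, 0.4]) + rng.normal(scale=0.1, size=200)

# Fit the scaler, then fit the model on scaled features
scaler = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=10, random_state=0).fit(
    scaler.transform(X), y
)

# Bundle preprocessing and model into one serialized payload
payload = pickle.dumps({"scaler": scaler, "model": model})

# Later (e.g. in Initialize): restore and always use both pieces together
bundle = pickle.loads(payload)
pred = bundle["model"].predict(bundle["scaler"].transform(X[:5]))
print(pred.shape)
```

The same `payload` bytes are what you would hand to `ObjectStore.SaveBytes` and read back with `ReadBytes`.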
For Deep Learning or anything that's too computationally expensive to do directly in main.py, I typically:
- Train in Research on some period of time prior to backtest start (i.e. 2013-2015)
- Serialize and manually save the network weights/biases/architecture/etc (whatever is required to "reconstruct" the trained model)
- Load it in main.py, then backtest as normal (2015-current)
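One concrete shape detail that bites when reloading a model in main.py: sklearn estimators trained on a 2D X expect a 2D array at prediction time too, so a single live bar has to be reshaped to (1, n_features). A minimal sketch with made-up OHLC-style features:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Train on a 2D feature matrix, e.g. 4 OHLC-style features per sample
X_train = np.random.default_rng(2).normal(size=(50, 4))
y_train = X_train.sum(axis=1)
model = RandomForestRegressor(n_estimators=5, random_state=0).fit(X_train, y_train)

# A single bar arriving in OnData is 1D: shape (4,)
bar = np.array([1.0, 1.1, 0.9, 1.05])

# model.predict(bar) would raise a ValueError about a 1D array;
# reshape to a batch of one sample first
y_hat = model.predict(bar.reshape(1, -1))
print(y_hat.shape)
```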
The most tedious part will typically be making sure the data shapes, etc. are consistent between the algorithm and Research. For certain ML models like RandomForest this is pretty trivial, but for ones like recurrent nets or convolutional nets the data pipeline takes a bit more work to set up properly (i.e. buffering the data one sample at a time until we get the right shape, handling timesteps with no data, etc.).
Federico Juvara
Thanks for the explanation. Take this process as an example (it's a KNN, but for now that doesn't matter):
https://github.com/PacktPublishing/Machine-Learning-for-Algorithmic-Trading-Second-Edition/blob/master/06_machine_learning_process/01_machine_learning_workflow.ipynb
Let's just say I want to copy/paste this entire pipeline into Research mode, except for the data-loading methods: time series plus fundamental data using QuantConnect and its API. Then data wrangling, preprocessing, feature selection, train-test split, cross-validation, model selection, training and testing, tuning, forecasting (on test data), and scoring. OK? Now we have a fitted model after a complete workflow written in plain Python with classic libraries like sklearn, pandas, and seaborn.
Let's pretend I want to export this model into the main QuantConnect strategy to obtain a complete backtest and eventually trade with real-time data. I say this because I'm still confused about OOP, and an entire development process diving through classes, self, methods, and whatever else would be really cumbersome. I feel much more comfortable with functional programming.
That being said, once I own a ready model, all I would need is a portfolio/risk management block and obviously all the basic parameters (cash, etc.). Am I right? Well, my question was just about the steps between the final model in the Research lab and the strategy in main.py that loads that model. Anyway, it seems that many doubts have been removed thanks to your enlightening answer.
May I ask you for a final, more detailed explanation given this additional information? I'm a confused dummy and every suggestion could be essential. I'm focused on RF, ensemble methods, and SVM.
Adam W
Well, you could think of the entire process in the Research lab in this case as learning the function f: X -> Y. The algorithm in main.py, however, is concerned with how to actually use f to make trades.
How you decide to use f is really up to you - technically all you need is a way to generate predictions and a way to "execute"/convert those predictions into trades.
The most naive way of implementing the trading logic here might look something like:
class MyAlgo(QCAlgorithm):
    def Initialize(self):
        self.model = pickle.loads(bytes(self.ObjectStore.ReadBytes('MyModel')))
        self.SetCash(1000)
        self.AddForex('NZDUSD', Resolution.Minute)  # Subscribe to NZD/USD data stream
        # Set BrokerageModel, etc.

    def OnData(self, data):
        """Make some predictions every time data is streamed in"""
        # Current NZDUSD data
        NZDUSD = data['NZDUSD']

        # Think of the X here as a single sample from X_test in your Research script
        X = np.array([NZDUSD.Open, NZDUSD.High, NZDUSD.Low, NZDUSD.Close]).reshape(1, -1)

        # Make some predictions (I assume this is the next period's close)
        Y_hat = self.model.predict(X)[0]

        # Go long 100% of your portfolio value
        if Y_hat > NZDUSD.Close:
            self.SetHoldings('NZDUSD', 1.0)
            return

        # Go short 100% of your portfolio value
        if Y_hat < NZDUSD.Close:
            self.SetHoldings('NZDUSD', -1.0)
            return
But maybe after running this, you think "instead of allocating 100% of my portfolio to Long/Short, maybe I should allocate x% based on the difference between the predictions and current prices". Or maybe you have N assets and it becomes too tedious to manually compute optimal weights and trading logic for every asset, etc.
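That proportional-allocation idea can be expressed as a small pure function. A sketch with arbitrary scaling and clipping choices (the name, threshold, and `scale` constant are all illustrative):

```python
def target_weight(y_hat: float, price: float, scale: float = 10.0) -> float:
    """Map a price prediction to a portfolio weight in [-1, 1].

    The weight is proportional to the predicted return (y_hat vs the
    current price), amplified by `scale` and clipped to full long/short.
    """
    predicted_return = (y_hat - price) / price
    weight = scale * predicted_return
    return max(-1.0, min(1.0, weight))

# A predicted 2% rise maps to a 20% long allocation at scale=10;
# large predicted moves saturate at fully long or fully short
print(target_weight(1.02, 1.00))
print(target_weight(2.00, 1.00))
print(target_weight(0.50, 1.00))
```

Inside OnData you would then call self.SetHoldings('NZDUSD', target_weight(Y_hat, NZDUSD.Close)) instead of hard-coding 1.0 and -1.0.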
That's where the QC classes come in handy. Instead of doing SetHoldings, you could emit Insights, do the weight computations in a PortfolioModel, manage risks with a RiskModel, and finally actually make trades with an ExecutionModel.
This level of abstraction seems like overkill (and can be confusing), but the point is that each module is concerned with a single task. This means that if something doesn't work, you can easily switch out a PortfolioModel for a different one. You can absolutely ignore all this and handle the trading logic manually, but soon the code becomes cluttered with a bunch of if statements and tends to be prone to bugs.
As for OOP, just think of it as functional programming but with internal states. I guess you could think of `self` as the "global" scope/namespace, and methods as the "local" scope/namespace in FP. If you avoid side-effects in the function calls (i.e. instead of updating self, pass the variable as an argument to another function), you can just work in a functional manner.
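Concretely, the contrast might look like this (names are illustrative): the OOP style mutates `self` as a side effect, while the functional style threads the state through arguments and return values:

```python
# Stateful/OOP style: the method updates internal state as a side effect
class Tracker:
    def __init__(self):
        self.total = 0.0

    def add(self, x: float) -> None:
        self.total += x  # mutates self


# Functional style: the same logic as a pure function of its arguments
def add(total: float, x: float) -> float:
    return total + x


t = Tracker()
t.add(2.0)
t.add(3.0)
print(t.total)

# Pure version: state is passed explicitly, no hidden mutation
total = add(add(0.0, 2.0), 3.0)
print(total)
```

Both compute the same thing; the difference is only where the running total lives.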
Federico Juvara
Thank you very much for the valuable contribution. Now it makes sense.