ExtraTreesClassifier Algorithm - Help me improve it

Hi Everyone,

About two years ago I spent quite a bit of time learning sklearn and QC (QuantConnect), and much of it went into this algorithm before I finally gave up on ML for other methods. That said, it was very enjoyable to create and I think there is some code in here that could be very useful to others.

Some thoughts
1. We use ExtraTreesClassifiers for the model. In my experience they provided the best results.
2. The models are very sensitive to the random seed, so we create 10 models per symbol with different random seeds and score the models.
3. We use walk forward validation for the testing.
4. I struggled mightily with the scoring of the models. I tried many methods (you'll see some big blocks of commented-out code).
5. It's also very sensitive to the features used.
6. There is a very useful features helper class. It allows you to easily append features by submitting a list such as ['EMA_7','EMA_7_28','EMA_7_28_diff']. These are the EMA_7, the ratio of EMA_7 to EMA_28, and the diff between the previous and current value of the ratio of EMA_7 to EMA_28. You can use almost any indicator in TA-Lib with it.
7. The code is somewhat messy and I see some places that need to be refactored.
8. The current trading framework is very basic. We train the models every 3 months and we predict the direction of the equities for the coming week.
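For anyone skimming, here is a minimal sketch of how a naming convention like the one in point 6 could be parsed. It is not the helper class from the algorithm: the function names (`ema`, `build_feature`, `build_features`) are hypothetical, only EMA is handled, and pandas `ewm` is used as a stand-in for the TA-Lib EMA, so the warm-up values will differ from TA-Lib's.

```python
import pandas as pd


def ema(close: pd.Series, period: int) -> pd.Series:
    # Assumption: pandas EWM stands in for the TA-Lib EMA here.
    return close.ewm(span=period, adjust=False).mean()


def build_feature(close: pd.Series, name: str) -> pd.Series:
    """Hypothetical parser for names like 'EMA_7', 'EMA_7_28', 'EMA_7_28_diff'."""
    indicator, *args = name.split("_")
    if indicator != "EMA":
        raise ValueError(f"only EMA is handled in this sketch, got {indicator!r}")

    # A trailing 'diff' means: difference between current and previous value.
    take_diff = bool(args) and args[-1] == "diff"
    if take_diff:
        args = args[:-1]

    periods = [int(a) for a in args]
    if len(periods) == 1:
        series = ema(close, periods[0])                            # EMA_7
    elif len(periods) == 2:
        series = ema(close, periods[0]) / ema(close, periods[1])   # EMA_7_28
    else:
        raise ValueError(f"expected one or two periods in {name!r}")

    return series.diff() if take_diff else series


def build_features(close: pd.Series, names: list) -> pd.DataFrame:
    # One column per requested feature name, in the order given.
    return pd.DataFrame({name: build_feature(close, name) for name in names})


close = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0])
features = build_features(close, ["EMA_7", "EMA_7_28", "EMA_7_28_diff"])
```

The appeal of this kind of convention is that the feature list is just a list of strings, so trying a different feature set means editing one list rather than the feature code.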

I do not think ML is the best way to build an algo, but I do think it's the most fun! I would love to be proved wrong. If anyone has ideas for improvements I'm glad to make those changes and see what we can do.

One change I see that needs to be made: the signal is only trained on positives, so we really should not be short selling based on it. It should be used only as an in-or-out signal.
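A rough sketch of what that in-or-out change could look like, assuming each symbol's model emits 1 for "up" and 0 otherwise. The function name `long_only_targets` and the equal weighting across long symbols are my assumptions for illustration, not something taken from the backtest.

```python
from typing import Dict


def long_only_targets(predictions: Dict[str, int]) -> Dict[str, float]:
    """Map per-symbol predictions (1 = up, 0 = otherwise) to portfolio weights.

    A 0 prediction means "stay out" (weight 0.0), never "go short".
    Assumption: capital is split equally across the symbols predicted up.
    """
    longs = [symbol for symbol, pred in predictions.items() if pred == 1]
    weight = 1.0 / len(longs) if longs else 0.0
    return {
        symbol: (weight if pred == 1 else 0.0)
        for symbol, pred in predictions.items()
    }


targets = long_only_targets({"SPY": 1, "QQQ": 0, "IWM": 1})
```

In a QC algorithm the resulting weights would typically be passed to whatever order or holdings call the strategy already uses.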

Another change is having it predict daily rather than weekly. I made those changes in this backtest.


Hey Cole S!

Great to see you here dropping gems, even if they are over my head 🙈

I'm bookmarking and cloning this for later / someday, when I'm a bit more ML savvy.

Thanks for sharing!

Cole S INVESTOR

kctrader | January 2022


.ekz. INVESTOR

ThePrinters.io | January 2022

Fred Painchaud INVESTOR

January 2022

Hi Cole,

“I do not think ML is the best way to build an algo, but I do think it's the most fun!” - I'm currently having a lot of fun building my indies without ML 😊!

More seriously, I do not have a lot of personal practical experience with ML but I'm in an environment where it is used a lot. And I studied it so I can understand what's going on around me 😊.

Just a question so I don't have to dive into the code and all that. When people use ML to train models, do they train their model on one particular asset and then trade that asset with that model? Because the first question that comes to my mind when I think about ML for trading is “isn't that the pinnacle of overfitting?”.

And my second question, which comes very fast right after, is “are bots/people trading with ML aware of ‘adversarial ML’ techniques?” 😊

I'm asking you simply because it looks like you may know those trends re AI/ML-based trading…

Fred

Hey Fred,

Thanks for all the questions!

Regarding training on one symbol or not, I found that training on a single symbol performs much better than training on a group. It's very hard to group symbols by “like price action”, and individual symbols tend to have distinct price action, especially over the shorter term. I actually spent much more time on grouping them, and separating the symbols was an “Aha!” moment because it definitely performed better that way.
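To make the per-symbol setup concrete, here is a minimal sketch that trains one set of ExtraTreesClassifier models per symbol, one model per random seed, and scores each seed with walk-forward splits. It is not the code from the algorithm: `walk_forward_score` and `score_seeds_per_symbol` are hypothetical names, sklearn's `TimeSeriesSplit` (an expanding-window split) stands in for the walk-forward validation, and plain accuracy stands in for the scoring, which is exactly the part that was hard to get right.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_score(X, y, seed, n_splits=5):
    """Mean out-of-sample accuracy over expanding-window splits for one seed."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = ExtraTreesClassifier(n_estimators=50, random_state=seed)
        model.fit(X[train_idx], y[train_idx])        # fit on the past only
        preds = model.predict(X[test_idx])           # score on the next block
        scores.append(accuracy_score(y[test_idx], preds))
    return float(np.mean(scores))


def score_seeds_per_symbol(data, seeds=range(10)):
    """data maps symbol -> (X, y); returns symbol -> {seed: score}.

    Each symbol gets its own models; nothing is pooled across symbols.
    """
    return {
        symbol: {seed: walk_forward_score(X, y, seed) for seed in seeds}
        for symbol, (X, y) in data.items()
    }


# Synthetic stand-in data, only to show the call shape.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(120, 4))
y_demo = (X_demo[:, 0] > 0).astype(int)
results = score_seeds_per_symbol({"AAA": (X_demo, y_demo)}, seeds=range(3))
```

Comparing the spread of scores across seeds for the same symbol is a quick way to see how seed-sensitive a given feature set is.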

I have no knowledge of adversarial techniques. If I were ever successful enough to have to worry about them, I would be happy!

And thank you for the answers Cole.

Lars Klawitter INVESTOR

January 2023

hi Cole,

many thanks for sharing.

Having recently subscribed to the MLFinlab framework, I thought that it might be interesting to introduce Meta Labeling (training a secondary model on the returns of the primary one and using its predictions as confirmation on whether or not to take the trade).

has anyone tried this?
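For readers unfamiliar with the idea, here is a minimal sketch of meta labeling using plain sklearn. It does not use the MLFinlab API. The function names (`fit_meta_labeling`, `take_trade`) are hypothetical, and "the primary model's prediction was correct" is used as a simple proxy for "the primary model's trade was profitable". The primary model is fit on the first part of the history and the secondary on the primary's out-of-sample predictions over the remainder, so the secondary never learns from predictions the primary made on its own training data.

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier


def fit_meta_labeling(X, y, split=0.5, seed=0):
    """Fit a primary direction model, then a secondary 'take the trade?' model."""
    cut = int(len(X) * split)

    # Primary model: predicts direction (1 = up, 0 = otherwise).
    primary = ExtraTreesClassifier(n_estimators=50, random_state=seed)
    primary.fit(X[:cut], y[:cut])

    # Meta labels: 1 where the primary's out-of-sample call was right.
    primary_pred = primary.predict(X[cut:])
    meta_y = (primary_pred == y[cut:]).astype(int)

    # Secondary model sees the features plus the primary's prediction.
    meta_X = np.column_stack([X[cut:], primary_pred])
    secondary = ExtraTreesClassifier(n_estimators=50, random_state=seed + 1)
    secondary.fit(meta_X, meta_y)
    return primary, secondary


def take_trade(primary, secondary, X_new):
    """True only where the primary says 'up' and the secondary confirms."""
    side = primary.predict(X_new)
    confirm = secondary.predict(np.column_stack([X_new, side]))
    return (side == 1) & (confirm == 1)


# Synthetic stand-in data, only to show the call shape.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] > 0).astype(int)
primary, secondary = fit_meta_labeling(X_demo[:150], y_demo[:150])
mask = take_trade(primary, secondary, X_demo[150:])
```

The secondary model can only filter trades out; it never flips the primary's side, which matches the "confirmation" role described above.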

Lauren Roberts INVESTOR

June 2024

Yup, it doesn't work. Check out the issues on the GitHub. I'm not sure what the current plan is to integrate, if at all.

Most of their stuff is available on YouTube and GitHub.
