Hey folks,
I wrote a pretty genius 398 lines of code that backtests at about 550% gains YOY, although it would perform significantly differently in forward/live tests because the data is delayed by a day or two. I'm currently running it on IBKR test funds and it seems to be net gains, but that's been <1wk. Either way, I'd like to bring it here to work Alpha magic and earn some income while I don't have funds to invest.
It hinges on the ability to pull data from a static site's 'screener' function, along with its csrfmiddlewaretoken, but the idea dies when I try to build the Python here, as I can't from bs4 import BeautifulSoup (it errors out). Is there a way to pull different libraries into QC?
If that's not possible, I can write a middle app that pulls the current / backtest info by date into CSVs. I can't figure out how to combine these CSVs (could learn?), so there's one CSV per date. I can upload these to Dropbox, but I can't find a way to list / download them all via the Dropbox API unless I use their SDK, which again I can't import into QC.
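For anyone landing here with the same per-date CSV problem, here is a minimal sketch of combining them with only the Python standard library. It assumes (hypothetically) that each file is named after its date, e.g. 2019-06-20.csv, and that all files share the same header; combine_daily_csvs and the 'date' column are names made up for this example, not anything from QC or the original script.

```python
import csv
from pathlib import Path


def combine_daily_csvs(folder, out_path):
    """Merge every <YYYY-MM-DD>.csv in `folder` into one CSV.

    Assumption: the file name (without .csv) is the date of the rows inside.
    A leading 'date' column is added so each row keeps its date.
    Returns the number of data rows written.
    """
    out_path = Path(out_path)
    # ISO dates sort chronologically as plain strings; skip the output file
    # itself in case it lives in the same folder from an earlier run.
    files = sorted(
        p for p in Path(folder).glob("*.csv")
        if p.resolve() != out_path.resolve()
    )
    rows_written = 0
    writer = None
    with open(out_path, "w", newline="") as out:
        for path in files:
            with open(path, newline="") as f:
                reader = csv.DictReader(f)
                if reader.fieldnames is None:
                    continue  # empty file, nothing to merge
                if writer is None:
                    # Header comes from the first non-empty file.
                    columns = [c for c in reader.fieldnames if c != "date"]
                    writer = csv.DictWriter(
                        out,
                        fieldnames=["date"] + columns,
                        extrasaction="ignore",
                        restval="",
                    )
                    writer.writeheader()
                for row in reader:
                    row["date"] = path.stem  # file name carries the date
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Calling combine_daily_csvs("daily", "combined_csv.csv") would then leave a single file to host, which avoids having to list a remote folder at all.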
I'm also worried the free tier backtester instances wouldn't be enough to run the backtest, which has data from about halfway through 2018 until now, as it'd involve many, many equities.
Thanks in advance for your help,
Jarett Dunn
I did combine my CSV, but I can't download from Google Drive, Dropbox or GitHub?
URL = "https://github.com/DunnCreativeSS/newsAgnosticNewsTradingViaInsiderTrades-Backtest/raw/master/combined_csv.csv"
URL = "https://dl.dropboxusercontent.com/s/yt0wr7pp4pfpko2/combined_csv.csv"
URL = "https://docs.google.com/spreadsheets/d/1S1kCsljns9jAWSkcZicge213x38mUOYMtBvP-hdFjuk/gviz/tq?tqx=out:csv&sheet=combined_csv.csv"
They all give me 403 forbidden proxy errors. Any ideas?
Jarett Dunn
I got it importing the data from SubscriptionDataSource, but now I see this:
Use Reliable Data Sources
Alphas should use public data, commercial feeds or QuantConnect's data. Importing a personal dataset through DropBox isn't reliable enough to build a long-term track record and ensure the Alpha will continue working as expected. You should use public data sources (e.g. Quandl), commercial data, or QuantConnect supplied data.
If I were able to pull the info and parse it using BeautifulSoup, I wouldn't need to use Dropbox drops as a data source... please advise.
Laurent Crouzet
"it would perform significantly differently in forward/live tests, because data is delayed by a day or two"
>> Does it mean that the published information has a forward-looking bias?
To be sure it has none, your backtest would need to use only the information that had actually been published at that time (if delayed by 1 day, on June 21 the backtest should only use the data shown up to June 20th; if delayed by 2 days, only the data shown up to June 19th).
If doing so does not change the returns in the backtests, that is quite good news for you!
If it does, the forward-looking bias is probably significant and, to my mind, it is to prevent such discrepancies that QC does not give its users complete freedom in using outside data.
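The June 20th / June 19th rule above can be sketched in a few lines of Python. This is only an illustration under assumptions: visible_rows, the 'date' key and the sample tickers are hypothetical, and the lag is counted in calendar days (a real backtest might prefer business days).

```python
from datetime import date, timedelta


def visible_rows(rows, as_of, lag_days=2):
    """Return only the rows the strategy could have seen on `as_of`.

    Each row is a dict with a 'date' key (a datetime.date) giving the day
    the data refers to. With a publication lag of `lag_days`, anything
    dated after as_of - lag_days is treated as not yet available.
    """
    cutoff = as_of - timedelta(days=lag_days)
    return [r for r in rows if r["date"] <= cutoff]


# Illustrative data only; the year and tickers are placeholders.
rows = [
    {"date": date(2019, 6, 19), "ticker": "AAA"},
    {"date": date(2019, 6, 20), "ticker": "BBB"},
    {"date": date(2019, 6, 21), "ticker": "CCC"},
]

one_day = visible_rows(rows, as_of=date(2019, 6, 21), lag_days=1)
two_day = visible_rows(rows, as_of=date(2019, 6, 21), lag_days=2)
```

Running the same backtest with lag_days set to 0, 1 and 2 and comparing the returns is one way to see how much of the result depends on data that would not have been available yet.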
Jarett Dunn
Laurent Crouzet
Yes Laurent Crouzet, I could certainly test with data that's at least 1 day old, and that should even out the chance the data wasn't available at the time, but I spent quite a while with my script on QC and could only hit a fraction of what I did using backtrader for Python, even using the same-day data :( What's more, my Alpha score didn't really want to break past the 15th or so percentile at any time (I did some research into why this might be, and I think my bigger days losing some trades are hurting the score).
Either way, I didn't want to overfit too much or dig into why I was getting conflicting results, as I noticed the copy-pasted directions for Alphas above indicating they can't use Dropboxed data, which brings my Alpha wishes to a premature close... I would love some guidance as to whether there's maybe a library included in QC I can use to scrape the website whose data I'm collecting, even if I could just grab a <form>'s contents via QC Python? Or can SubscriptionDataSource be reworked to pull info from a static <form>?
Jarett Dunn
Laurent Crouzet I seem unable to tag you... did this work?
Link Liang
Hi Jarett,
Don't worry, everyone in this thread is automatically subscribed to new replies, so @Laurent should have received emails with your follow-up.
Regarding custom data, we only allow downloads from certain whitelisted sources. The 403 forbidden proxy errors indicate that those URLs are not on our whitelist. Glad to see you made it work with SubscriptionDataSource.
Unfortunately, we currently don't support BeautifulSoup. You may find all supported libraries here. Moreover, we recommend using QuantConnect data or other reliable public or commercial data with stable APIs. Grabbing data from webpages is not considered stable and will not be accepted in an alpha submission. I would suggest considering other data sources if you wish to submit your alpha.
Thanks for your support!
Laurent Crouzet
Jarett Dunn : sorry, I missed your answer a few days ago.
1. As for QC's decision not to allow sources other than those on their whitelist... it is their business decision, and I fully understand why they decided not to allow that in Alpha submissions!
2. As for backtests and/or live trading... I think that the more open the platform, the better. However, you need to strictly follow your own rules so that you do not fall into forward-looking biases!
3. As for delaying the data that you collect from this outside source (exactly as it would arrive delayed in real-time trading), I think that would be the way to keep your backtests close to the data you would have collected in live trading.
In theory, the data you collect should follow the same rules as how QC manages the stream of time:
https://www.quantconnect.com/docs/key-concepts/understanding-time
To me, QC's good time management is one of the reasons why I think that QC is a professional tool. If time is not correctly managed, it is easy to collect forward-looking data, which destroys any value that could come from a backtest's results!
Erik Bengtson
Based on the CSV file, it seems that you are using SEC filings. You can grab such data from reliable providers.
Note that SEC filings are reported after trade execution, up to 2 days delayed.
https://www.sec.gov/cgi-bin/browse-edgar?action=getcurrent
Laurent Crouzet
Erik Bengtson, you wrote: "Note that SEC filings are reported after trade execution, up to 2 days delayed."
>> Hence my proposal to backtest using the 2-day delay... to check that this does not destroy the returns of the backtest.
Jarett Dunn