Hello everyone,
Here is another video lesson in which you can learn how to web-scrape from websites such as Wikipedia to get custom data that you can use inside of your QC trading algorithms:
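For a quick taste of the idea, a single pandas call can already pull the tables from the S&P 500 components page. This is an illustrative sketch, not the exact code from the video; the table index may shift if the page layout changes:

```python
# Illustrative sketch: fetch the tables from the S&P 500 components page with pandas.
# Requires pandas with lxml (or html5lib) installed; check the result before relying on it.
import pandas as pd

tables = pd.read_html("https://en.wikipedia.org/wiki/List_of_S%26P_500_companies")
components = tables[0]  # current index members
changes = tables[1]     # historical additions and removals
print(changes.head())
```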
To clone the code, click the “Clone Algorithm” button on the attached backtest below.
If you have any questions or comments, please let me know.
Cheers,
Louis
Fred Painchaud
Hi Louis, All,
When scraping sites, I'd recommend checking their robots.txt file first. For instance, Wikipedia's:
https://en.wikipedia.org/robots.txt
I would also recommend that your algos respect HTTP 429 (Too Many Requests) responses while doing so.
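A minimal sketch of both points, assuming the requests library and a placeholder target URL; this is not a production-ready scraper:

```python
# Minimal sketch: check robots.txt before requesting a page and back off on HTTP 429.
# The target URL, user agent, and retry limits here are placeholders.
import time
import urllib.robotparser
import requests

URL = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
USER_AGENT = "my-research-bot/0.1"

rp = urllib.robotparser.RobotFileParser("https://en.wikipedia.org/robots.txt")
rp.read()
if not rp.can_fetch(USER_AGENT, URL):
    raise SystemExit("robots.txt disallows fetching this URL for our user agent")

html = None
for attempt in range(5):
    response = requests.get(URL, headers={"User-Agent": USER_AGENT}, timeout=30)
    if response.status_code == 429:
        # Honor Retry-After if it is a number of seconds, otherwise back off exponentially.
        retry_after = response.headers.get("Retry-After", "")
        time.sleep(int(retry_after) if retry_after.isdigit() else 2 ** attempt)
        continue
    response.raise_for_status()
    html = response.text
    break
```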
Finally, consider that Wikipedia's content can also be downloaded as nicely structured database dumps, which is much faster and more convenient:
https://en.wikipedia.org/wiki/Wikipedia:Database_download
For everyone's consideration.
Fred
Fred Painchaud
Hi again,
Hmmmm… It's funny because now that I actually look at the code, there's nothing in there that scrapes Wikipedia or any other site. I think you posted the wrong algo, Louis… I have not watched the video, but I assume it talks about web scraping, and I also assumed the posted code would be the code discussed in the video…
That being said, the algo you posted is also interesting. Good idea there… 😊
Fred
Fred Painchaud
😊 I now watched the video so you can ignore my post about the code potentially being the wrong one.
Fred
Louis
Hi Fred,
Thanks for the comment. If you watch the video, you will see that we scraped the data in a previous step and then used it as custom data for the algorithm above.
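In rough terms, the custom data class for that step looks something like the sketch below; the URL, column layout, and class name here are placeholders rather than the exact code from the video:

```python
# Sketch of reading pre-scraped index-change data back in as QuantConnect custom data.
# The CSV location and columns (date, added ticker, removed ticker) are placeholders.
from AlgorithmImports import *

class IndexChanges(PythonData):
    def GetSource(self, config, date, isLiveMode):
        # Hypothetical location of the CSV produced by the earlier scraping step.
        return SubscriptionDataSource(
            "https://example.com/spx_changes.csv",
            SubscriptionTransportMedium.RemoteFile)

    def Reader(self, config, line, date, isLiveMode):
        if not line or not line[0].isdigit():
            return None
        date_str, added, removed = line.strip().split(',')
        row = IndexChanges()
        row.Symbol = config.Symbol
        row.Time = datetime.strptime(date_str, "%Y-%m-%d")
        row["Added"] = added
        row["Removed"] = removed
        row.Value = 1.0
        return row
```

The class would then be registered in Initialize with something like self.AddData(IndexChanges, "SPXCHANGES", Resolution.Daily).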
I hope that makes sense.
Cheers,
Louis
Fred Painchaud
Yes, it makes total sense Louis. Sorry for my own confusion.
While still generally relevant, my first comment about robots files, etc., is not really relevant in this particular scenario. When I read the title of your post, I thought your algo was going to frequently scrape Wikipedia to gather something like “market sentiment”, stock news, and the like. I thought you were illustrating the concept with Wikipedia so users could then apply the principle to finance forums, etc.
For “heavy” scraping applications, and particularly if you want your algo bot to be up and running in the long run, my first comment applies. For your scenario, in which you just extract some information once and then use it in your backtests, it does not really apply. I know you know that, Louis, but I hope this post clarifies my first one for other readers 😊.
Fred
Louis
Hi again Fred,
Thanks for providing this information. It is definitely useful material for regular scraping bots! It would be interesting to build such a bot for a future video. 😊
Louis
Fred Painchaud
Well, as always, the devil is in the details, so a production-ready bot that does that tends to get quite complex in order to be truly robust.
To create a simple bot that illustrates the concept, the first milestone is to find a source of information (a finance forum, etc.) that is 1) relevant enough that the illustration is not 100% synthetic/abstract for the viewer and provides an apparent edge, and 2) not too difficult to scrape (parse, categorize the data, interpret it, act upon it, etc.).
I've been scraping r/StockMarket but it does get complex relatively quickly even to do something basic. I've been using the API, etc…
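For anyone curious, a minimal sketch of that kind of API access using the PRAW library; the credentials and the naive ticker filter below are placeholders, not my actual setup:

```python
# Minimal sketch: pull recent r/StockMarket posts through the official Reddit API via PRAW.
# client_id/client_secret are placeholders; the upper-case filter is a crude ticker heuristic.
import praw

reddit = praw.Reddit(
    client_id="YOUR_CLIENT_ID",
    client_secret="YOUR_CLIENT_SECRET",
    user_agent="educational-scraper/0.1",
)

for submission in reddit.subreddit("StockMarket").new(limit=25):
    # Keep only posts whose title contains a ticker-like token such as "AAPL".
    if any(word.isupper() and 2 <= len(word) <= 5 for word in submission.title.split()):
        print(submission.created_utc, submission.score, submission.title)
```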
There is one thing I've been contemplating for a while: scraping redditsearch.io… It would have a lot of pros over scraping Reddit per se.
I have not checked it with respect to the goal of creating something for educational purposes, but off the top of my head, it may be “simple enough”.
Food for thought…
Fred
Vladimir
Louis
My questions are about the strategy you posted.
My understanding is that the strategy goes long in stocks recently added to SPX, and goes short in stocks recently removed from SPX.
To visualize the portfolio positions, I added this code.
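Roughly, it looks like this, added inside the algorithm class (a sketch; the exact lines may differ):

```python
# Plot daily counts of long and short holdings so any position drift is visible.
def OnEndOfDay(self, symbol):
    longs = sum(1 for holding in self.Portfolio.Values if holding.IsLong)
    shorts = sum(1 for holding in self.Portfolio.Values if holding.IsShort)
    self.Plot("Positions", "Longs", longs)
    self.Plot("Positions", "Shorts", shorts)
```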
From this chart, I can see that the strategy is accumulating short positions.
Is this intentional or a mistake?
Ashutosh
Hi Vladimir,
I backtested the code with your addition and did not see short positions accumulating.
If you still see shorts accumulating as before, can you attach the whole backtest here?
We can also take advantage of liquidateExistingHoldings if we want our targets to define the portfolio state.
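A minimal sketch of that approach, assuming `added` and `removed` are lists of symbols recently added to and removed from the index (hypothetical names, not from the attached backtest):

```python
# Illustrative sketch: let the target list fully define the portfolio each rebalance.
# Passing True as the second argument (liquidateExistingHoldings) closes any holding
# not covered by the targets, so stale positions cannot accumulate across rebalances.
# Assumes `added` and `removed` are non-empty lists of Symbol objects.
targets = [PortfolioTarget(symbol, 1.0 / len(added)) for symbol in added]
targets += [PortfolioTarget(symbol, -1.0 / len(removed)) for symbol in removed]
self.SetHoldings(targets, True)
```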
Attaching my backtest for reference: