Hey everyone,
I made a post a couple of days ago about speeding up backtests locally in LEAN, but now I'm wondering how I can get started with cloud computing to run even faster.
How can I set up LEAN to run on, say, Microsoft Azure or AWS? A quick Google search suggested using virtual machines, as well as some other options, but I have no experience in this area and would like help setting up.
Additionally, how can I secure my data (encryption, etc.) before running it on the cloud?
Thanks!
Andre Stevens
For anyone else interested, this seems to be the way to do it: https://blogs.msdn.microsoft.com/visualstudio/2015/01/08/azure-virtual-machine-images-for-visual-studio/
though I'm still looking into it.
Still figuring out the additional encryption and security part of it, though.
JayJayD
First of all, QC is already cloud computing. Did you know that you can run a cloud-based live algorithm for free with the QC subscription? The only reason to run Lean on your own cloud service is that you cannot run your algorithm in the QC live environment.
That said, you basically have two options (I assume you know the difference between IaaS and PaaS):
Of course, you then have to maintain the service, update Lean periodically, feed in new data, etc.
Finally, if you really know why you need to use Lean outside QC, do it; if not, simply subscribe to QC and be happy. Trust me, I have had a Lean instance running in an Azure VM for a couple of months now.
Zach Oakes
Hey guys, I had a similar question -- though not about hosting the algorithm on AWS / Azure, I'm more interested in importing data from S3 (the way it's imported from Dropbox). Is this an option?
I have some models in SageMaker that make scheduled predictions that could be imported and used as QC signals. S3 import functionality would be a nice feature given the growth of tools like PySpark on AWS compute clusters and AutoML in SageMaker.
It would also offer a way to import models from AWS Docker images, and an easy way to reuse existing ML models.
I'm tempted to just fool around with the normal methods of S3 import, but figured I would ask.
Zach
Shile Wen
Hi Zach,
Connecting to AWS S3 directly is not supported at the moment. However, you can have S3 return data through a REST API call, or have it update a remote CSV (this is not limited to Dropbox; here I use GitHub, but any remote file host should work).
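For the REST-style route, a rough sketch of what this can look like inside an algorithm is below (the pre-signed S3 URL and the date/symbol column layout are placeholders I made up, not a supported S3 integration):

from AlgorithmImports import *

class S3CsvSketchAlgorithm(QCAlgorithm):
    def Initialize(self):
        self.SetStartDate(2020, 1, 1)
        self.SetCash(100000)
        self.AddEquity("SPY", Resolution.Daily)
        # A pre-signed S3 URL is just an HTTPS link, so self.Download can fetch it
        # the same way it fetches a Dropbox or GitHub "raw" link. Placeholder URL:
        self.csv_url = "https://my-bucket.s3.amazonaws.com/signals.csv?X-Amz-Signature=PLACEHOLDER"

    def OnData(self, data):
        pass

    def ReadSignals(self):
        # Download returns the file body as one string; split it into rows and columns.
        rows = self.Download(self.csv_url).splitlines()
        return [row.split(",") for row in rows if row.strip()]

The same Download call works for any HTTPS endpoint, so a small REST endpoint sitting in front of the bucket could be read the same way.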
Best,
Shile Wen
Zach Oakes
Shile,
Thanks for the reply -- I'll check out the link. I have used Dropbox; I suppose I could schedule predictions and simply add a step where it writes the data to Dropbox. Curious to see how it works when you're reading from GitHub. My concern with Dropbox is that it seems to have a lot of downtime -- not sure it's a production-level storage option.
Shile Wen
Hi Zach,
To be able to read the data from GitHub, I uploaded a .csv file to a repository (which I named FileStore), navigated to the .csv file, clicked the "Raw" button to get the raw file (example), then copied that URL and read it in using the built-in data reader. As for updating this file programmatically, there are various ways to update files in a git repository, such as pygit2 for Python.
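For reference, a rough sketch of reading such a raw GitHub CSV through the built-in custom data reader might look like the following (the repository URL and the YYYYMMDD,value column layout are placeholders, not the exact file from my example):

from AlgorithmImports import *
from datetime import datetime

class GithubCsvData(PythonData):
    # Custom data type that reads a CSV hosted as a raw file on GitHub.

    def GetSource(self, config, date, isLiveMode):
        # Placeholder raw URL; the "Raw" button on GitHub produces links of this form.
        url = "https://raw.githubusercontent.com/<user>/FileStore/master/signals.csv"
        return SubscriptionDataSource(url, SubscriptionTransportMedium.RemoteFile)

    def Reader(self, config, line, date, isLiveMode):
        # Skip blank lines and any header row; assumed layout: YYYYMMDD,value
        if not line.strip() or not line[0].isdigit():
            return None
        cols = line.split(",")
        data = GithubCsvData()
        data.Symbol = config.Symbol
        data.Time = datetime.strptime(cols[0], "%Y%m%d")
        data.Value = float(cols[1])
        return data

In Initialize you would subscribe to it with self.AddData(GithubCsvData, "SIGNALS", Resolution.Daily) and then pick the values up in OnData.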
Best,
Shile Wen
Zach Oakes
That's very clever. Do you think the logic is interchangeable with my Dropbox Parse() method? Here, I'll find it. I'm using the Dropbox library in Python; it's not great, but simple enough. The downtime is just my big concern -- I've seen it down like 4x in the past week, which is crazy for a company built solely on storage uptime.
def Parse(self, url):
    # Download file from url as string
    file = self.Download(url).split("\n")
    # Remove formatting characters
    data = [x.replace("\r", "").replace(" ", "") for x in file]
    # Split data by date and symbol
    split_data = [x.split(",") for x in data]
    # Dictionary to hold list of active symbols for each date, keyed by date
    #dates = [i for i in split_data[0]]
    parse = lambda d: datetime.strptime(d, '%Y%m%d').date()  # Issues here... pd.to_datetime()?
    symbolsByDate = {}
    #longsByDate = {}
    #shortsByDate = {}
    #signalBySymbol = {}
    self.LE_T = []
    #self.SE_T = []
    # Data format: Date Symbol1 Symbol2 Symbol3 ...
    # Parse data into dictionary -- loops through rows essentially (by vector)
    for arr in split_data:
        # Add symbols to universe
        date = pd.to_datetime(arr[0]).date()  # Alternative?
        #date = parse(arr[0])  # Replaced this w. original from ex.
        #self.Debug(f"{arr} -- {arr[1]}")  # Print debug
        symbols = [Symbol.Create(ticker, SecurityType.Equity, Market.USA) for ticker in arr[1:]]
        #...
Shile Wen
Hi Zach,
As long as the file format stays the same when you move it from Dropbox to GitHub, changing the URL string to the GitHub link is all that should be needed -- everything else should work as before. Just be sure it's the Raw link, as the normal page link won't work.
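For example (placeholder user and repo names), the two link forms look like this, and only the Raw one returns the plain CSV that Download expects:

# Normal page link -- returns an HTML page, won't work with self.Download:
#   https://github.com/<user>/FileStore/blob/master/signals.csv
# Raw link -- returns the plain CSV, use this one:
#   https://raw.githubusercontent.com/<user>/FileStore/master/signals.csv
url = "https://raw.githubusercontent.com/<user>/FileStore/master/signals.csv"
self.Parse(url)  # same Parse() as above, only the URL changes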
Best Regards,
Shile Wen
Zach Oakes
Thanks!