Hi there! I'm performing experiments with LEAN backtesting and high-performance computing.
In my setup, a master process starts hundreds of LEAN launcher instances on high-end HPC configurations. Algorithm result JSON storage and file logging are turned off to avoid unnecessary load and file-write conflicts.
The problem is very poor backtesting performance. A typical algorithm configuration, with a dozen rolling windows and indicators on minute resolution and 15-minute consolidation (where all the windows are updated), takes up to a minute to backtest 10 years of data.
Is it possible to dramatically increase backtesting performance?
The algorithm boils down to this:
var consolidator =
new QuoteBarConsolidator(TimeSpan.FromMinutes(15));
consolidator.DataConsolidated += ConsolidatorOnDataConsolidated;
_rsi = new RelativeStrengthIndex(14);
_cci = new CommodityChannelIndex(30);
_adx = new AverageDirectionalIndex("ADX", 14);
_stoch = new Stochastic(14, 14, 3);
_williamsPercentR = new WilliamsPercentR(14);
_trix = new Trix(15);
_aroon = new AroonOscillator(14, 14);
_adxr = new AverageDirectionalMovementIndexRating(14);
_ultimateOscillator = new UltimateOscillator(7, 14, 28);
RegisterIndicator(_symbol, _rsi, consolidator);
RegisterIndicator(_symbol, _cci, consolidator);
RegisterIndicator(_symbol, _adx, consolidator);
RegisterIndicator(_symbol, _stoch, consolidator);
RegisterIndicator(_symbol, _williamsPercentR, consolidator);
RegisterIndicator(_symbol, _trix, consolidator);
RegisterIndicator(_symbol, _aroon, consolidator);
RegisterIndicator(_symbol, _adxr, consolidator);
RegisterIndicator(_symbol, _ultimateOscillator, consolidator);
Custom rolling windows are populated in the consolidator callback.
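For reference, the shape of that callback: a toy, LEAN-free Python sketch of minute bars being consolidated into 15-minute bars that feed a fixed-size rolling window. The class and handler names here are hypothetical; in the real algorithm, LEAN's QuoteBarConsolidator and RollingWindow do this work.

```python
from collections import deque

class MinuteToFifteenConsolidator:
    """Toy stand-in for QuoteBarConsolidator(TimeSpan.FromMinutes(15)).

    Accumulates minute closes and fires its handlers once per 15 bars,
    like the DataConsolidated event.
    """
    def __init__(self, period=15):
        self.period = period
        self._buffer = []
        self.handlers = []

    def update(self, minute_close):
        self._buffer.append(minute_close)
        if len(self._buffer) == self.period:
            consolidated = self._buffer[-1]  # close of the 15-minute bar
            self._buffer.clear()
            for handler in self.handlers:
                handler(consolidated)

# A rolling window populated in the consolidator callback, newest-first,
# mirroring RollingWindow[decimal](20) indexed as window[0].
window = deque(maxlen=20)
consolidator = MinuteToFifteenConsolidator()
consolidator.handlers.append(window.appendleft)

for close in range(1, 61):       # 60 minute bars -> 4 consolidated bars
    consolidator.update(float(close))

print(list(window))              # [60.0, 45.0, 30.0, 15.0]
```

Every indicator registered against the consolidator adds its own update on each 15-minute bar, so per-bar callback cost multiplies across the nine indicators plus the custom windows.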
Hardware: Amazon EC2 c5d.18xlarge instance:
72 3.0 GHz vCPUs
144 GiB RAM
2 x 900 GB NVMe SSD
Thanks in advance for any suggestions! Here are some profiling results from a release run:
Jared Broad
The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.
AaronBalfour
That's with
AddForex(_symbol, Resolution.Minute, Market.Oanda);
over
SetStartDate(2010, 01, 03);
SetEndDate(2018, 01, 01);
Douglas Stridsberg
Interesting results - just wanted to add my own experiences to this, also on Minute data. I have run with two different set-ups and had differing results:
- Setup #1: Similar to yours - I have a master that spawns LEAN slave processes. Each process performs one backtest with one alpha on one underlying. This is the most efficient way I've found of completing backtests. I've tried this on a Google Cloud highcpu instance with 16 CPU cores and been able to run about 10 of them in parallel before exhausting the CPU.
- Setup #2: Here I'm trying one LEAN process - 20 securities, 9 Alphas (each with one consolidator and 1-2 indicators, producing 100-1000 insights per year per alpha per underlying). This runs significantly slower and is only able to use ~30% of my (local) i7-4710MQ. From testing, 15 years of backtesting would take something like 30-45 hours to complete. I have transferred this to the QC web terminal; initially it runs quite fast, but the performance drops off substantially after a few hours. On a 4-core GCloud instance it doesn't seem to be particularly faster either, as it simply cannot use a substantial amount of the processing power (only about 50%).
When profiling setup #2, I can see that roughly 80% of CPU time is spent on sleeping and synchronisation. I'm wondering whether there is scope for the QC team to work on supporting large, multi-asset Framework algorithms like these? I think as people adopt the Framework approach, many will try to add more Alphas and more underlyings and so might run into these bottlenecking issues as well.
Jared Broad
You can follow the progress in the Python speed project:
https://github.com/QuantConnect/Lean/projects
Dave Dykes
I've found exactly what Douglas Stridsberg describes above about scenario #2. I didn't catch on to this problem at first because I'd serendipitously parallelized my backtests as he describes in scenario #1.
When backtesting a large number of stocks locally, Lean only seems to be able to use a quarter of my six-core processor (which looks like twelve cores to Lean). I was looking at second bars in a C# algo for a universe of thousands of stocks, but I saw the same result with as few as a hundred instruments. This was the case both on Windows and in the Docker container.
Just as Douglas describes, profiling showed Lean stalled in the WorkQueue class at the WaitAny call below.
It seems to me there's something fundamentally wrong if a backtesting engine can only use a quarter of the compute resources but is CPU-bound and spending the majority of its time waiting on thread synchronization.
Okay, for backtesting, I can get around this easily enough by running four instances. It's still very inefficiently parallelized code, but at least I'm maxing out my processing resources.
For me, the real problem is that this same flaw seems to manifest in real-time (paper) trading. If I try to process tick data from IQFeed for hundreds of equities Lean spends its time synchronizing threads and only uses a quarter of my processor to do it. With this many instruments, I'm simply dropping ticks. I haven't yet identified exactly where this happens or how it happens without an exception being thrown, but I suspect that it's missing them because it simply can't process the required throughput.
I'd be really interested to hear if any QuantConnect users are successfully able to run such algos in realtime. I love the features and extensibility of Lean, but it's all useless to me if it can't process a universe of at least a thousand stocks in real-time.
if (workItem == null)
{
    // no work to do, lets sleep and try again
    WaitHandle.WaitAny(waitHandles, 100);
    continue;
}
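That 100 ms timeout is consistent with the profile: a consumer that polls with a timeout pays up to a full timeout of dead time whenever work arrives mid-nap, while an event-signalled consumer wakes as soon as work exists. A small self-contained sketch of the difference (a Python stand-in, timings approximate):

```python
import threading
import time

def timed_handoff(use_event):
    """Measure how long a consumer takes to notice one item of work."""
    flag = threading.Event()          # producer signals "work is ready"
    noticed = []

    def consumer():
        if use_event:
            flag.wait()               # event-driven: wakes immediately
        else:
            while not flag.is_set():  # polling: naps up to 100 ms, like
                time.sleep(0.1)       # WaitHandle.WaitAny(waitHandles, 100)
        noticed.append(time.monotonic())

    t = threading.Thread(target=consumer)
    t.start()
    time.sleep(0.05)                  # work arrives mid-nap
    start = time.monotonic()
    flag.set()
    t.join()
    return noticed[0] - start         # latency from "ready" to "seen"

print(f"polling latency: {timed_handoff(False) * 1000:.0f} ms")
print(f"event latency  : {timed_handoff(True) * 1000:.0f} ms")
```

Multiplied across many feed threads, that sleep/wake churn shows up in profilers as time spent in synchronization rather than in user code.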
Jon Quant
Wow, it's impressive how you guys do your backtesting! I was able to run a custom algorithm in a Docker container and configured it to run against an IBKR paper trading account, but I noticed my algorithm was failing on the History call. The same algorithm works fine on QC's web backtesting platform, just not when I run it locally on Windows or in a Docker container. I think that's because QC's web backtesting platform properly provides data for the History call but IBKR does not. I didn't get very far because of this. How did you handle this in your setups?
Jared Broad
Hi Dave; one thousand stocks at tick resolution should be a piece of cake to synchronize in real time with a CPU > 3 GHz. Waiting on synchronization is expected in backtesting, as LEAN processes a cached queue of data ahead of time and waits for its chance to sync depending on the QCAlgorithm contents/user code.
If the QCAlgorithm thread is C# and empty (i.e. the QCAlgorithm is doing nothing), the synchronization gets up to 2 million data points per second and can utilize 5,000% CPU, or 800% in a B8. I've seen it many times in the QC Cloud. 2M/s is on par with processing the entire OPRA feed in real time.
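Rough arithmetic puts those numbers next to the original question: assuming roughly 24-hour forex minute bars and 260 trading days a year (both approximations; exact counts vary by market and vendor), a 10-year minute-resolution backtest is only a few million data points, so at 2M points/s the raw data pass would take about two seconds. A 60-second runtime therefore points at user code and synchronization, not data volume.

```python
# Back-of-the-envelope check of the 2M points/s figure against the
# original 10-year, minute-resolution forex backtest.
years = 10
bars_per_day = 24 * 60          # ~24h forex trading day
trading_days = 5 * 52           # ~260 trading days/year
total_bars = years * trading_days * bars_per_day
print(f"{total_bars:,} minute bars")            # 3,744,000

ideal_seconds = total_bars / 2_000_000          # empty-algo throughput
print(f"~{ideal_seconds:.1f} s at 2M points/s") # ~1.9 s
```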
Profiling such a huge engine is pretty tricky, and I'm sure you know it isn't as simple as reading the first result. Assuming you've got experience in this space, I'd welcome your help diving into the various bottlenecks to make it faster!
Dave Dykes
Hi Jared, thanks for the follow-up. It's impressive that a co-founding coffee maker responds to these boards so quickly. I'm encouraged by your suggestion that this should be possible. I've actually discovered quite a bit since my post. Rather than venting on your forum, I'll continue to drill down on this and, if I can't resolve it, post something concrete like code that illustrates my problem.