QCAlgorithm Take 8min to even begin to start Initalizing for backtesting

So, im noticing my QCAlgorithm is take verry long to init for a backtest. Or at least thats what the logs are telling me.. Im running this on a Mabook M1 Max, and pulling 3 years of minute data. However, as you can see in the logs, it takes 8min to even start to run the Initialize function..

It looks like its doing some serious work to spool up the QCAlgorithm and im wondering what its doing and why? I cant see what amount of work should require this amount of time and computation, I mean this takes up around 70% of the total computation time of my algorithm.

What can i do to reduce this significant overhead?

Snippet of the start of the QCAlgorithm, this is before any of my code starts to run….

class Algorithm(QCAlgorithm):
    def Initialize(self):
        self.Debug('!! STARTING QCAlgorithm !!')
        ...

And the logs, notice the timestamp from start till the “STARTING QCAlgorithm” log:

2024-06-05T10:18:11.7331387Z TRACE:: Engine.Main(): LEAN ALGORITHMIC TRADING ENGINE v2.5.0.0 Mode: DEBUG (64bit) Host: Nicks-MBP
2024-06-05T10:18:11.7360021Z TRACE:: Engine.Main(): Started 10:18 AM
2024-06-05T10:18:11.7408818Z TRACE:: Config.Get(): Configuration key not found. Key: lean-manager-type - Using default value: LocalLeanManager
2024-06-05T10:18:11.7538791Z TRACE:: JobQueue.NextJob(): Selected /LeanCLI/main.py
2024-06-05T10:18:11.7991597Z TRACE:: Config.GetValue(): scheduled-event-leaky-bucket-capacity - Using default value: 120
2024-06-05T10:18:11.7995320Z TRACE:: Config.GetValue(): scheduled-event-leaky-bucket-time-interval-minutes - Using default value: 1440
2024-06-05T10:18:11.8001345Z TRACE:: Config.GetValue(): scheduled-event-leaky-bucket-refill-amount - Using default value: 18
2024-06-05T10:18:11.8010785Z TRACE:: Config.GetValue(): storage-limit - Using default value: 10737418240
2024-06-05T10:18:11.8012984Z TRACE:: Config.GetValue(): storage-permissions - Using default value: 3
2024-06-05T10:18:11.8015027Z TRACE:: Config.Get(): Configuration key not found. Key: optimization-id - Using default value: 
2024-06-05T10:18:11.8017297Z TRACE:: Config.Get(): Configuration key not found. Key: backtest-name - Using default value: local
2024-06-05T10:18:11.8031025Z TRACE:: Config.Get(): Configuration key not found. Key: data-channel - Using default value: 
2024-06-05T10:18:11.8033547Z TRACE:: Config.Get(): Configuration key not found. Key: python-venv - Using default value: 
2024-06-05T10:18:11.8035987Z TRACE:: Config.Get(): Configuration key not found. Key: out-of-sample-max-end-date - Using default value: 
2024-06-05T10:18:11.8037973Z TRACE:: Config.GetValue(): out-of-sample-days - Using default value: 0
2024-06-05T10:18:11.8046617Z TRACE:: Config.Get(): Configuration key not found. Key: data-permission-manager - Using default value: DataPermissionManager
2024-06-05T10:18:11.8227957Z TRACE:: Config.GetValue(): zip-data-cache-provider - Using default value: 10
2024-06-05T10:18:11.8248271Z TRACE:: Config.Get(): Configuration key not found. Key: fundamental-data-provider - Using default value: CoarseFundamentalDataProvider
2024-06-05T10:18:11.8283000Z TRACE:: AlgorithmManager.CreateTokenBucket(): Initializing LeakyBucket: Capacity: 120 RefillAmount: 18 TimeInterval: 1440
2024-06-05T10:18:11.8290367Z TRACE:: Config.GetValue(): algorithm-manager-time-loop-maximum - Using default value: 20
2024-06-05T10:18:11.8407874Z TRACE:: Engine.Run(): Resource limits '0' CPUs. 2147483647 MB RAM.
2024-06-05T10:18:11.8417901Z TRACE:: TextSubscriptionDataSourceReader.SetCacheSize(): Setting cache size to 71582788 items
2024-06-05T10:18:11.8757568Z TRACE:: Config.GetValue(): downloader-data-update-period - Using default value: 7
2024-06-05T10:18:12.1592147Z TRACE:: Config.GetValue(): algorithm-creation-timeout - Using default value: 90
2024-06-05T10:18:12.1610056Z TRACE:: PythonInitializer.Initialize(): start...
2024-06-05T10:18:12.8989352Z TRACE:: PythonInitializer.Initialize(): ended
2024-06-05T10:18:12.9001878Z TRACE:: AlgorithmPythonWrapper(): Python version 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:39:38) [GCC 12.3.0]: Importing python module main
2024-06-05T10:18:15.9321830Z TRACE:: AlgorithmPythonWrapper(): main successfully imported.
2024-06-05T10:18:15.9333594Z TRACE:: AlgorithmPythonWrapper(): Creating IAlgorithm instance.
2024-06-05T10:18:16.0078509Z TRACE:: Config.GetValue(): mute-python-library-logging - Using default value: True
2024-06-05T10:18:16.0095393Z TRACE:: LocalObjectStore.Initialize(): Storage Root: /Storage. StorageFileCount 9999999. StorageLimit 10240MB
2024-06-05T10:18:16.0139510Z TRACE:: Config.GetValue(): zip-data-cache-provider - Using default value: 10
2024-06-05T10:18:16.0190366Z TRACE:: HistoryProviderManager.Initialize(): history providers [SubscriptionDataReaderHistoryProvider]
2024-06-05T10:18:16.0203435Z TRACE:: BacktestingSetupHandler.Setup(): Setting up job: UID: 275814, PID: 0, Version: 2.5.0.0, Source: WebIDE
2024-06-05T10:18:16.0243732Z TRACE:: Config.Get(): Configuration key not found. Key: security-data-feeds - Using default value: 
2024-06-05T10:18:16.0362156Z TRACE:: SecurityPortfolioManager.SetAccountCurrency(): setting account currency to USDT
2024-06-05T10:18:16.0459516Z TRACE:: Config.Get(): Configuration key not found. Key: reserved-words-prefix - Using default value: @
2024-06-05T10:18:16.2066269Z TRACE:: Config.GetValue(): data-feed-workers-count - Using default value: 10
2024-06-05T10:18:16.2086538Z TRACE:: Config.GetValue(): data-feed-max-work-weight - Using default value: 400
2024-06-05T10:18:16.2088863Z TRACE:: WeightedWorkScheduler(): will use 10 workers and MaxWorkWeight is 400
2024-06-05T10:19:16.1136068Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1089, Sample: 1092, App: 4950, CurrentTimeStepElapsed: 00:00.000. CPU: 101%
2024-06-05T10:19:17.6867145Z TRACE:: Event Name "Training: Now: 2023-08-14T00:00:00.0000000Z", scheduled to run.
2024-06-05T10:19:17.7416744Z TRACE:: BaseSetupHandler.SetupCurrencyConversions():
Account Type: Margin

Symbol      Quantity    Conversion = Value in USDT
USDT: ₮       10000.00 @       1.00 = ₮10000
USD: $           0.00 @       1.00 = ₮0
BTC: ₿           0.00 @   29293.30 = ₮0.0
-------------------------------------------------
CashBook Total Value:                ₮10000.0

2024-06-05T10:19:17.7430493Z TRACE:: Total margin information: TotalMarginUsed: 0.00, MarginRemaining: 10000.00
2024-06-05T10:19:17.7438259Z TRACE:: SetUp Backtesting: User: 275814 ProjectId: 0 AlgoId: 1922862025
2024-06-05T10:19:17.7446366Z TRACE:: Dates: Start: 08/14/2023 End: 10/19/2023 Cash: ¤10,000.00 MaximumRuntime: 100.00:00:00 MaxOrders: 2147483647
2024-06-05T10:19:17.7463985Z TRACE:: BacktestingResultHandler(): Sample Period Set: 24.12
2024-06-05T10:19:17.7471504Z TRACE:: Config.GetValue(): forward-console-messages - Using default value: True
2024-06-05T10:19:17.7476460Z TRACE:: JOB HANDLERS:
         DataFeed:             QuantConnect.Lean.Engine.DataFeeds.FileSystemDataFeed
         Setup:                QuantConnect.Lean.Engine.Setup.BacktestingSetupHandler
         RealTime:             QuantConnect.Lean.Engine.RealTime.BacktestingRealTimeHandler
         Results:              QuantConnect.Lean.Engine.Results.BacktestingResultHandler
         Transactions:         QuantConnect.Lean.Engine.TransactionHandlers.BacktestingTransactionHandler
         Object Store:         QuantConnect.Lean.Engine.Storage.LocalObjectStore
         History Provider:     QuantConnect.Lean.Engine.HistoricalData.HistoryProviderManager
         Brokerage:            QuantConnect.Brokerages.Backtesting.BacktestingBrokerage
         Data Provider:        QuantConnect.Lean.Engine.DataFeeds.DefaultDataProvider

2024-06-05T10:19:17.7697602Z TRACE:: Event Name "Daily Sampling", scheduled to run.
2024-06-05T10:19:17.7705703Z TRACE:: AlgorithmManager.Run(): Begin DataStream - Start: 8/14/2023 12:00:00 AM Stop: 10/19/2023 11:59:59 PM Time: 8/14/2023 12:00:00 AM Warmup: False
2024-06-05T10:19:17.8074014Z TRACE:: UniverseSelection.AddPendingInternalDataFeeds(): Adding internal benchmark data feed BTCUSDT,#0,BTCUSDT,Hour,TradeBar,Trade,Adjusted,OpenInterest,Internal
2024-06-05T10:19:17.8129128Z TRACE:: Debug: Launching analysis for 1922862025 with LEAN Engine v2.5.0.0
2024-06-05T10:20:17.8536165Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 958, Sample: 1531, App: 6192, CurrentTimeStepElapsed: 00:58.089. CPU: 230%
2024-06-05T10:20:18.0325015Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 119
2024-06-05T10:21:17.8818183Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 118
2024-06-05T10:21:18.4167744Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1322, Sample: 1531, App: 6292, CurrentTimeStepElapsed: 01:58.652. CPU: 227%
2024-06-05T10:22:18.0319385Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 117
2024-06-05T10:22:18.4819859Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1455, Sample: 1531, App: 6326, CurrentTimeStepElapsed: 02:58.718. CPU: 229%
2024-06-05T10:23:17.8833913Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 116
2024-06-05T10:23:18.5446680Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1503, Sample: 1531, App: 6361, CurrentTimeStepElapsed: 03:58.780. CPU: 234%
2024-06-05T10:24:18.0426491Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 115
2024-06-05T10:24:18.6082266Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1521, Sample: 1531, App: 6380, CurrentTimeStepElapsed: 04:58.844. CPU: 248%
2024-06-05T10:25:17.8997984Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 114
2024-06-05T10:25:18.6736093Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1528, Sample: 1532, App: 6377, CurrentTimeStepElapsed: 05:58.909. CPU: 234%
2024-06-05T10:26:18.0388527Z TRACE:: LeakyBucket.TryConsume(1): Successfully consumed tokens. Available: 113
2024-06-05T10:26:18.7359449Z TRACE:: Isolator.ExecuteWithTimeLimit(): Used: 1530, Sample: 1532, App: 6392, CurrentTimeStepElapsed: 06:58.971. CPU: 310%
2024-06-05T10:26:19.9147910Z TRACE:: Debug: !! STARTING QCAlgorithm !!
2024-06-05T10:26:19.9150695Z TRACE:: Debug: Changing account currency from USD to USDT...

The material on this website is provided for informational purposes only and does not constitute an offer to sell, a solicitation to buy, or a recommendation or endorsement for any security or strategy, nor does it constitute an offer to provide investment advisory services by QuantConnect. In addition, the material offers no opinion with respect to the suitability of any security or specific investment. QuantConnect makes no guarantees as to the accuracy or completeness of the views expressed in the website. The views are subject to change, and may have become unreliable for various reasons, including changes in market conditions or economic circumstances. All investments involve risk, including loss of principal. You should consult with an investment professional before making any investment decisions.

To access locally stored minute data in QuantConnect, you can use the AddEquity method with Resolution.Minute as shown in the example below:

self.symbol = self.AddEquity("SPY", Resolution.Minute).Symbol

This code will add the SPY equity to your algorithm and set the data resolution to minute-level. The data will be stored locally and can be accessed for backtesting.

However, please note that if you're using LEAN CLI to run backtests locally, you'll need to download the data you need using lean data download. Also, keep in mind that the data from QuantConnect is licensed to be used on the web exclusively, and for access outside of it, you need to pay additional fees. LEAN CLI will guide you to the payment, etc.

You can't have access to Universe Selection data outside of the web (cloud). So you select a fixed set of assets to trade. You also know your backtest start and end dates in advance. You also know your resolution in advance for your assets. So you know in advance all the information needed to select the data you need, such as AAPL, all 2018, minute data.

Sources:

2024-06-05T10:14:12.5281204Z TRACE:: Debug: Getting historical data... 2024-06-05T10:14:12.5283568Z TRACE:: Debug: Got history 2020-08-29 01:00:00 -> 2023-08-14 00:00:00, Bar Count 25920 2024-06-05T10:14:12.5285060Z TRACE:: Debug: Resampled history to 3600S 25920 Took: 8.156544208526611 seconds 2024-06-05T10:14:12.5288026Z TRACE:: Debug: Getting historical data...

Nick Grobler

99 ,

FYI, the minute data is locally stored

Mia Alissi

STAFF ,

Hi Mia, as mentioned i am already doing this. The issue ive outlined is clear though, why is it taking 8min before the debug output is (which is the first line of code run) to be executed..

Based on the information provided in the knowledge sources, it's not clear why there would be an 8-minute delay before the debug output is executed. The execution speed can depend on various factors such as the complexity of your algorithm, the amount of data being processed, and the current load on the QuantConnect servers.

However, it's important to note that the Debug method in QuantConnect is not intended for high-frequency logging, and excessive usage can slow down your algorithm. If you're using Debug extensively, you might want to consider reducing its usage or using other methods to monitor your algorithm's performance.

If the issue persists, it would be helpful to see the relevant parts of your code to provide a more specific answer.

I suspect the time is used when doing this, which im doing AFTER that debug statement. But this still doesnt make sense as why would that execute BEFORE the debug statement, something seems odd here.

Even ignoring that issue of debug output and order of operations. This is for around 3 years of 1m data, so roughly 1.5million timesamples. This shouldnt take this long to obtain as this is stored locally, loading and interatacting with a dataframe this size outside of QC is very fast.

self.history(TradeBar, symbol, int(last_timedelta.total_seconds()/self.strategy.resolution_seconds_count), self.strategy.data_resolution)

Even then though, based on the debug its telling me this is how long it takes to load and resample the historical data. None of the timestamps line up with the actual time executed.

Its not clear how this code is executed, and in what order. Im not convinced the order of debug output is accurate…

Im wondering if the time QCAlgorithum spends spooling up is taken up by unzipping the data files? Is there any insight i can debug this? Is there anyway to have the data be unzipped or optimized for faster imports. Im running locally so i dont really care about datasize right now.

Based on the information provided in the knowledge sources, it's not clear why there would be a delay in the execution of the debug statement. However, there are a few things that might be causing this issue:

Data Processing Time: If you're requesting a large amount of historical data, it might take some time to process this data. This could potentially cause a delay in the execution of the debug statement. However, as you mentioned, this shouldn't be an issue if the data is stored locally and the data size is not excessively large.
Order of Operations: In QuantConnect, the order of operations is determined by the sequence of method calls in your algorithm. If the debug statement is placed after the self.history call, it will be executed after the historical data has been fetched. However, if the debug statement is executed before the self.history call, it should be printed out immediately.
Data Unzipping: If the data files are stored in a compressed format, it might take some time to unzip these files before they can be used. This could potentially cause a delay in the execution of the debug statement. However, it's not clear from the knowledge sources whether this is the case.

To debug this issue, you might want to try the following:

Place the debug statement before the self.history call to see if it's executed immediately.
Reduce the amount of historical data you're requesting to see if this speeds up the execution of the debug statement.
Check the format of the data files to see if they're stored in a compressed format. If they are, you might want to consider unzipping these files to speed up data access.

Hmm. When i unzip the zip files in the lean data directory, it doesnt recgonize it anymore.. It appears to only recognize the files when they are zipped.

working:
data/crypto/binance/minute/btcusdt/20200101_trade.zip

not working: (this is the exact contents of the zipped file)
data/crypto/binance/minute/btcusdt/20200101_btcusdt_15m_trade.csv

The LEAN engine is designed to work with zipped data files. When you unzip the files, LEAN can no longer recognize them because it expects the data to be in a zipped format. This is why your unzipped file is not working.

The data directory structure and file format are important for LEAN to correctly recognize and load the data. The correct format for the data path is data/crypto/binance/minute/btcusdt/20200101_trade.zip, as you've mentioned.

If you want to work with unzipped data, you would need to modify the LEAN engine's data loading mechanism, which might be complex and could potentially lead to other issues.

Therefore, it's recommended to keep the data in its original zipped format when using it with the LEAN engine.

Ok, After unzipping the files outside of LEAN it takes around 5min so I suspect this is where a lot of the time is taken. Thats unfortunate LEAN required the files to be zipped because as mentioned that takes far too long. Is there no way to cache this or to load in these files a different way? This is quite concerning this is the only way as the performance hit is completely avoidable.

Based on the information provided in the knowledge sources, the LEAN engine is designed to work with zipped data files for efficiency and storage considerations. Unzipping the files outside of LEAN and then trying to use them might not work as expected because LEAN expects the data to be in a specific format and directory structure.

However, there are a few potential ways to improve the data loading speed:

Data Caching: LEAN does support data caching to some extent. For example, if you're running multiple backtests with the same algorithm, LEAN will cache the data after the first backtest, which can speed up subsequent backtests. However, this might not be helpful if you're running different algorithms or changing the data frequently.
Data Provider and Cache Provider: LEAN has a feature that allows you to plug in custom data providers and cache providers. This could potentially be used to implement a more efficient data loading mechanism. However, this would require a good understanding of the LEAN engine's internals and might be complex to implement.
Optimized Data Format: Another potential solution could be to use a more optimized data format that can be loaded faster. However, this would also require changes to the LEAN engine and might not be feasible without a deep understanding of the engine's internals.

Unfortunately, without making changes to the LEAN engine itself, there might not be a straightforward way to significantly improve the data loading speed. If you're experiencing performance issues, it might be worth reaching out to the QuantConnect community or support for more specific advice based on your particular use case.

How would i use data caching? Do you have an example of that?

thanks

Based on the information provided in the knowledge sources, there isn't a specific example of using data caching in QuantConnect. However, data caching in QuantConnect is handled automatically by the LEAN engine.

When you run a backtest, the data used by your algorithm is loaded into memory. If you run another backtest with the same algorithm and the same data immediately afterwards, LEAN will use the cached data instead of loading the data again. This can speed up the backtest.

However, please note that the data cache is cleared when you restart the LEAN engine. Therefore, if you're running backtests in separate sessions, the data will need to be loaded each time.

If you're experiencing slow backtests, you might want to consider optimizing your algorithm or reducing the amount of data you're using. For example, you could use a smaller universe of assets, use a lower resolution of data, or make more efficient history calls.

Nick Grobler INVESTOR

Update Backtest

Notebook

person upvoted this people upvoted this

To unlock posting to the community forums please complete at least 30% of Boot Camp.
You can continue your Boot Camp training progress from the terminal. We hope to see you in the community soon!

Platform

Radically Open-Source Algorithmic Trading Engine

Join Our Discord Channel

Quarterly Open-Source Trading Competition

Draft Discussions

Bookmarked Discussions

SEARCH DISCUSSIONS

TOP 5 Research PUblications

About Quant League

competition rules

previous competitions

333,100 Quants.

VOTE FOR UPCOMING FEATURES

QCAlgorithm Take 8min to even begin to start Initalizing for backtesting

Organization

Team

Clone Strategy

Previous Ranking

IN THIS RESEARCH

PARTICIPANTS

Discussion Awards

Actions

Join QuantConnect for Free

Platform

SIGN IN

Radically Open-Source Algorithmic Trading Engine

Join Our Discord Channel

Quarterly Open-Source Trading Competition

Draft Discussions

Bookmarked Discussions

SEARCH DISCUSSIONS

TOP 5 Research PUblications

About Quant League

competition rules

previous competitions

333,100 Quants.

VOTE FOR UPCOMING FEATURES

QCAlgorithm Take 8min to even begin to start Initalizing for backtesting

Organization

Team

Clone Strategy

Previous Ranking

IN THIS RESEARCH

PARTICIPANTS

Discussion Awards

SHARE RESEARCH

SHARE DISCUSSION

SHARE ARTICLE

SHARE

Actions

Join QuantConnect for Free