As the title states, I'm running an algorithm where part of it is calculating the correlation between the SPY and TLT.
Below is the code I'm using for just the correlation component.
from scipy.stats import pearsonr
import pandas as pd

class CalmFluorescentPinkAntelope(QCAlgorithm):
    def Initialize(self):
        self.SetStartDate(2020, 8, 9)
        self.SetCash(100000)
        self.AddEquity("SPY", Resolution.Daily)
        self.AddEquity("TLT", Resolution.Daily)
        self.SetWarmUp(50)

    def OnData(self, data):
        my_corr = self.find_corr(data)

    def find_corr(self, data):
        # Request 22 daily bars per symbol and drop the most recent one
        spy_close = self.History(self.Symbol("SPY"), 22, Resolution.Daily).loc["SPY"]["close"][:-1]
        tlt_close = self.History(self.Symbol("TLT"), 22, Resolution.Daily).loc["TLT"]["close"][:-1]
        try:
            self.Debug(pearsonr(spy_close, tlt_close)[0])
        except:
            # Log both lengths when pearsonr rejects the inputs
            self.Debug("spy " + str(len(spy_close)))
            self.Debug("tlt " + str(len(tlt_close)))
        return pearsonr(spy_close, tlt_close)[0]
What ends up happening is I get the “x and y must have the same length” error. This only happens at certain points in time, such as on the 11th of August of 2020. I checked the length of the arrays using the debug in the code, and found that on that day, the length of the arrays is indeed different, where the SPY data would have a length of 21, while the TLT data would have a length of 20.
The warm up runs without issue. Am I doing something wrong?
Fred Painchaud
Hi Emile,
With [:-1] at the end of your close Series, a length of 21 is normal. A length of 20 is one too short. I'd guess that on that day, the call to History returns one data point fewer than it should for TLT, i.e., there might be a problem with the data.
You can debug a bit more to confirm that, for example by checking what History returns before you select the index and column. If there is an error in the data, you can report it here:
In the meantime, you could truncate the longer series to 20, just so the correlation works, and continue working on the rest of your algo…
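One way to see which bar is missing is to compare the date indexes of the two close Series. The snippet below is only a sketch with made-up prices; the helper name missing_dates is hypothetical, and it assumes both Series are indexed by date, as they are after the .loc[...]["close"] selection above.

```python
import pandas as pd

def missing_dates(a: pd.Series, b: pd.Series):
    """Return (dates only in a, dates only in b) by comparing the two indexes."""
    only_in_a = a.index.difference(b.index)
    only_in_b = b.index.difference(a.index)
    return only_in_a, only_in_b

# Made-up prices, purely for illustration (not real SPY/TLT data)
dates = pd.date_range("2020-08-03", periods=5, freq="B")
spy_close = pd.Series([328.0, 330.0, 332.0, 334.0, 335.0], index=dates)
tlt_close = spy_close.drop(dates[3]) * 0.5  # simulate one missing TLT bar

only_spy, only_tlt = missing_dates(spy_close, tlt_close)
print("only in SPY:", list(only_spy))
print("only in TLT:", list(only_tlt))
```

In the algorithm, the same comparison could be logged with self.Debug right before the pearsonr call.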
Fred
Vladimir
Emile Farkouh,
Try this way.
If you are satisfied with my answer, please accept it.
Emile Farkouh
Fred Painchaud
What's strange is that this was working a while back without issue. I migrated it from Quantopian and it was functioning well. The issue only appeared recently.
The other thing is, as I mentioned, one of the dates that this breaks is on 11 August 2020, but if I move the start date to, for example, the 12th of that month, and set the warmup period to include the 11th (50 days let's say), then there are no issues. In other words it'll calculate fine if it's in the warmup period, but not as part of the main part of the algo.
Finally, if I set the time in the pearsonr to something different, for example to 20 days instead of 21 as you'd suggested, I still get the error, where SPY data has a correct length of 20, and TLT data has an incorrect length of 19.
Thanks for the link. I think this can be reported.
Vladimir
Thanks very much for your code.
I appreciate the .pct_change(), but I am looking at price correlations at the moment. Second, the algo I'm running does need a current, fresh value for correlation, so
kind of defeats the purpose. As soon as I took out that line, I faced the same issue.
Fred Painchaud
Hi Emile,
“What's strange is that this was working a while back without issue. I migrated it from Quantopian and it was functioning well. The issue only appeared recently.”
Yeah, it happens. Why does it happen now? Difficult to say. There are many possible reasons: the data was recently changed, libraries were updated, LEAN was modified, etc.
“The other thing is, as I mentioned, one of the dates that this breaks is on 11 August 2020, but if I move the start date to, for example, the 12th of that month, and set the warmup period to include the 11th (50 days let's say), then there are no issues. In other words it'll calculate fine if it's in the warmup period, but not as part of the main part of the algo.”
Again, this is only a hypothesis, but it may be that an error in the data is ignored, or simply does not raise anything, when the data is used for warm-up, whereas the same error does cause a problem in your pearsonr code.
“Finally, if I set the time in the pearsonr to something different, for example to 20 days instead of 21 as you'd suggested, I still get the error, where SPY data has a correct length of 20, and TLT data has an incorrect length of 19.”
We misunderstood each other. I meant truncating whichever Series result you get that is longer:
It calculates a Pearson value from fewer points than desired, but at least you get a value.
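A rough sketch of that idea, for illustration only: the function names truncated_corr and aligned_corr are hypothetical, the prices are made up, and np.corrcoef stands in for pearsonr so the snippet only needs numpy and pandas. The first function keeps the most recent n points of each Series, where n is the shorter length. The second is an alternative that pairs points by date, which may be safer if the missing bar is in the middle of the window rather than at the start.

```python
import numpy as np
import pandas as pd

def truncated_corr(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation over the last n points, n = the shorter length."""
    n = min(len(a), len(b))
    if n < 2:
        return float("nan")  # not enough points for a correlation
    return float(np.corrcoef(a.iloc[-n:].to_numpy(), b.iloc[-n:].to_numpy())[0, 1])

def aligned_corr(a: pd.Series, b: pd.Series) -> float:
    """Pearson correlation over the dates present in both Series."""
    joined = pd.concat([a, b], axis=1, join="inner")  # keep shared dates only
    if len(joined) < 2:
        return float("nan")
    return float(joined.iloc[:, 0].corr(joined.iloc[:, 1]))

# Made-up data: the second Series is one bar shorter than the first
dates = pd.date_range("2020-08-03", periods=5, freq="B")
spy_close = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=dates)
tlt_close = pd.Series([2.0, 4.0, 6.0, 8.0], index=dates[1:])

print(truncated_corr(spy_close, tlt_close))
print(aligned_corr(spy_close, tlt_close))
```

Truncating by position assumes the two Series still line up on the same dates at the tail; if they do not, the date-aligned version avoids pairing prices from different days.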
As you probably know, a lot of checks are done in an algo simply to verify if the data is “as expected”. Without those checks, any algo will certainly halt with a crash eventually because no data feed is perfect and eternally connected, etc. Algos expecting that data will always be ok are oftentimes disappointed 😊.
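As an illustration of such a check, here is a minimal sketch of a guarded correlation that returns None instead of raising when the inputs are not as expected. The name safe_corr and the min_points default are assumptions for this example, not part of LEAN or SciPy.

```python
import numpy as np

def safe_corr(x, y, min_points=3):
    """Return the Pearson correlation, or None if the inputs fail basic checks."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        return None  # different lengths (or not 1-D): skip this bar
    if len(x) < min_points:
        return None  # too few points to be meaningful
    if np.isnan(x).any() or np.isnan(y).any():
        return None  # missing values in the data
    if x.std() == 0 or y.std() == 0:
        return None  # correlation is undefined for a constant series
    return float(np.corrcoef(x, y)[0, 1])

print(safe_corr([1, 2, 3], [1, 2]))              # mismatched lengths
print(safe_corr([1, 2, 3, 4], [2, 4, 6, 8]))     # valid inputs
```

The caller can then skip trading logic for that day whenever the result is None, rather than letting the whole backtest stop on one bad bar.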
Fred
Vladimir
Emile Farkouh,
Here is the “Stock-Bond price correlation” as you wish.
It does not include a current, fresh value → [:-1]
If you are satisfied with my answer, please accept it.
Emile Farkouh
Just went through the data explorer and found that the 11th of August is missing from the set. Might be that others are missing as well. I'll go ahead and report it. Thanks for your help Vladimir and Fred Painchaud
Emile Farkouh
On second check, the data is not missing. Thanks again for the solutions offered Vladimir and Fred Painchaud. I'll report as a bug and see what comes up.
Emile Farkouh