Backtesting
Research Guide
Introduction
QuantConnect aims to teach and inspire our community to create high-performing algorithmic trading strategies. We measure our success by the profits created by the community through their live trading. As such, we try to build the best quantitative research techniques possible into the product to encourage a robust research process.
Hypothesis-Driven Research
We recommend you develop an algorithmic trading strategy based on a central hypothesis. You should develop an algorithm hypothesis at the start of your research and spend the remaining time exploring how to test your theory. If you find yourself deviating from your core theory or introducing code that isn't based around that hypothesis, you should stop and go back to thesis development.
Wang et al. (2014) illustrate the danger of creating your hypothesis based on test results. In their research, they examined the earnings yield factor in the technology sector over time. During 1998-1999, before the tech bubble burst, the factor was unprofitable. If you saw the results and then decided to bet against the factor during 2000-2002, you would have lost a lot of money because the factor performed extremely well during that time.
Hypothesis development is somewhat of an art and requires creativity and great observation skills. It is one of the most powerful skills a quant can learn. We recommend that an algorithm hypothesis follow the pattern of cause and effect. Your aim should be to express your strategy in the following sentence:
A change in {cause} leads to an {effect}.
To search for inspiration, consider causes from your own experience, intuition, or the media. Generally, causes of financial market movements fall into the following categories:
- Human psychology
- Real-world events/fundamentals
- Invisible financial actions
Consider the following examples:
Cause | Effect |
---|---|
Share class stocks are the same company, so any price divergence is irrational... | A perfect pairs trade. Since they are the same company, the price will revert. |
The addition of a new stock to the S&P 500 Index causes fund managers to buy up the stock... | An increase in the price of the new asset in the universe from buying pressure. |
An increase in sunshine hours increases the production of oranges... | An increase in the supply of oranges, decreasing the price of Orange Juice Futures. |
Allegations of fraud by the CEO cause investor faith in the stock to fall... | A collapse of stock prices for the company as people panic. |
FDA approval of a new drug opens up new markets for the pharmaceutical company... | A jump in stock prices for the company. |
Increases in the federal funds rate make borrowing more expensive, restricting bank lending... | Restricted REIT leverage and lower REIT ETF returns. |
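To make the first example concrete, the following is a minimal sketch of how the share-class hypothesis might be expressed as an algorithm. The tickers, entry threshold, and position weights are illustrative assumptions, not recommendations:

```python
from AlgorithmImports import *

class ShareClassPairsSketch(QCAlgorithm):
    def initialize(self):
        self.set_start_date(2020, 1, 1)
        self.set_cash(100000)
        # Two share classes of the same company (assumed example: Alphabet).
        self.class_a = self.add_equity("GOOGL", Resolution.DAILY).symbol
        self.class_c = self.add_equity("GOOG", Resolution.DAILY).symbol

    def on_data(self, data):
        if not (data.contains_key(self.class_a) and data.contains_key(self.class_c)):
            return
        # Hypothesis: any price divergence between the share classes is
        # irrational, so short the expensive class and buy the cheap one.
        spread = data[self.class_a].close / data[self.class_c].close - 1
        if abs(spread) > 0.01 and not self.portfolio.invested:
            self.set_holdings(self.class_a, -0.5 if spread > 0 else 0.5)
            self.set_holdings(self.class_c, 0.5 if spread > 0 else -0.5)
        elif abs(spread) < 0.001 and self.portfolio.invested:
            # The divergence has reverted, so close the pair.
            self.liquidate()
```

Note that the 0.01 and 0.001 thresholds are exactly the kind of values the Research Guide counts as parameters, as discussed below.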
There are millions of potential alpha strategies to explore, each of them a candidate for an algorithm. Once you have chosen a strategy, we recommend exploring it for no more than 8-32 hours, depending on your coding ability.
Research Panel
We launched the Research Guide in 2019 to inform you about common quantitative research pitfalls. It displays a power gauge for the number of backtests performed, the number of parameters used, and the time invested in the strategy. These measures can give a ballpark estimate of the overfitting risk of the project. Generally, as a strategy becomes more overfit on historical data, it is less likely to perform well in live trading. These properties were selected based on the recommended best practices of the global quantitative research community.
Restricting Backtests
According to current research, you should limit the number of backtests you perform on an idea to prevent overfitting. In theory, each backtest performed on an idea moves it one step closer to being overfit because you are testing and selecting for strategies that fit the historical data instead of ones based on a central thesis. For more information, see the paper The Probability of Backtest Overfitting (Bailey, Borwein, López de Prado, & Zhu, 2015).
QuantConnect does not restrict the number of backtests performed on a project, but we have implemented the counter as a guide for your reference. Your coding skill is a factor in how many backtests constitute overfitting, so if you are a new programmer, you can increase these targets.
Backtest Count | Overfitting Risk |
---|---|
0-30 | Likely Not Overfit |
30-70 | Possibly Overfitting |
70+ | Probably Overfitting |
Reducing Strategy Parameters
With just a handful of parameters, it is possible to create an algorithm that perfectly models historical markets. Current research suggests keeping your parameter count to a minimum to decrease the risk of overfitting.
Parameter Count | Overfitting Risk |
---|---|
0-10 | Likely Not Overfit |
10-20 | Possibly Overfitting |
20+ | Probably Overfitting |
Limiting Research Time Invested
As you spend more time on one algorithm, research suggests you are more likely to overfit the strategy to the data. It is common to become attached to an idea and spend weeks or months trying to make it perform well in a backtest. Assuming you are a proficient coder who fully understands the QuantConnect API, we recommend no more than 16 hours of work per experiment. In theory, within two full working days, you should be able to thoroughly test a single hypothesis.
Research Time | Overfitting Risk |
---|---|
0-8 Hours | Likely Not Overfit |
8-16 Hours | Possibly Overfitting |
16+ Hours | Probably Overfitting |
Parameter Detection
Using parameters is almost unavoidable, but a strategy trends toward being overfit as more parameters are added or fine-tuned. You should only add or optimize parameters through a robust methodology such as walk-forward optimization. The parameter detection system is a general guide that informs you of how many parameters are present in the algorithm. It scans your code for patterns that are potentially parameters. The following table shows the criteria for detecting parameters:
Parameter Types | Example Instances |
---|---|
Numeric Comparison | Numeric operators used to compare numeric arguments: <= < > >= |
Time Span | Setting the interval of TimeSpan or timedelta |
Order Event | Inputting numeric arguments when placing orders |
Scheduled Event | Inputting numeric arguments when scheduling an algorithm event to occur |
Variable Assignment | Assigning numeric values to variables |
Mathematical Operation | Any mathematical operation involving explicit numbers |
Lean API | Numeric arguments passed to Indicators, Consolidators, Rolling Windows, etc. |
The following table shows common expressions that are not parameters:
Non-Parameter Types | Example Instances |
---|---|
Common APIs | SetStartDate/set_start_date, SetEndDate/set_end_date, SetCash/set_cash, etc. |
Boolean Comparison | Testing for True or False conditions |
String Numbers | Numbers formatted as part of Log/log or Debug/debug statements |
Variable Names | Any variable names that use numbers as part of the name (for example, smaIndicator200) |
Common Functions | Rounding, array indexing, boolean comparison using 1/0 for True/False, etc. |
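As a rough illustration, the comments in the following sketch mark which lines the criteria above would count as parameters and which they would ignore. The ticker, periods, and thresholds are assumed examples, not output from the actual detector:

```python
from AlgorithmImports import *

class ParameterDetectionSketch(QCAlgorithm):
    def initialize(self):
        self.set_start_date(2022, 1, 1)  # common API: not counted
        symbol = self.add_equity("SPY", Resolution.MINUTE).symbol
        self.entry_factor = 1.05  # variable assignment: counted
        # Lean API numeric argument (indicator period): counted
        self.sma200 = self.sma(symbol, 200, Resolution.DAILY)
        # TimeSpan/timedelta interval: counted
        self.consolidate(symbol, timedelta(minutes=30), self.on_half_hour_bar)

    def on_half_hour_bar(self, bar):
        if not self.sma200.is_ready:  # boolean comparison: not counted
            return
        self.log(f"close: {bar.close}")  # string number: not counted
        if bar.close > self.sma200.current.value * self.entry_factor:  # numeric comparison: counted
            self.market_order(bar.symbol, 100)  # order numeric argument: counted
```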
Overfitting
Overfitting occurs when you fine-tune the parameters of an algorithm to fit the detail and noise of the backtesting data to the extent that it negatively impacts the performance of the algorithm on new data. Parameters tuned this way don't necessarily apply to new data, which harms the algorithm's ability to generalize and trade well in all market conditions. The following table shows ways that overfitting can manifest itself:
Data Practice | Description |
---|---|
Data Dredging | Performing many statistical tests on data and only paying attention to those that come back with significant results. |
Hyper-Tuning Parameters | Manually changing algorithm parameters to produce better results without altering the test data. |
Overfit Regression Models | Regression, machine learning, or other statistical models with too many variables will likely introduce overfitting to an algorithm. |
Stale Testing Data | Not changing the backtesting data set when testing the algorithm. Any improvements might not generalize to different datasets. |
An algorithm that is dynamic and generalizes to new data is more valuable to funds and individual investors. It is more likely to survive across different market conditions and apply to new asset classes and markets.
Out of Sample Period
To reduce the chance of overfitting, organization managers can require that all backtests end a certain number of months before the current date. For example, if you set a one-year out-of-sample period, the researchers on your team can't use the most recent year of data in their backtests. An out-of-sample period is helpful because it reserves a period of data on which to test your model after you finish the development stage. Follow these steps to change the backtest out-of-sample period:
- Open the organization homepage.
- Scroll down to the Backtesting Out of Sample Period section.
- Adjust the out-of-sample period duration or click on "No Holdout Period".
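Even without the organization-level setting, you can apply the same discipline manually: end your development backtests before the holdout window and run the reserved period only once, after development is finished. The following is a minimal sketch with assumed dates:

```python
from AlgorithmImports import *

class HoldoutPeriodSketch(QCAlgorithm):
    def initialize(self):
        # In-sample window: iterate and tune freely on this range.
        self.set_start_date(2015, 1, 1)
        self.set_end_date(2022, 12, 31)
        # The most recent year (an assumed one-year holdout) is excluded.
        # When development is done, point these dates at the held-out year
        # and run a single validation backtest.
        self.set_cash(100000)
        self.spy = self.add_equity("SPY", Resolution.DAILY).symbol
```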