Securities

Filtering Data

Introduction

Unfiltered raw data can be faulty for a number of reasons, including invalid data entry. Moreover, high-frequency traders can deploy bait-and-switch strategies by submitting bait orders to deceive other market participants, which makes raw data noisy and untradeable. To keep faulty data from disrupting your trading logic and model training, you can filter out suspicious raw data with a data filter.

Set Models

To set a data filter for a security, call the set_data_filter method on the Security object.

# Use the set_data_filter method to use the SecurityDataFilter on the SPY ETF data.
spy = self.add_equity("SPY")
spy.set_data_filter(SecurityDataFilter())

You can also set the data filter model in a security initializer. If your algorithm has a universe, use the security initializer technique. In order to initialize single security subscriptions with the security initializer, call set_security_initializer before you create the subscriptions.

class BrokerageModelExampleAlgorithm(QCAlgorithm):
    def initialize(self) -> None:
        # In the initialize method, set the security initializer to seed the initial prices and models of assets.
        self.set_security_initializer(MySecurityInitializer(self.brokerage_model, FuncSecuritySeeder(self.get_last_known_prices)))

# Outside of the algorithm class
class MySecurityInitializer(BrokerageModelSecurityInitializer):

    def __init__(self, brokerage_model: IBrokerageModel, security_seeder: ISecuritySeeder) -> None:
        super().__init__(brokerage_model, security_seeder)

    def initialize(self, security: Security) -> None:
        # First, call the superclass definition.
        # This method sets the reality models of each security using the default reality models of the brokerage model.
        super().initialize(security)

        # Next, overwrite some of the reality models
        security.set_data_filter(SecurityDataFilter())

Default Behavior

The following table shows the default data filter for each security type:

Security Type    Default Filter
Equity           EquityDataFilter
Option           OptionDataFilter
Forex            ForexDataFilter
Index            IndexDataFilter
Cfd              CfdDataFilter
Others           SecurityDataFilter

None of the preceding filters filter out any data.

Model Structure

Data filtering models must implement a filter method, which receives Security and BaseData objects and returns a boolean: True to keep the data point or False to discard it.

# Return True to include a data point in the algorithm or False to exclude it.
class MyDataFilter(SecurityDataFilter):
    def filter(self, vehicle: Security, data: BaseData) -> bool:
        return True
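The return-value convention can be illustrated without the LEAN engine. The following is a minimal sketch in plain Python. FakeData, PositivePriceFilter, and apply_filter are hypothetical stand-ins invented for this illustration and are not part of the QuantConnect API. It only shows the shape of the contract: the filter returns True for points to keep and False for points to discard.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class FakeData:
    """Hypothetical stand-in for a BaseData object (not a LEAN class)."""
    value: float


class PositivePriceFilter:
    """Hypothetical filter that mirrors the filter(vehicle, data) -> bool shape."""

    def filter(self, vehicle: Any, data: FakeData) -> bool:
        # True keeps the data point; False discards it.
        return data.value > 0


def apply_filter(data_filter: PositivePriceFilter, vehicle: Any,
                 points: Iterable[FakeData]) -> List[FakeData]:
    """Return only the points that the filter accepts."""
    return [point for point in points if data_filter.filter(vehicle, point)]


points = [FakeData(101.5), FakeData(0.0), FakeData(-3.0), FakeData(102.0)]
accepted = apply_filter(PositivePriceFilter(), None, points)
accepted_values = [point.value for point in accepted]
```

In this sketch, the zero and negative values are dropped and the two positive values pass through.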

Examples

The following examples demonstrate some common practices for filtering data.

Example 1: Filter Out Outliers

When analyzing high-frequency price data, it's important to filter out potential outliers and anomalies that may skew the analysis. One effective method is to use simple moving average (SMA) and standard deviation indicators to identify ticks that significantly deviate from the short-term trend. By comparing each tick to the indicator values, you can flag any data points that fall outside a threshold (for example, three standard deviations). This filter prevents suspicious or erroneous prices from entering your algorithm, which gives you a cleaner dataset for trading.

class CustomDataFilterAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        equity = self.add_equity("AAPL", Resolution.TICK)
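        # Create the indicators. The standard deviation is chained to the SMA,
        # so it receives the SMA's output rather than the raw tick values.
        equity.sma = self.sma(equity.symbol, 100)
        equity.std = IndicatorExtensions.of(StandardDeviation(100), equity.sma, True)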
        # Set the data filter.
        equity.set_data_filter(CustomDataFilter())


class CustomDataFilter(SecurityDataFilter):

    def filter(self, vehicle: Security, data: BaseData) -> bool:
        # Wait until the indicators are ready.
        security = vehicle
        if not (security.sma.is_ready and security.std.is_ready):
            return True # Keep the data point.
        # Check if the current value is within 3 standard deviations of the mean.
        # Return True (keep) or False (discard).
        sma = security.sma.current.value
        std = security.std.current.value
        return sma - 3*std <= data.value <= sma + 3*std
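The band check itself can be tried outside of an algorithm. The following is a minimal sketch in plain Python, assuming a rolling window of raw values instead of LEAN indicators. RollingSigmaFilter and its parameters are hypothetical names for this illustration. One design choice differs from the example above and is an assumption of this sketch: only accepted values update the window, so a rejected outlier can't widen the band.

```python
from collections import deque
from statistics import fmean, pstdev


class RollingSigmaFilter:
    """Hypothetical helper: keep a value only if it lies within
    num_std standard deviations of the rolling mean."""

    def __init__(self, period: int = 100, num_std: float = 3.0) -> None:
        self.period = period
        self.num_std = num_std
        self.window = deque(maxlen=period)

    def keep(self, value: float) -> bool:
        # Accept everything until the window is full,
        # similar to waiting for the indicators to be ready.
        if len(self.window) < self.period:
            self.window.append(value)
            return True
        mean = fmean(self.window)
        std = pstdev(self.window)
        accepted = mean - self.num_std * std <= value <= mean + self.num_std * std
        # Assumption of this sketch: only accepted values update the window.
        if accepted:
            self.window.append(value)
        return accepted


sigma_filter = RollingSigmaFilter(period=5, num_std=3.0)
prices = [100.0, 100.1, 99.9, 100.2, 99.8, 100.05, 150.0, 100.1]
kept = [price for price in prices if sigma_filter.keep(price)]
```

With these sample prices, the 150.0 print falls far outside the band and is discarded, while the remaining values are kept. Note that if every value in the window is identical, the standard deviation is zero and any different value is rejected, so a real filter may need a minimum band width.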

Example 2: Filter Out Major Exchanges

When you trade illiquid financial instruments, it can be advantageous to focus on the BATS exchange since its quote data may not fully reflect the fair market value. Due to the lower trading volume and visibility of BATS, the quotes there may lag behind the true value of illiquid assets. A carefully designed algorithm can analyze the BATS feed to identify situations where the quotes appear undervalued relative to the asset's intrinsic worth. By trading on this disconnect, rather than arbitraging between exchanges, you may be able to profit from the market inefficiencies in the less liquid instrument. The following example demonstrates how to consume data only from the BATS exchange:

class BatsDataFilterAlgorithm(QCAlgorithm):

    def initialize(self) -> None:
        equity = self.add_equity("AAPL", Resolution.TICK)
        # Set the data filter.
        equity.set_data_filter(BatsDataFilter())


class BatsDataFilter(SecurityDataFilter):

    def filter(self, vehicle: Security, data: BaseData) -> bool:
        # Get the tick object.
        tick = Tick(data)
        # Return True (keep) or False (discard).
        return tick and tick.exchange == Exchange.BATS.name
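The same decision rule can be sketched in plain Python to show how non-tick data and other exchanges are handled. FakeTick, FakeTradeBar, and keep_exchange_only are hypothetical stand-ins for this illustration, and comparing against the string "BATS" is an assumption about how the exchange name appears on the data.

```python
from dataclasses import dataclass


@dataclass
class FakeTick:
    """Hypothetical stand-in for a tick (not the LEAN Tick class)."""
    exchange: str
    value: float


@dataclass
class FakeTradeBar:
    """Hypothetical stand-in for a non-tick data point."""
    value: float


def keep_exchange_only(data: object, exchange_name: str = "BATS") -> bool:
    # Keep the point only if it is a tick from the requested exchange.
    # Anything that is not a tick is discarded.
    return isinstance(data, FakeTick) and data.exchange == exchange_name


stream = [
    FakeTick("BATS", 190.10),
    FakeTick("NASDAQ", 190.12),
    FakeTradeBar(190.11),
    FakeTick("BATS", 190.09),
]
kept_values = [item.value for item in stream if keep_exchange_only(item)]
```

Only the two BATS ticks survive. The NASDAQ tick and the non-tick data point are both discarded.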
        

Other Examples

For more examples, see the following algorithms:
