Datasets
Format and Storage
Default Location
When you create an organization workspace in an empty directory, the CLI downloads the latest data directory from the LEAN repository. This directory follows the standard structure that the LEAN engine reads from. Once downloaded, the data directory tree looks like this:
data
├── alternative/
├── cfd/
├── crypto/
├── equity/
├── forex/
├── future/
├── futureoption/
├── index/
├── indexoption/
├── market-hours/
├── option/
├── symbol-properties/
└── readme.md
By default, the data directory contains a small amount of sample data for every asset type to demonstrate how the data files must be formatted. The data directory itself and most of its subdirectories also contain readme.md files that document the data format for each asset type.
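If you assemble or move a data directory yourself, a quick check that the standard top-level folders from the tree above exist can catch path mistakes early. The Python sketch below is a hypothetical helper, not part of the LEAN CLI; the folder list is copied from the default tree and is an assumption that may not hold for every LEAN version.

```python
from pathlib import Path

# Top-level folders of the default data directory, copied from the tree above.
# Assumption: this list may change between LEAN versions.
EXPECTED_SUBDIRECTORIES = [
    "alternative", "cfd", "crypto", "equity", "forex", "future",
    "futureoption", "index", "indexoption", "market-hours", "option",
    "symbol-properties",
]


def missing_subdirectories(data_dir: Path) -> list[str]:
    """Return the expected subdirectories that do not exist under data_dir."""
    return [name for name in EXPECTED_SUBDIRECTORIES if not (data_dir / name).is_dir()]


if __name__ == "__main__":
    # Hypothetical usage: run from the organization workspace root.
    missing = missing_subdirectories(Path("data"))
    print("missing:", ", ".join(missing) if missing else "none")
```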
Change Location
You can configure which data directory to use with the data-folder property in your Lean configuration file.
All commands that run the LEAN engine locally use the path in this property as the data directory.
By default, this property points to the data directory inside your organization workspace.
If this property is set to a relative path, it is resolved relative to the Lean configuration file's parent directory.
The data directory is the only local directory that the CLI mounts into all of the Docker containers it runs, so it must contain every local file you want your algorithms to read.
You can get the path to this directory in your algorithm using the Globals.DataFolder variable.
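The path resolution rule in this section can be sketched in Python for host-side scripts that need to find the same directory the CLI uses. This is a minimal sketch, not the CLI's code; the function names are hypothetical, and stripping whole-line // comments is an assumption, made in case your lean.json contains them (plain JSON parsers reject comments).

```python
import json
from pathlib import Path


def load_lean_config(config_path: Path) -> dict:
    """Parse the Lean configuration file.

    Assumption: the file may contain whole-line // comments, which standard
    JSON does not allow, so those lines are dropped before parsing.
    """
    lines = config_path.read_text(encoding="utf-8").splitlines()
    kept = [line for line in lines if not line.lstrip().startswith("//")]
    return json.loads("\n".join(kept))


def resolve_data_folder(config_path: Path, config: dict) -> Path:
    """Return the data directory for the given configuration.

    Absolute paths are used as-is; relative paths are resolved against the
    directory that contains the configuration file.
    """
    data_folder = Path(config["data-folder"])
    if data_folder.is_absolute():
        return data_folder
    return config_path.parent / data_folder
```

Inside a running algorithm, use Globals.DataFolder instead: the algorithm runs in a Docker container, where the mounted path may differ from the host path this helper returns.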
Data Updates
Every day, the LEAN CLI updates the exchange market hours database (data/market-hours/market-hours-database.json) and the symbol properties database (data/symbol-properties/symbol-properties-database.csv).
To disable these updates, open the Lean configuration file (lean.json) and set the file-database-last-update value to a date in the future.
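Why a future date works can be illustrated with a small model of a once-a-day check: if the recorded last update lies in the future, a full day never appears to have passed. The Python sketch below is our reading of the behavior described here, not the CLI's source, and the function name is hypothetical.

```python
from datetime import date, timedelta


def database_update_due(last_update: date, today: date) -> bool:
    """Hypothetical model of the daily check: refresh once at least one
    full day has passed since the recorded last update."""
    return (today - last_update).days >= 1


today = date(2024, 5, 1)
print(database_update_due(date(2024, 4, 30), today))            # True: a day old, update runs
print(database_update_due(today + timedelta(days=3650), today))  # False: future date, skipped
```

This section does not specify the date string format that lean.json expects, so when you edit the value, copy the format of the existing entry rather than assuming one.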
Other Data Sources
If you already have your own data, you can convert it to a LEAN-compatible format yourself.
In that case, we recommend that you read the readme.md files that the lean init command generates in the data directory, because these files contain up-to-date documentation on the expected format of the data files.
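As an illustration of what such a conversion involves, the Python sketch below turns one bar into a CSV row in the layout we understand LEAN uses for US equity minute trade bars: milliseconds since midnight, then open, high, low, and close multiplied by 10,000 and stored as integers, then volume. Treat this layout as an assumption and confirm it against data/equity/readme.md in your own data directory, which is authoritative; the function name is hypothetical.

```python
from datetime import datetime, timedelta
from decimal import Decimal, ROUND_HALF_UP

# Assumption: equity prices are stored as integers equal to price * 10,000.
PRICE_SCALE = Decimal(10_000)


def to_lean_equity_minute_row(bar_time: datetime, open_: str, high: str,
                              low: str, close: str, volume: int) -> str:
    """Format one bar as: ms since midnight, scaled O/H/L/C, volume.

    bar_time is assumed to already be in the data's time zone. Prices are
    passed as strings so Decimal avoids binary floating-point rounding.
    """
    midnight = bar_time.replace(hour=0, minute=0, second=0, microsecond=0)
    ms_since_midnight = (bar_time - midnight) // timedelta(milliseconds=1)
    scaled = [
        int((Decimal(p) * PRICE_SCALE).quantize(Decimal(1), rounding=ROUND_HALF_UP))
        for p in (open_, high, low, close)
    ]
    return ",".join(str(v) for v in [ms_since_midnight, *scaled, volume])


print(to_lean_equity_minute_row(datetime(2013, 10, 7, 9, 31),
                                "144.60", "144.64", "144.50", "144.55", 12345))
# 34260000,1446000,1446400,1445000,1445500,12345
```

The readme files also describe file naming and how rows are packaged into zip archives; this sketch covers only the row layout.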
For development purposes, you can also generate data with the CLI. The generator uses a Brownian motion model to produce realistic market data, which can help when you test strategies locally but don't have access to real market data.
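To show what a Brownian motion price model does, here is a minimal geometric Brownian motion sketch in Python. It is not the CLI's generator: the function name, the drift and volatility values, and the choice of minute steps over a 252-day trading year are illustrative assumptions.

```python
import math
import random
from typing import Optional


def gbm_path(start_price: float, drift: float, volatility: float,
             steps: int, dt: float, seed: Optional[int] = None) -> list[float]:
    """Simulate prices with geometric Brownian motion.

    Each step multiplies the price by
    exp((drift - volatility**2 / 2) * dt + volatility * sqrt(dt) * Z),
    where Z is a standard normal draw, so prices always stay positive.
    """
    rng = random.Random(seed)  # seeded generator for reproducible paths
    prices = [start_price]
    for _ in range(steps):
        z = rng.gauss(0.0, 1.0)
        growth = (drift - 0.5 * volatility ** 2) * dt + volatility * math.sqrt(dt) * z
        prices.append(prices[-1] * math.exp(growth))
    return prices


# One 390-minute trading day with annualized drift and volatility;
# dt is one minute expressed in years (assuming 252 trading days).
path = gbm_path(100.0, drift=0.05, volatility=0.2, steps=390, dt=1 / (252 * 390), seed=42)
print(len(path), round(min(path), 2), round(max(path), 2))
```

Multiplying by an exponential rather than adding normal increments is what keeps simulated prices positive, which is why geometric Brownian motion is a common choice for synthetic price series.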