At Quantlane we have both human trading teams and algorithmic trading teams. Both require historical data so they can analyze market structure, analyze how their trading strategies behave, and improve.
We have two types of data: (anonymized) market data, and internal trading data.
The most important types of market data for us are:
- market-by-order data (also L3, or Level 3 data) tracking every order book change;
- trades (also 'ticks'), a subset of L3 data describing only the transactions that actually took place, i.e. orders that matched;
- trade candles (also OHLC) aggregating open-high-low-close prices in predefined time intervals.
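To illustrate how trades relate to candles, here is a minimal sketch that aggregates a list of trades into fixed-interval OHLC candles. The `Trade` and `Candle` classes and the `build_candles` function are hypothetical names made up for this example, not Quantlane's actual schemata, and the sketch assumes timezone-aware timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Trade:
    time: datetime  # assumed to be timezone-aware
    price: float


@dataclass
class Candle:
    start: datetime  # start of the interval the candle covers
    open: float
    high: float
    low: float
    close: float


def build_candles(trades, interval=timedelta(minutes=1)):
    """Aggregate trades into OHLC candles of the given interval."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    candles = {}
    for trade in sorted(trades, key=lambda t: t.time):
        # Floor the trade time to the start of its interval.
        start = epoch + ((trade.time - epoch) // interval) * interval
        candle = candles.get(start)
        if candle is None:
            # First trade in the interval sets all four prices.
            candles[start] = Candle(start, trade.price, trade.price, trade.price, trade.price)
        else:
            candle.high = max(candle.high, trade.price)
            candle.low = min(candle.low, trade.price)
            candle.close = trade.price  # last trade seen so far
    return [candles[start] for start in sorted(candles)]
```

Intervals without any trades produce no candle in this sketch; a real implementation would have to decide whether to emit empty candles or carry the last close forward.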
Then there is our internal trading data, such as our orders and their lifecycle (when they were created, filled, canceled etc.), number of shares held in each financial instrument, daily statistics about our trading and others.
All of the data available within Quantlane is normalized to our proprietary formats. We use Avro schemata (described in YAML files) for converting data structures from Python to binary data and back. This is heavily used for distributing data via our messaging system – see our earlier article The Messaging Heart of our Platform.
Quantlane uses several data vendors for receiving market data. These vendors are either the exchanges themselves offering raw market data (usually via multicast UDP streams), or third-party vendors which offer data via streams or request-response API calls.
Each data vendor has their own specific protocol for requesting/publishing data. For example, Nasdaq ITCH is used by several exchanges in the Nasdaq group, although versions and message payloads are not consistent across the group.
For raw market data we have an application called Midas (the name comes from (M)arket (D)ata (S)erver). This modular application receives raw market data and normalizes it to our proprietary format called 'The One'. Client applications inside Quantlane then connect to Midas and subscribe to real-time data in a standard format. Midas also extracts trades from raw market data and publishes them into our messaging system, where clients can read them in much the same way as if they had connected to Midas directly.
Third-party vendors offer APIs (often HTTP) that return data in a standardized format (mostly JSON or CSV). For each third-party vendor we create a separate microservice that processes the vendor's data. Usually this application is a straightforward ETL (Extract, Transform, Load) job: data is extracted from the API, transformed (normalized) into our internal format, and loaded into a database or published to the messaging system.
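The shape of such an ETL job can be sketched as follows. Everything here is an assumption made for illustration: the vendor payload, the column names, the `daily_candles` table, and the use of an in-memory SQLite database as a stand-in for the real database. A real service would fetch the payload from the vendor's API rather than from a constant.

```python
import csv
import io
import sqlite3
from datetime import datetime, timezone

# Made-up vendor payload with made-up instruments and prices.
VENDOR_CSV = (
    "Symbol,Date,Open,High,Low,Close\n"
    "ABC,2021-01-04,10.0,10.5,9.75,10.25\n"
    "XYZ,2021-01-04,100.0,101.0,98.5,99.5\n"
)


def extract(payload):
    """Extract: parse the vendor's CSV into one dict per row."""
    return list(csv.DictReader(io.StringIO(payload)))


def transform(row):
    """Transform: map vendor fields onto a (hypothetical) internal format."""
    day = datetime.strptime(row["Date"], "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return {
        "instrument": row["Symbol"],
        "time": day.isoformat(),
        "open": float(row["Open"]),
        "high": float(row["High"]),
        "low": float(row["Low"]),
        "close": float(row["Close"]),
    }


def load(connection, records):
    """Load: store the normalized records in the database."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS daily_candles "
        "(instrument TEXT, time TEXT, open REAL, high REAL, low REAL, close REAL)"
    )
    connection.executemany(
        "INSERT INTO daily_candles "
        "VALUES (:instrument, :time, :open, :high, :low, :close)",
        records,
    )
    connection.commit()


def run_etl(payload, connection):
    records = [transform(row) for row in extract(payload)]
    load(connection, records)
    return len(records)
```

Keeping the three steps as separate functions means only `extract` and `transform` need to change from one vendor to the next.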
Finally, there is internal trading data, which is mostly generated from our interaction with execution channels (endpoints for entering orders on exchanges). This data is essentially a record of our requests (order creation, cancellation, modification) and the responses we receive (confirmations, rejections, order fill notifications).
All the data mentioned so far is just raw data we receive or generate. In addition to distributing and storing raw data, we also run applications that derive new data from it through various kinds of transformations and aggregations. For example, the position (number of shares) we hold in a particular financial instrument is calculated as the net sum of order fills (trades where we bought or sold the instrument). Another example is daily OHLC candles: daily open, high, low and close prices. These applications are written in such a way that their input is either the messaging system or a database, and their output is either the messaging system again or the Web connector described in the following section.
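The position calculation can be sketched in a few lines. The `Fill` class and its fields are hypothetical and chosen only for this example: buys add to the position, sells subtract from it.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Fill:
    instrument: str
    side: str  # "buy" or "sell"
    quantity: int  # number of shares filled


def net_positions(fills):
    """Return the net number of shares held per instrument."""
    positions = defaultdict(int)
    for fill in fills:
        if fill.side == "buy":
            positions[fill.instrument] += fill.quantity
        elif fill.side == "sell":
            positions[fill.instrument] -= fill.quantity
        else:
            raise ValueError(f"unknown side: {fill.side!r}")
    return dict(positions)
```

A negative result in this sketch corresponds to a short position, where more shares were sold than bought.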
Data ingestion: Database Connector
For ingesting data into our database we use an application called the connector. This application accepts data from different data sources in our internal normalized format, migrates the data to other schema versions (both older and newer, where possible) and stores it in a database.
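One way to picture the migration step is a table of functions that convert a record between schema versions. This is only a sketch under stated assumptions: the `version` field, the two trade schema versions, and the field names are invented for the example and say nothing about how the connector is actually implemented.

```python
def upgrade_v1_to_v2(record):
    """Hypothetical v2 renames 'size' to 'quantity' and adds an optional 'venue'."""
    migrated = dict(record)
    migrated["quantity"] = migrated.pop("size")
    migrated["venue"] = None  # unknown for data that originated as v1
    migrated["version"] = 2
    return migrated


def downgrade_v2_to_v1(record):
    """Going back is lossy: the 'venue' field has no place in v1."""
    migrated = dict(record)
    migrated["size"] = migrated.pop("quantity")
    migrated.pop("venue", None)
    migrated["version"] = 1
    return migrated


# (source version, target version) -> migration function
MIGRATIONS = {
    (1, 2): upgrade_v1_to_v2,
    (2, 1): downgrade_v2_to_v1,
}


def all_versions(record, known_versions=(1, 2)):
    """Return the record in every version reachable by one migration step."""
    source = record["version"]
    result = {source: record}
    for target in known_versions:
        migrate = MIGRATIONS.get((source, target))
        if migrate is not None:  # skip versions we cannot migrate to
            result[target] = migrate(record)
    return result
```

With a table like this, a missing entry in `MIGRATIONS` is how the sketch expresses that a particular migration is not possible.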
Currently there are three types of connectors:
- Midas connector processes and stores normalized market data directly from Midas.
- Messaging connector processes and stores all data published in the messaging system.
- Web connector accepts and stores normalized data that our systems push to it with HTTP requests.
Data storage: Database
As you might have noticed, we have a wide variety of data and almost all of it pertains to time. This means that we need to store time series, where each data message has a time it relates to. That's why we have elected to use PostgreSQL with the TimescaleDB extension.
What TimescaleDB does is a form of sharding based on time: it creates child tables (which TimescaleDB calls chunks), where each child table stores only a part (a shard) of the data, sliced by time range (e.g. the first child table stores the first week of 2021, the second child table stores the second week, and so on). This approach allows queries that use a time filter to be fast even on large datasets, as queries can be routed to the relevant child tables only, and parallelized.
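The routing idea can be illustrated with a toy model in plain Python. This is not how TimescaleDB is implemented; the one-week interval, the `ORIGIN` constant, and the function names are assumptions made for the example. It only shows why a time filter lets a query skip most child tables.

```python
from datetime import datetime, timedelta, timezone

CHUNK_INTERVAL = timedelta(days=7)
# Arbitrary origin for the example: Monday of the first full week of 2021.
ORIGIN = datetime(2021, 1, 4, tzinfo=timezone.utc)


def chunk_index(time):
    """Index of the weekly child table that a timestamp belongs to."""
    return (time - ORIGIN) // CHUNK_INTERVAL


def chunks_for_query(start, end):
    """Indices of child tables overlapping the half-open range [start, end)."""
    last = chunk_index(end - timedelta(microseconds=1))
    return list(range(chunk_index(start), last + 1))
```

A query for a single day touches exactly one child table in this model, no matter how many years of data are stored in total.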
The structure of tables in the database is mapped to the schemata we use in our messaging, so the data is almost the same and you can use data from the database in almost the same fashion as if it came from the messaging system.