Data as a product
Approaching data as a product helps us at Quantlane: it guides the crucial steps of identifying and addressing data needs and requirements, makes the role of the data engineer more collaborative, and leads to a better data structure.
First, let us explain what we mean by the phrase Data as a Product. In the data team, we handle data of different natures and purposes: raw data, aggregated data, and data validations. Every data team is in fact building and supporting a Data Product. Our product covers the data itself as well as the pipelines and tools used to generate, access, or maintain that data within the organization. We also focus on data quality: we produce refined data and run multiple validations that detect suspicious or missing data and trigger real-time alerts. And who are the data consumers? Our end users are other teams of analysts and traders, their applications and algorithmic trading strategies, as well as the data team itself. Even for a smaller company like Quantlane, it is beneficial to take this approach and bring the “customer-provider” model of product development into the internal communication between teams.
It’s all about the perspective
Approaching your data with a product development mindset is a very conscious way of looking at it. We recognise that different teams look at the data from their own perspectives. For example, inside our data team we often think of data as a pipeline, whilst the quant team regards data as fuel for its algorithms and traders see data as a source of useful insight. The field of product development offers a perspective that all teams can easily borrow: we can all approach data as a product. This shared perspective is common ground for communication and increases the level of understanding. The data team acts as the provider while the requesting team is in the role of the customer. Such an approach is a useful brainstorming tool: it helps us ask the right questions, avoid miscommunication, and structure our thinking. The “customer” team is only going to adopt a data solution if it solves their own unique problems and meets their needs. The approach also helps us set up a process that makes finding the optimal implementation simpler and more reliable. Both the “client” team and the “providing” data team benefit.
Data product development
As with any other product development process, our data product development starts with a request to the data team. Once the request has been prioritized, we do our best to meet the needs behind it.
The foundation stone of a data product is the Data Product Documentation. Here we characterize the properties and features of the data just as you would specify the properties and parameters of a product: shape, size, quality, storage, time of delivery, and so on. It includes every feature that needs to be discussed for the product to meet the needs. A high-quality data product definition helps us get the most out of the data and makes all the related processes less vague.
A typical example of a data product is daily trade candles for the past 7 days. Each candle must contain the close price and additionally includes the open, high and low prices and the daily traded volume. The product covers the US market, is available every morning at 6:00 am in a specific table in our database, and its existence (availability) is validated on a daily basis.
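To make the example above more tangible, here is a minimal sketch of how such a definition could be captured in a machine-readable form. It is an illustration only, not our actual tooling: the class DataProductSpec, its field names, the table name and the validation name are all hypothetical, and treating every field other than the close price as optional is an assumption.

```python
from dataclasses import dataclass
from datetime import time


@dataclass(frozen=True)
class DataProductSpec:
    """Hypothetical machine-readable summary of a Data Product Documentation."""

    name: str
    mandatory_fields: tuple[str, ...]
    optional_fields: tuple[str, ...]  # assumption: everything but close is optional
    market: str
    lookback_days: int
    available_by: time
    table: str  # placeholder table name
    validations: tuple[str, ...]

    @property
    def all_fields(self) -> tuple[str, ...]:
        """Mandatory fields first, then the additional ones."""
        return self.mandatory_fields + self.optional_fields


# The daily trade candles product described in the text.
DAILY_TRADE_CANDLES = DataProductSpec(
    name="daily_trade_candles",
    mandatory_fields=("close",),
    optional_fields=("open", "high", "low", "volume"),
    market="US",
    lookback_days=7,
    available_by=time(6, 0),
    table="daily_trade_candles",
    validations=("daily_availability",),
)
```

Writing the definition down this way keeps the discussion focused on what the product must contain rather than on how it is produced.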
Another equally important step in the development process is brainstorming the data use cases. We try to predefine all the possible edge cases. This step is beneficial mainly for aggregated data. Here we agree on what the data solution should look like for a particular scenario. The use cases serve as test cases for unit testing later during the implementation.
Continuing with the example of the daily trade candles data product, a basic test case would be a missing close price. The close price is a mandatory part of the product, so without it the product is not valid and we won’t deliver any data. We would also analyze scenarios in which trade candles are duplicated but carry differing values, and agree on a suitable algorithm for deciding between them.
Usually it takes more than one round of reviews, meetings, comments and discussions between the teams before the data product documentation is finalized. Only after we have formulated all the needs should we determine which datasets, techniques, etc. we will use to develop an effective solution.
The data product implementation proceeds from the data product documentation. Usually we split the implementation into more than one step, following agile iterative development. Because the product is properly defined at the beginning of the process, we can organize the implementation ourselves and keep later discussion with, or intervention from, the “client” team to an effective minimum. We design the solution, whether it is an app or a pipeline, with future needs and maintenance in mind. We keep the updating and versioning of data products clean and clear by keeping the documentation up to date.
As a high-level example, the implementation of the daily trade candles data product mentioned above would consist of the necessary data processing elements arranged in sequence in a pipeline, determined by the product’s properties and by the final outcome, the previously defined daily trade candles. The low-level implementation of the individual data processing elements is up to the data engineer. The scheduling and orchestration of the workflow are set up so that the trade candles are ready and available in the database at 6:00 in the morning for further use.
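The two use cases above could turn into code and unit tests along the following lines. This is a sketch under stated assumptions, not the agreed algorithm: the record layout (symbol, date, close, received_at) is hypothetical, the rule that the most recently received candle wins is only one possible decision algorithm, and resolving duplicates before validating is likewise an assumption.

```python
def validate_candle(candle: dict) -> bool:
    """Return True if the candle carries the mandatory close price."""
    return candle.get("close") is not None


def resolve_duplicates(candles: list[dict]) -> list[dict]:
    """Keep one candle per (symbol, date).

    Assumed rule: the candle with the latest 'received_at' wins.
    """
    latest: dict[tuple, dict] = {}
    for candle in candles:
        key = (candle["symbol"], candle["date"])
        current = latest.get(key)
        if current is None or candle["received_at"] > current["received_at"]:
            latest[key] = candle
    return list(latest.values())


def build_product(candles: list[dict]) -> list[dict]:
    """Return the deliverable candles, or nothing if the product is invalid."""
    deduplicated = resolve_duplicates(candles)
    if not all(validate_candle(candle) for candle in deduplicated):
        return []  # mandatory close price missing: deliver no data
    return deduplicated
```

Each agreed use case then maps to one assertion, for example that a product containing a candle without a close price results in no delivered data.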
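As a rough illustration of processing elements arranged in sequence, the sketch below chains two hypothetical steps into a pipeline. The step names, the trade record layout (price, volume) and the assumption that the trades of one day and one symbol arrive in time order are ours for the sake of the example; scheduling and orchestration are deliberately left out.

```python
from functools import reduce
from typing import Callable, Iterable

# A processing element takes a list of records and returns a list of records.
Step = Callable[[list[dict]], list[dict]]


def run_pipeline(steps: Iterable[Step], records: list[dict]) -> list[dict]:
    """Apply each step in order, feeding its output to the next one."""
    return reduce(lambda data, step: step(data), steps, records)


def drop_zero_volume(trades: list[dict]) -> list[dict]:
    """Hypothetical cleaning step: discard trades with no volume."""
    return [trade for trade in trades if trade["volume"] > 0]


def aggregate_to_candle(trades: list[dict]) -> list[dict]:
    """Collapse one day's trades for one symbol into a single candle."""
    if not trades:
        return []
    prices = [trade["price"] for trade in trades]
    return [{
        "open": prices[0],
        "high": max(prices),
        "low": min(prices),
        "close": prices[-1],
        "volume": sum(trade["volume"] for trade in trades),
    }]
```

Calling run_pipeline([drop_zero_volume, aggregate_to_candle], trades) yields the candle for that day; the internals of each step can change freely as long as the outcome still matches the product definition.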
Benefits of the approach
So once more, let us sum up the benefits of approaching data as a product across the whole company. Essentially, it brings out what is important: it puts the focus on the information contained in the data and does not suggest a specific implementation prematurely. This mindset leads to proper communication that clarifies what is and what is not necessary. The Data Product Documentation creates a graspable structure for a data solution that is optimal yet open to iteration, high quality, sustainable and trustworthy. It helps us prevent problems and supports maintenance.
In our data team, we are encouraged to create data products on our own initiative wherever they prove useful or justified. A nice example of such a product is 'validation metadata'. Validation metadata are the outcome of checking and validating data in our database; they contain the validation results together with additional information useful for partially automating the maintenance of the data platform. An example of a validation is a check that daily trade candles for the last 7 days are present in our database. Our aim is a data platform that is reliable, stable, maintainable, high quality, and scalable for future growth. Such a platform encourages data-driven, functional decision making.
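The presence check mentioned above could be sketched as follows. The ValidationResult structure and its field names are hypothetical, and for simplicity the sketch expects a candle for every calendar day; a real check would presumably consult a trading calendar to skip weekends and holidays.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta


@dataclass
class ValidationResult:
    """Hypothetical validation metadata record."""

    validation: str
    checked_on: date
    passed: bool
    missing_dates: list[date] = field(default_factory=list)


def check_candle_presence(
    available: set[date], today: date, lookback_days: int = 7
) -> ValidationResult:
    """Check that a candle exists for each of the last `lookback_days` days.

    Assumption: every calendar day before `today` is expected to have a candle.
    """
    expected = [today - timedelta(days=offset) for offset in range(1, lookback_days + 1)]
    missing = sorted(day for day in expected if day not in available)
    return ValidationResult(
        validation="daily_candles_presence",
        checked_on=today,
        passed=not missing,
        missing_dates=missing,
    )
```

The missing_dates field is the kind of additional information that could drive partial automation, for instance by triggering a backfill for exactly those days.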