dataproductpoc-docs

Data Product data stores

In order to provide data to a data consumer, a data product needs to:-

Zhamak’s recommendation is to split a data product into 3 sections:-

The reasoning is that the Input Data Port should be able to receive data in the form and structure in which it is saved at source. The abstraction layer then converts the data from its source form into a more standardised one, e.g. source data could arrive in a variety of formats (JSON, CSV, Avro, Parquet, YAML) whilst the abstraction layer saves it in relational format. The output data port then formats the abstracted dataset as desired, e.g. JSON, CSV or HTML.
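As a rough illustration of the three sections described above, here is a minimal sketch in Python using only the standard library. It is not the PoC's actual implementation: the function names (`read_input`, `abstract`, `write_output`), the `customer` table and its `id`/`name` columns are all hypothetical, and an in-memory SQLite database stands in for whatever relational store the abstraction layer would really use.

```python
import csv
import io
import json
import sqlite3


def read_input(payload: str, fmt: str) -> list[dict]:
    """Input data port: accept data in the format it has at source."""
    if fmt == "json":
        return json.loads(payload)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError(f"unsupported source format: {fmt}")


def abstract(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """Abstraction layer: save the rows in one standardised relational table."""
    # Hypothetical schema, chosen only for this example.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer (id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO customer (id, name) VALUES (?, ?)",
        [(int(r["id"]), str(r["name"])) for r in rows],
    )


def write_output(conn: sqlite3.Connection, fmt: str) -> str:
    """Output data port: format the abstracted dataset as the consumer wants it."""
    rows = conn.execute("SELECT id, name FROM customer ORDER BY id").fetchall()
    records = [{"id": row_id, "name": name} for row_id, name in rows]
    if fmt == "json":
        return json.dumps(records)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=["id", "name"], lineterminator="\n")
        writer.writeheader()
        writer.writerows(records)
        return buf.getvalue()
    raise ValueError(f"unsupported output format: {fmt}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Two sources in different formats land in the same relational table.
    abstract(conn, read_input('[{"id": 1, "name": "Ada"}]', "json"))
    abstract(conn, read_input("id,name\n2,Grace\n", "csv"))
    print(write_output(conn, "json"))
    print(write_output(conn, "csv"))
```

The point of the split is that adding a new source format only touches `read_input`, and adding a new consumer format only touches `write_output`; the relational table in the middle stays unchanged.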

In order to facilitate the buildout of the data product, I’ve found the following data stores to be required:-

There is no specific technology that needs to be used for a data store, but my recommendation would be that:-

  1. It is a cloud-based data store - so that the data product can easily use it.
  2. It is capable of storing data in a variety of formats, not just relational - this is important as web development doesn’t always rely on relational databases.

Examples include:-

  1. Data Lakes (e.g. Azure Data Lake)
  2. Cloud-based polyglot databases (e.g. Snowflake, BigQuery, cloud-hosted PostgreSQL)