In order to provide data to a data consumer, a data product needs to:-
Zhamak’s recommendation is to split a data product into three sections: an input data port, an abstraction layer and an output data port.
The reasoning is that the input data port should be able to receive data in the form and structure in which it is stored at source. The abstraction layer then converts that source data into a more standardised form: for example, source data could arrive in a variety of formats (JSON, CSV, Avro, Parquet, YAML), whilst the abstraction layer could save all of them in relational format. Finally, the output data port formats the abstracted dataset as the consumer requires, e.g. JSON, CSV or HTML.
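The flow through the three sections can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference implementation: only JSON and CSV sources are handled (Avro and Parquet would need third-party libraries), an in-memory SQLite database stands in for the relational abstraction layer, and the function names `input_port`, `abstraction_layer` and `output_port` are hypothetical.

```python
import csv
import html
import io
import json
import sqlite3

# Hypothetical sketch of the three sections of a data product.
# Only JSON and CSV sources are handled. Table names come from trusted code,
# not from consumers, because they are interpolated into SQL.


def input_port(raw: str, fmt: str) -> list[dict]:
    """Receive data in the format it is held at source and parse it to records."""
    if fmt == "json":
        return json.loads(raw)
    if fmt == "csv":
        return list(csv.DictReader(io.StringIO(raw)))
    raise ValueError(f"unsupported source format: {fmt}")


def abstraction_layer(conn: sqlite3.Connection, table: str, rows: list[dict]) -> None:
    """Normalise parsed records into a relational table, one column per field.

    Assumes every batch written to the same table shares the same fields.
    """
    if not rows:
        return
    columns = sorted({key for row in rows for key in row})
    col_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({col_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f'INSERT INTO "{table}" ({col_sql}) VALUES ({placeholders})',
        [tuple(row.get(c) for c in columns) for row in rows],
    )


def output_port(conn: sqlite3.Connection, table: str, fmt: str) -> str:
    """Render the abstracted table in the format the consumer asks for."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    headers = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    if fmt == "json":
        return json.dumps([dict(zip(headers, r)) for r in rows])
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.writer(buf)
        writer.writerow(headers)
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "html":
        head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
        body = "".join(
            "<tr>" + "".join(f"<td>{html.escape(str(v))}</td>" for v in r) + "</tr>"
            for r in rows
        )
        return f"<table><tr>{head}</tr>{body}</table>"
    raise ValueError(f"unsupported output format: {fmt}")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    abstraction_layer(conn, "orders", input_port('[{"id": 1, "item": "pen"}]', "json"))
    abstraction_layer(conn, "orders", input_port("id,item\n2,ink\n", "csv"))
    print(output_port(conn, "orders", "html"))
```

The point of the split is that adding a new source format only touches the input port, and adding a new consumer format only touches the output port; the relational layer in between stays the same.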
To support building out the data product, I’ve found the following data stores to be necessary:-
At a minimum, this should allow endpoint paths to be authorised for specific users/roles. If desired, finer-grained control over the data returned can also be provided.
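A minimal sketch of what such an access-control store might check, assuming a simple role-to-path-prefix mapping for endpoint authorisation and a role-to-columns mapping as one possible form of finer-grained control. The `AccessPolicy` class, role names, paths and columns are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical sketch: role names, paths and columns are illustrative only.


@dataclass
class AccessPolicy:
    # Role -> endpoint path prefixes that role may call.
    allowed_paths: dict[str, list[str]] = field(default_factory=dict)
    # Role -> columns that role may see; a role missing here sees every column.
    visible_columns: dict[str, set[str]] = field(default_factory=dict)

    def is_authorised(self, roles: set[str], path: str) -> bool:
        """Return True if any of the caller's roles permits this endpoint path."""
        for role in roles:
            for prefix in self.allowed_paths.get(role, []):
                base = prefix.rstrip("/")
                # "/orders" grants "/orders" and "/orders/42" but not "/orders-archive".
                if path == base or path.startswith(base + "/"):
                    return True
        return False

    def filter_row(self, roles: set[str], row: dict) -> dict:
        """Keep only the columns that at least one of the caller's roles may see."""
        allowed: set[str] = set()
        for role in roles:
            if role not in self.visible_columns:
                return dict(row)  # this role has no column restriction
            allowed |= self.visible_columns[role]
        return {k: v for k, v in row.items() if k in allowed}


if __name__ == "__main__":
    policy = AccessPolicy(
        allowed_paths={"analyst": ["/orders"], "admin": ["/"]},
        visible_columns={"analyst": {"id", "item"}},
    )
    print(policy.is_authorised({"analyst"}, "/orders/42"))  # True
    print(policy.is_authorised({"analyst"}, "/customers"))  # False
    print(policy.filter_row({"analyst"}, {"id": 1, "item": "pen", "price": 2.5}))
```

Path-level checks cover the minimum requirement; the column filter is one way of layering finer-grained control on top without changing the endpoints themselves.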
This should contain tables to hold:-
This is where the datasets held at the input data port, the abstraction layer and the output data port are stored.
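One way to keep the datasets for each layer in a single store while still keeping them apart is to namespace them by layer. The sketch below assumes a local directory with one sub-directory per layer purely as a stand-in for whatever storage technology is actually chosen; `DatasetStore` and the layer names are hypothetical.

```python
import tempfile
from pathlib import Path

# Hypothetical sketch: a local directory stands in for the real storage
# technology, with one sub-directory per layer of the data product.
LAYERS = ("input", "abstraction", "output")


class DatasetStore:
    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def _layer_dir(self, layer: str) -> Path:
        """Resolve a layer's folder, rejecting layers the data product doesn't have."""
        if layer not in LAYERS:
            raise ValueError(f"unknown layer: {layer}")
        return self.root / layer

    def put(self, layer: str, name: str, data: bytes) -> Path:
        """Write a dataset into the given layer, creating the folder if needed."""
        folder = self._layer_dir(layer)
        folder.mkdir(parents=True, exist_ok=True)
        path = folder / name
        path.write_bytes(data)
        return path

    def get(self, layer: str, name: str) -> bytes:
        """Read a dataset back from the given layer."""
        return (self._layer_dir(layer) / name).read_bytes()

    def names(self, layer: str) -> list[str]:
        """List the datasets currently held in a layer."""
        folder = self._layer_dir(layer)
        if not folder.exists():
            return []
        return sorted(p.name for p in folder.iterdir() if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = DatasetStore(Path(tmp))
        store.put("input", "orders.json", b'[{"id": 1, "item": "pen"}]')
        store.put("output", "orders.csv", b"id,item\r\n1,pen\r\n")
        print(store.names("input"))  # ['orders.json']
```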
The data product should be able to capture information about how it has been used, for example:-
* metrics on the number of calls to specific endpoints
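Capturing the number of calls per endpoint can be as simple as wrapping each endpoint handler. This is a minimal sketch that assumes handlers are plain Python functions and keeps counts in memory; a real data product would persist them to its usage store. `UsageTracker` and `get_orders` are hypothetical names.

```python
import functools
from collections import Counter

# Hypothetical sketch: counts calls per endpoint in memory. A real data
# product would persist these counts to its usage store.


class UsageTracker:
    def __init__(self) -> None:
        self.calls: Counter[str] = Counter()

    def track(self, endpoint: str):
        """Decorator that records one call to `endpoint` each time the handler runs."""
        def decorator(func):
            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                # Counted before the handler runs, so failed calls are included.
                self.calls[endpoint] += 1
                return func(*args, **kwargs)
            return wrapper
        return decorator


tracker = UsageTracker()


@tracker.track("/orders")
def get_orders():
    return [{"id": 1, "item": "pen"}]


if __name__ == "__main__":
    for _ in range(3):
        get_orders()
    print(tracker.calls["/orders"])  # 3
```

Using a decorator keeps the metric collection out of the handler logic, so the same wrapper could later record other usage details alongside the call count.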
No specific technology is required for any of these data stores, but my recommendation would be that:-
Examples include:-