
Apart from public data, most organisations’ data needs to be secured.
In terms of a data product, security will need to cover authentication and authorisation of a user.
This is one of the hardest parts of implementing a data product: each organisation has its own established ways of authenticating and authorising users and systems, and in my experience establishing these for a data product can take weeks.
Authentication means ensuring that a user is who they say they are. There are many methods of authenticating a user. These include:-
Basic authentication is a simple username and password, which are typically combined, base64 encoded and passed to a data product in the header of an HTTP request. This type of authentication is the easiest to implement but also the least secure, because base64 is an encoding rather than encryption and is trivial to reverse.
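To make the weakness concrete, here is a minimal sketch of how a basic authentication header is built and how easily it is reversed. The function names and the demo credentials are illustrative assumptions, not part of the PoC code.

```python
import base64


def build_basic_auth_header(username: str, password: str) -> dict[str, str]:
    """Build the HTTP Authorization header used by basic authentication."""
    # Username and password are joined with a colon, then base64 encoded.
    credentials = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def decode_basic_auth_header(header_value: str) -> tuple[str, str]:
    """Recover the username and password from a Basic header value."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    # No key is needed: decoding is the exact inverse of encoding.
    decoded = base64.b64decode(token).decode("utf-8")
    username, _, password = decoded.partition(":")
    return username, password


# Hypothetical demo credentials.
headers = build_basic_auth_header("demo_user", "demo_password")
print(headers)
print(decode_basic_auth_header(headers["Authorization"]))
```

Anyone who can read the request can run the second function, which is why basic authentication should only ever be sent over an encrypted connection.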
Multi-factor authentication takes authentication to the next level by requiring a user/system to authenticate by more than one method e.g.
Token-based authentication can involve:-
More complex flows, e.g. OAuth 2.0, ensure that the registered application is who it says it is, and not another application imitating it, by going through a more involved authentication flow. This involves exchanging the registered token for an access token, with the response sent back to the registered application’s callback URL:-
For the PoC we have chosen to use basic authentication for demo purposes.
Within an organisation, we typically use token-based authentication. This will involve:-
The diagram shows how this setup works in practice:-

Authorisation means ensuring that an authenticated user is permitted to access the requested data. Authorisation is a loaded term, so it needs to be broken down further for clarity.
This type of authorisation simply ensures that a particular user/role is permitted to access a particular data product. Depending on the authorisation system used, it may be possible to authorise access to the data product in its entirety or only to specific endpoints. So, for example, it may be desirable to allow free access to the Discovery port/endpoint so that anybody can view the data product documentation, but restrict access to the output data ports to a particular set of users, and restrict access to the input data port and control data port to specific roles.
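The port-level example above could be expressed as a small policy table. This is only a sketch: the port names mirror the ones mentioned in the text, but the role names and the `PORT_POLICY` structure are hypothetical, and for simplicity it models the "particular set of users" for output ports as roles too.

```python
from typing import Optional

# Hypothetical policy: which roles may call which port of a data product.
# None means the port is open to everybody.
PORT_POLICY: dict[str, Optional[set[str]]] = {
    "discovery": None,
    "output": {"analyst", "data_engineer"},
    "input": {"data_engineer"},
    "control": {"platform_admin"},
}


def is_port_access_allowed(port: str, user_roles: set[str]) -> bool:
    """Return True if a user holding user_roles may access the given port."""
    if port not in PORT_POLICY:
        return False  # deny by default for unknown ports
    allowed_roles = PORT_POLICY[port]
    if allowed_roles is None:
        return True  # open port, e.g. documentation
    # Access is granted if the user holds at least one allowed role.
    return bool(allowed_roles & user_roles)


print(is_port_access_allowed("discovery", set()))
print(is_port_access_allowed("output", {"analyst"}))
print(is_port_access_allowed("control", {"analyst"}))
```

Denying unknown ports by default means that a newly added endpoint stays closed until someone grants access to it explicitly.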
As a data product can provide more than one target dataset, it is also desirable to provide more fine-grained access control at the dataset level.
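One way to picture dataset-level control is a grant table that is consulted after the user has already been allowed into the data product. The sketch below assumes a simple in-memory mapping; the user names, dataset names and `DATASET_GRANTS` are hypothetical and only illustrate the idea.

```python
# Hypothetical grants: which target datasets each user may read.
DATASET_GRANTS: dict[str, set[str]] = {
    "alice": {"customers", "orders"},
    "bob": {"orders"},
}


def can_read_dataset(user: str, dataset: str) -> bool:
    """Return True if the user has been granted access to the dataset."""
    return dataset in DATASET_GRANTS.get(user, set())


def filter_authorised_datasets(user: str, requested: list[str]) -> list[str]:
    """Keep only the requested datasets that the user may read, in order."""
    return [name for name in requested if can_read_dataset(user, name)]


print(can_read_dataset("bob", "customers"))
print(filter_authorised_datasets("alice", ["orders", "payments", "customers"]))
```

A user with no entry in the table gets access to nothing, so access to the data product as a whole does not imply access to every dataset it serves.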