CKI-002: DataWarehouse Authorization System
The public DataWarehouse instance will allow public and internal data to exist on the same platform. To achieve this, we need an authorization framework that controls data access and keeps the internal information safe.
Furthermore, adding tree-level authorization checks will allow us to group the data into more than two (public/internal) groups. This makes it possible to give certain maintainers write permissions over their own trees and keep those trees publicly readable, without granting access to all the internal trees.
Limit data access
The goal of this system is to limit the data that users, both logged-in users and anonymous visitors, can see and modify.
Anonymous user can read public trees
Public trees need to be readable by all users, logged in or not.
User with authorization on public trees
Certain users will need authorization to modify publicly readable trees, including the submission of data and triaging failures.
User with read authorization on non-public trees
Some trees should only be accessible by logged-in users with the appropriate authorization.
User with write authorization on non-public trees
Certain users will need authorization to modify non-public trees, including the submission of data and triaging failures.
Users should only be able to see issues related to the revisions they have access to.
New git trees are not readable/writeable by default
Trees without a certain policy attached need to be private by default. This means that manual action will be required to assign the correct policy to the new trees.
The authorization backend includes a new Policy class that can be attached to any other class and defines the authorization needed to access it and its related objects.
Policy contains the following fields: name, read_group and write_group. The field name defines a human-friendly description of the policy, while read_group and write_group indicate the group a user needs to belong to in order to perform read and write actions on the objects tagged with this policy, respectively.
To meet a Policy requirement, the user needs to be part of a group. For this, the built-in Django Authorization Framework is used.
For instance, a group named auth-redhat-internal could give access to the Red Hat internal data, or auth-arm could be the group maintainers need to belong to in order to triage arm tree objects.
read_group or write_group pointing to a group
The user making a request needs to belong to the group referred to in this field in order to read or write the object related to the policy, respectively.
read_group or write_group null
When the read_group or write_group value is null, the policy allows anyone to read or write the object related to that policy, respectively.
When an object is ‘writeable by anyone’, it means that any logged-in user with sufficient table-level permissions can write to it.
No policy attached
When an object has no policy attached, it is neither readable nor writeable by anyone. This guarantees that objects are private by default.
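The three cases above (group set, group null, no policy) can be sketched as two small checks. This is a minimal illustration in plain Python, not the DataWarehouse implementation: the dataclass stands in for the Django model, and the function names and the `is_authenticated` and `has_table_permission` parameters are hypothetical. It also assumes that writing always requires a logged-in user with table-level permissions, which the text only states explicitly for the null write_group case.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for the Policy model described above."""
    name: str
    read_group: Optional[str] = None   # None: anyone may read
    write_group: Optional[str] = None  # None: no group membership required to write


def can_read(policy: Optional[Policy], user_groups: Set[str]) -> bool:
    if policy is None:
        return False  # no policy attached: private by default
    if policy.read_group is None:
        return True   # readable by anyone, including anonymous visitors
    return policy.read_group in user_groups


def can_write(policy: Optional[Policy], user_groups: Set[str],
              is_authenticated: bool, has_table_permission: bool) -> bool:
    if policy is None:
        return False  # no policy attached: private by default
    # Assumption: login and table-level permissions are always required to write.
    if not (is_authenticated and has_table_permission):
        return False
    if policy.write_group is None:
        return True
    return policy.write_group in user_groups
```

With a policy such as `Policy("public", read_group=None, write_group="auth-arm")`, an anonymous visitor passes `can_read` but never `can_write`, and an object with no policy fails both checks regardless of the user's groups.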
Git Tree Policies
GitTree objects get a Policy object attached, allowing us to filter any of the related objects (Pipeline, etc.) according to this policy.
This covers all the results endpoints, giving us control over the KCIDB and pipeline data available on the DataWarehouse.
For example, endpoints querying KCIDB Revisions will only be able to list the revisions allowed for the user, while requesting a particular unauthorized Revision via GET would return a Not Found error.
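The list and GET behaviour described above can be sketched as follows. This is an illustrative, framework-free sketch: the `Revision` dataclass, the `NotFound` exception, the tree names, and the `TREE_READ_GROUPS` mapping are hypothetical stand-ins for the real models and querysets.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


class NotFound(Exception):
    """Stand-in for an HTTP 404 response."""


@dataclass
class Revision:
    id: int
    tree: str  # name of the GitTree this revision belongs to


# Hypothetical mapping: tree name -> read_group of its policy (None = public).
# A tree that is missing from the mapping has no policy and is therefore private.
TREE_READ_GROUPS: Dict[str, Optional[str]] = {
    "upstream": None,
    "rhel": "auth-redhat-internal",
}


def tree_is_readable(tree: str, user_groups: Set[str]) -> bool:
    if tree not in TREE_READ_GROUPS:
        return False
    group = TREE_READ_GROUPS[tree]
    return group is None or group in user_groups


def list_revisions(revisions: List[Revision], user_groups: Set[str]) -> List[Revision]:
    # Listing silently drops everything the user is not allowed to read.
    return [r for r in revisions if tree_is_readable(r.tree, user_groups)]


def get_revision(revisions: List[Revision], revision_id: int,
                 user_groups: Set[str]) -> Revision:
    # An unauthorized revision is reported exactly like a missing one.
    for r in revisions:
        if r.id == revision_id and tree_is_readable(r.tree, user_groups):
            return r
    raise NotFound(revision_id)
```

Because the unauthorized and the nonexistent case raise the same error in this sketch, the response does not reveal whether a hidden revision exists.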
Issue objects get a Policy attached, allowing us to limit the issues a user can see while querying them, or while querying or triaging a certain KCIDB object. For example, an auth-issue-redhat-internal policy could limit an Issue to a certain group of users, while auth-issue-public could make an Issue readable by anyone.
Use case example
To illustrate the use of this design, let’s assume the simplest case: there is a single instance containing 2 sets of data: internal and public.
In this case, we want to keep the internal data only readable to a specific group of users, while the public data should be readable by anyone. Both sets can only be written by a certain group, too.
The policies would be the following:
- internal (write_group: redhat-cki, read_group: redhat-all)
- public (write_group: redhat-cki, read_group: None)
redhat-cki would contain all the members of the CKI Team, giving them write permissions on both the internal and public groups of trees.
redhat-all has access to read the internal trees and should include all the people allowed to read that data.
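As a worked example, the two policies above can be evaluated for a few sample users. The user names and their group memberships are hypothetical, and the helper functions are a sketch rather than the real checks.

```python
# The two policies from the use case above.
POLICIES = {
    "internal": {"write_group": "redhat-cki", "read_group": "redhat-all"},
    "public": {"write_group": "redhat-cki", "read_group": None},
}

# Hypothetical users and group memberships, for illustration only.
USER_GROUPS = {
    "cki-member": {"redhat-cki", "redhat-all"},
    "employee": {"redhat-all"},
    "anonymous": set(),
}


def allowed(policy_name: str, action: str, user: str) -> bool:
    """Return whether `user` may perform `action` ("read" or "write")."""
    group = POLICIES[policy_name][f"{action}_group"]
    # Both example policies set write_group, so the null write_group case
    # (any logged-in user) does not arise here.
    return group is None or group in USER_GROUPS[user]


def access_matrix() -> dict:
    """Map (user, policy) to a (can_read, can_write) pair."""
    return {
        (user, name): (allowed(name, "read", user), allowed(name, "write", user))
        for user in USER_GROUPS
        for name in POLICIES
    }
```

Note that read and write are checked independently: in this sketch a CKI Team member can read the internal trees only because they also belong to redhat-all.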
Having the possibility to create various groups with differentiated read and write checks should allow us to define rules that cover all the use cases.
A generic system attachable to any class provides flexibility to add new classes with access authorization rules without the need to design new checks.
Secure by default
Because an object without a policy attached is private by default, anything that has not been configured yet is kept internal.
Visibility definition becomes complex
The question of whether an object is ‘private’ or ‘public’ changes from evaluating a boolean flag to a query where the user accessing the data is involved.
While being a much more flexible approach, the complexity is increased and the result of the evaluation obscured.
To help visualize the policies and authorizations, adding details about which users and groups are able to read and write each object to the user interface would be a good feature. An audit page should list the policies attached to each object, as well as the details of the policy and the users fulfilling it.
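The data behind such an audit page could be assembled roughly as shown below. All object names, users, and group memberships are hypothetical, and the row layout is only one possible shape for the page.

```python
from typing import Dict, List, Optional, Union

# Hypothetical data for illustration.
POLICIES: Dict[str, Dict[str, Optional[str]]] = {
    "internal": {"read_group": "redhat-all", "write_group": "redhat-cki"},
    "public": {"read_group": None, "write_group": "redhat-cki"},
}
GROUP_MEMBERS: Dict[str, List[str]] = {
    "redhat-cki": ["alice"],
    "redhat-all": ["bob", "alice"],
}
OBJECT_POLICIES: Dict[str, Optional[str]] = {
    "tree-a": "internal",
    "tree-b": "public",
    "tree-c": None,  # no policy attached
}


def members(group: Optional[str]) -> Union[str, List[str]]:
    # A null group means no membership is required.
    return "anyone" if group is None else sorted(GROUP_MEMBERS.get(group, []))


def audit_row(object_name: str) -> dict:
    """Describe who can read and write one object."""
    policy_name = OBJECT_POLICIES[object_name]
    if policy_name is None:
        # Private by default: nobody fulfils a missing policy.
        return {"object": object_name, "policy": None, "readers": [], "writers": []}
    policy = POLICIES[policy_name]
    return {
        "object": object_name,
        "policy": policy_name,
        "readers": members(policy["read_group"]),
        "writers": members(policy["write_group"]),
    }
```

Listing the resolved users next to each policy turns the query-based visibility rule back into something a reviewer can read at a glance.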
Using the same Policy model for both the GitTree and Issue classes means that the objects available for each of the cases are shared. For the first case, different group policies might be desired, while for the second, ‘public’ and ‘internal’ categories could be useful.
This means that naming conventions for policies will need to be defined and filters put in place for the UI menus.
To improve performance and reduce the overhead of these authorization queries, the authorized GitTrees and Issues will be cached on the user session. In other words, the authorizations are calculated and stored when the user logs in, and a log-out/log-in cycle is needed to refresh them.
It might be necessary to implement a cache invalidation system to update a user's authorizations when the user is added to or removed from a group.
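The caching behaviour and its staleness problem can be sketched with a plain dictionary standing in for the user session. The session key, the policy ids, and the `POLICY_READ_GROUPS` mapping are hypothetical, and the sketch caches authorized policy ids rather than the objects themselves.

```python
from typing import Dict, List, Optional, Set

# Hypothetical mapping: policy id -> read_group (None = readable by anyone).
POLICY_READ_GROUPS: Dict[int, Optional[str]] = {
    1: None,
    2: "auth-redhat-internal",
    3: "auth-arm",
}

SESSION_KEY = "authorized_read_policy_ids"  # hypothetical session key


def compute_authorized_policy_ids(user_groups: Set[str]) -> List[int]:
    """Run the (potentially expensive) authorization query once."""
    return sorted(
        pid for pid, group in POLICY_READ_GROUPS.items()
        if group is None or group in user_groups
    )


def on_login(session: dict, user_groups: Set[str]) -> None:
    # Calculated and stored once, when the user logs in.
    session[SESSION_KEY] = compute_authorized_policy_ids(user_groups)


def cached_policy_ids(session: dict) -> List[int]:
    if SESSION_KEY not in session:
        # No login happened: fall back to what an anonymous visitor may read.
        return compute_authorized_policy_ids(set())
    return session[SESSION_KEY]


def invalidate(session: dict) -> None:
    """Drop the cached result so the next login recomputes it."""
    session.pop(SESSION_KEY, None)
```

Until `invalidate` is called, a group change is not reflected in `cached_policy_ids`, which is the staleness described above. In Django, a membership change could be detected with the `m2m_changed` signal on the user-group relation, although locating the affected user's active sessions would still need to be solved separately.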
Message queues are privileged
DataWarehouse uses AMQP queues to communicate data to the microservices that rely on it.
With the current architecture, all the available data is dumped into exchanges and queues, and the decision to ignore a message is handed over to the consumer. This communication channel and all its listeners therefore need to be considered privileged.
To control the policies a consumer is attached to, it’s necessary to change the message architecture.
One possible scenario would be to remove all the data from the messages and keep only the id of the object. The consumer would then request the additional data through the API with its own credentials, which would go through the authorization checks, at the cost of losing the performance benefits of rendering and broadcasting a single message.
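A minimal sketch of this id-only scenario is shown below. The message fields, the in-memory `OBJECTS` store, and the `api_get` function are hypothetical stand-ins for the real message schema and the DataWarehouse API, and no AMQP library is involved.

```python
import json
from typing import Optional, Set

# Hypothetical object store standing in for the DataWarehouse database.
OBJECTS = {
    7: {"id": 7, "tree": "upstream", "read_group": None},
    42: {"id": 42, "tree": "rhel", "read_group": "auth-redhat-internal"},
}


def render_message(object_type: str, object_id: int) -> str:
    """Build a message that carries no data, only a reference to the object."""
    return json.dumps({"object_type": object_type, "id": object_id})


def api_get(object_id: int, consumer_groups: Set[str]) -> Optional[dict]:
    """Stand-in for an authenticated API request; None represents Not Found."""
    obj = OBJECTS.get(object_id)
    if obj is None:
        return None
    group = obj["read_group"]
    if group is not None and group not in consumer_groups:
        return None  # unauthorized objects look like missing ones
    return obj


def consume(message: str, consumer_groups: Set[str]) -> Optional[dict]:
    """Each consumer fetches the data itself, using its own credentials."""
    payload = json.loads(message)
    return api_get(payload["id"], consumer_groups)
```

Every consumer still receives every message, but a consumer without the right group only learns that an object with a given id changed. The trade-off is one API request per consumer per message instead of a single rendered broadcast.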