CKI-002: DataWarehouse Authorization System
Abstract
The public DataWarehouse instance will allow public and internal data to exist on the same platform. To achieve this, we need an authorization framework that lets us control data access and keep the internal information safe.
Furthermore, adding tree-level authorization checks will allow us to group the data into more than two (public/internal) groups. For example, certain maintainers can be given write permissions over their trees, which remain publicly readable, without being granted access to all the internal trees.
Motivation
Goals
Limit data access
The goal of this system is to limit the data that users, both logged in and anonymous visitors, can see and modify.
User Stories
Anonymous user can read public trees
Public trees need to be readable by all users, logged in or not.
User with authorization on public trees
Certain users will need authorization to modify publicly readable trees, including the submission of data and triaging failures.
User with read authorization on non-public trees
Some trees should only be accessible by logged in users with certain authorization.
User with write authorization on non-public trees
Certain users will need authorization to modify non-public trees, including the submission of data and triaging failures.
Listing Issues
Users should only be able to see issues related to the revisions they have access to.
New git trees are not readable/writeable by default
Trees without a certain policy attached need to be private by default. This means that manual action will be required to assign the correct policy to the new trees.
Approach
Policies
The AuthorizationBackend system includes a new Policy class that can be attached to any other class and defines the authorization needed to access it and its related objects.
Policy contains the following fields: name, read_group and write_group. The field name defines a human-friendly description of the policy, while read_group and write_group indicate the group a user needs to belong to in order to perform read and write actions, respectively, on the objects tagged with this policy.
To meet a Policy requirement, the user needs to be part of a group. For this, the built-in Django Authorization Framework is used.
For instance, a group named auth-redhat-internal could give access to the Red Hat internal data, or auth-arm could be the group maintainers need to belong to in order to triage arm tree objects.
read_group or write_group pointing to a group
The user making a request needs to belong to the group referred to in this field in order to read or write the object related to the policy, respectively.
read_group or write_group null
If the read_group or write_group value is null it means that the policy allows anyone to read or write the object related to that policy.
When an object is ‘writeable by anyone’, it means that anyone logged in with sufficient table-level permissions can write to it.
No policy attached
When an object has no policy attached, it is neither readable nor writeable by anyone. This guarantees that objects are private by default.
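The three cases above can be summarized in a small, framework-free sketch. It is illustrative only and not the DataWarehouse implementation: the helper names can_read and can_write are hypothetical, group membership is passed in as a plain set of group names rather than read from Django, and the table-level permission check mentioned above is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional, Set


@dataclass(frozen=True)
class Policy:
    """Plain-Python stand-in for the Policy class described above."""
    name: str
    read_group: Optional[str] = None   # None (null) means anyone can read
    write_group: Optional[str] = None  # None (null) means any logged-in user can write


def can_read(policy: Optional[Policy], user_groups: Set[str]) -> bool:
    """Hypothetical helper: may a user in user_groups read the object?"""
    if policy is None:
        # No policy attached: private by default.
        return False
    if policy.read_group is None:
        # Null read_group: readable by anyone, including anonymous visitors.
        return True
    return policy.read_group in user_groups


def can_write(policy: Optional[Policy], user_groups: Set[str],
              is_authenticated: bool) -> bool:
    """Hypothetical helper: may the user write to the object?

    Table-level permissions are assumed to be checked elsewhere.
    """
    if policy is None or not is_authenticated:
        # No policy, or an anonymous visitor: never writeable.
        return False
    if policy.write_group is None:
        # Null write_group: writeable by any logged-in user.
        return True
    return policy.write_group in user_groups
```

Keeping the "no policy" branch first makes the private-by-default rule hard to bypass by accident when new cases are added.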
Git Tree Policies
GitTree objects get a Policy object attached, allowing us to filter any of the related objects (KCIDBRevision, KCIDBBuild, KCIDBTest, Pipeline, etc.) according to this policy.
This covers all the results endpoints, giving us control over the KCIDB and pipeline data available on the DataWarehouse.
For example, endpoints querying KCIDB Revisions will only be able to list the revisions allowed for the user, while requesting via GET a particular unauthorized Revision would return a Not Found error.
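As a rough illustration of this behaviour, the sketch below filters revisions by the policy of their tree and treats an unauthorized revision exactly like a missing one. It uses plain Python objects instead of Django querysets. The Revision dataclass, the NotFound exception, and the function names are assumptions made for the example, not part of the DataWarehouse API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


class NotFound(Exception):
    """Stand-in for an HTTP Not Found response."""


@dataclass(frozen=True)
class Policy:
    name: str
    read_group: Optional[str]  # None means readable by anyone


@dataclass(frozen=True)
class Revision:
    id: int
    tree: str  # name of the GitTree this revision belongs to


def tree_is_readable(policy: Optional[Policy], user_groups: Set[str]) -> bool:
    if policy is None:
        return False  # tree without a policy: private by default
    return policy.read_group is None or policy.read_group in user_groups


def list_revisions(revisions: List[Revision],
                   tree_policies: Dict[str, Policy],
                   user_groups: Set[str]) -> List[Revision]:
    """Return only the revisions whose tree the user may read."""
    return [
        rev for rev in revisions
        if tree_is_readable(tree_policies.get(rev.tree), user_groups)
    ]


def get_revision(revision_id: int,
                 revisions: List[Revision],
                 tree_policies: Dict[str, Policy],
                 user_groups: Set[str]) -> Revision:
    """Look up one revision; unauthorized and missing ids look the same."""
    for rev in list_revisions(revisions, tree_policies, user_groups):
        if rev.id == revision_id:
            return rev
    raise NotFound(f"Revision {revision_id} not found")
```

Raising the same error for unauthorized and nonexistent revisions avoids revealing that a private revision exists.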
Issues Policies
Issue objects get a Policy attached, allowing us to limit the issues a user can see while querying them or querying or triaging a certain KCIDB object.
For instance, an auth-issue-redhat-internal policy could limit an Issue to a certain group of users, while auth-issue-public could make an Issue public.
Use case example
To illustrate the use of this design, let’s assume the simplest case: there is a single instance containing 2 sets of data: internal and public.
In this case, we want to keep the internal data only readable to a specific group of users, while the public data should be readable by anyone. Both sets can only be written by a certain group, too.
The policies would be the following:
- internal (auth-internal): write_group: redhat-cki, read_group: redhat-all
- public (auth-public): write_group: redhat-cki, read_group: None
redhat-cki would contain all the members of the CKI Team, giving them write permissions on both the internal and public groups of trees.
redhat-all has read access to the internal trees and should include everyone allowed to read that data.
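To make the example concrete, the sketch below evaluates these two policies for three hypothetical users. The user names and the access_matrix helper are invented for illustration. It also assumes that CKI Team members belong to redhat-all as well, since membership of redhat-cki alone would not grant read access to the internal trees.

```python
from typing import Dict, Optional, Set

# Policies from the example above; None means "anyone".
POLICIES: Dict[str, Dict[str, Optional[str]]] = {
    "auth-internal": {"read_group": "redhat-all", "write_group": "redhat-cki"},
    "auth-public": {"read_group": None, "write_group": "redhat-cki"},
}

# Hypothetical users; an empty set stands for an anonymous visitor.
USERS: Dict[str, Set[str]] = {
    "anonymous": set(),
    "employee": {"redhat-all"},
    "cki-member": {"redhat-all", "redhat-cki"},  # assumed to be in both groups
}


def allowed(group: Optional[str], user_groups: Set[str]) -> bool:
    """A null group allows anyone; otherwise membership is required."""
    return group is None or group in user_groups


def access_matrix() -> Dict[str, Dict[str, str]]:
    """Map user -> policy -> 'rw', 'r-', '-w' or '--'."""
    matrix: Dict[str, Dict[str, str]] = {}
    for user, groups in USERS.items():
        matrix[user] = {}
        for name, policy in POLICIES.items():
            read = "r" if allowed(policy["read_group"], groups) else "-"
            write = "w" if allowed(policy["write_group"], groups) else "-"
            matrix[user][name] = read + write
    return matrix
```

With these assumptions, the anonymous visitor can only read the public trees, the employee can read both sets, and the CKI member can read and write both.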
Benefits
Flexibility
Having the possibility to create various groups with differentiated read and write checks should allow us to define rules that cover all the use cases.
Reusability
A generic system attachable to any class provides flexibility to add new classes with access authorization rules without the need to design new checks.
Secure by default
Because an object without a policy attached is private by default, anything that has not been configured yet is kept internal.
Drawbacks
Visibility definition becomes complex
The question of whether an object is ‘private’ or ‘public’ changes from evaluating a boolean flag to a query where the user accessing the data is involved.
While being a much more flexible approach, the complexity is increased and the result of the evaluation obscured.
To help visualize the policies and authorizations, adding details about which users and groups are able to read and write each object to the user interface would be a good feature. An audit page should list the policies attached to each object, as well as the details of the policy and the users fulfilling it.
Mixed policies
Using the same Policy model for both GitTree and Issue classes means that the objects available for each of the cases are shared. For the first case, different group policies might be desired while for Issues only ‘public’ and ‘internal’ categories could be useful.
This means that naming policies will need to be defined and filters put in place for the UI menus, such as:
- auth-issue- prefix for Issue policies.
- auth-gittree- prefix for GitTree policies.
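A UI filter based on this naming convention could be as simple as the sketch below. The POLICY_PREFIXES mapping, the policies_for helper, and the auth-gittree-arm name in the usage note are hypothetical and only show how the prefixes might be applied.

```python
from typing import Dict, Iterable, List

# Naming convention from the list above: one prefix per model.
POLICY_PREFIXES: Dict[str, str] = {
    "Issue": "auth-issue-",
    "GitTree": "auth-gittree-",
}


def policies_for(model_name: str, policy_names: Iterable[str]) -> List[str]:
    """Return the policy names that should be offered for the given model."""
    prefix = POLICY_PREFIXES[model_name]
    return [name for name in policy_names if name.startswith(prefix)]
```

For example, given auth-issue-public, auth-issue-redhat-internal and auth-gittree-arm, a menu for Issue objects would list only the first two.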
Session caching
To improve the performance and reduce the overhead of these authorization queries, the authorized GitTrees and Issues will be cached on the user session. In other words, the authorizations are calculated and stored when the user logs in, and need a log-out log-in cycle to refresh them.
It might be necessary to implement a cache invalidation system to update a user’s authorizations when the user is added to or removed from a group.
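The caching behaviour and its staleness problem can be sketched with a plain dictionary standing in for the user session. The session key, the function names, and the shape of tree_read_groups (tree name mapped to its read_group, with policy-less trees simply absent) are assumptions for the example. A real implementation would sit on top of the Django session framework.

```python
from typing import Dict, MutableMapping, Optional, Set

SESSION_KEY = "authorized_trees"  # hypothetical session key


def compute_authorized_trees(tree_read_groups: Dict[str, Optional[str]],
                             user_groups: Set[str]) -> Set[str]:
    """Evaluate the read policy of every tree that has one.

    Trees without a policy are absent from the mapping, so they are
    never returned (private by default).
    """
    return {
        tree for tree, group in tree_read_groups.items()
        if group is None or group in user_groups
    }


def authorized_trees(session: MutableMapping,
                     tree_read_groups: Dict[str, Optional[str]],
                     user_groups: Set[str]) -> Set[str]:
    """Return the cached result, computing and storing it on a miss."""
    if SESSION_KEY not in session:
        # Stored as a sorted list so the value stays JSON-serializable.
        session[SESSION_KEY] = sorted(
            compute_authorized_trees(tree_read_groups, user_groups)
        )
    return set(session[SESSION_KEY])


def invalidate(session: MutableMapping) -> None:
    """Drop the cached authorizations, e.g. after a group change."""
    session.pop(SESSION_KEY, None)
```

Until invalidate is called, or the user logs out and in again, a user removed from a group keeps seeing the trees cached in the session, which is the staleness the invalidation system would need to address.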
Message queues are privileged
DataWarehouse uses AMQP queues to communicate data to micro services running from it.
With the current architecture, where all the available data is dumped into exchanges and queues and the decision of ignoring a message is handed over to the consumer, this communication channel and all its listeners need to be considered privileged.
To control the policies a consumer is attached to, it’s necessary to change the message architecture.
One possible approach is to remove all the data from the messages and keep only the id of the object. The consumer would then request the additional data through the API with its own credentials, which would go through the authorization checks, at the cost of losing the performance benefits of rendering and broadcasting a single message.
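A minimal sketch of this id-only approach follows. The message fields, the function names, and the in-memory fake API client are all hypothetical. No real AMQP or HTTP calls are made, so only the shape of the flow is shown.

```python
import json
from typing import Callable, Dict, Optional, Set, Tuple

ObjectKey = Tuple[str, int]  # (object type, object id)
Fetch = Callable[[str, int], Optional[dict]]


def render_message(object_type: str, object_id: int) -> str:
    """Producer side: the message carries only a reference, no data."""
    return json.dumps({"object_type": object_type, "id": object_id})


def make_api_client(store: Dict[ObjectKey, dict],
                    readable: Set[ObjectKey]) -> Fetch:
    """Fake API client bound to one consumer's credentials.

    `readable` stands for the objects those credentials are authorized
    to read; a real client would call the DataWarehouse API instead.
    """
    def fetch(object_type: str, object_id: int) -> Optional[dict]:
        key = (object_type, object_id)
        if key not in readable:
            return None  # behaves like a Not Found response
        return store.get(key)
    return fetch


def handle_message(body: str, fetch: Fetch) -> Optional[dict]:
    """Consumer side: resolve the reference through the authorized API."""
    message = json.loads(body)
    return fetch(message["object_type"], message["id"])
```

Every consumer now pays one API round trip per message, which is the performance cost described above. In exchange, the queue itself no longer carries data that needs protecting.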