CKI-005: Pipeline artifact storage
Abstract
This document describes the different ways that access to pipeline artifacts is provided.
Motivation
Pipelines default to artifacts stored on the GitLab servers. The properties of these artifacts are sometimes in conflict with the requirements of the CKI project.
Current external artifact solutions in the pipeline
Currently, the pipeline supports different approaches to publish data outside of GitLab:
Approach | Purpose | Remarks |
---|---|---|
publish_elsewhere | Access to artifacts for Beaker and QE | limited to the publish stage |
S3 artifact support | GitLab artifacts on GitLab/S3/both | requires credentials |
KCIDB uploading | Access to artifacts in Data Warehouse/KCIDB | requires credentials |
prepared software | Access to a prebuilt venv + pipeline scripts | per architecture in prepare stage |
Pipeline types
The following pipeline types can be distinguished and are discussed below:
Internal pipelines in private pipeline projects
These pipelines run in private pipeline projects, e.g. for MRs in private GitLab projects.
Internal RHEL5 pipelines in private pipeline projects
Additionally, these pipelines are constrained by RHEL 5, which runs in Beaker.
Trusted pipelines in public pipeline projects
These pipelines run in public pipeline projects, e.g. for MRs from trusted contributors in public GitLab projects.
Untrusted pipelines in public projects
These pipelines run in public pipeline projects for MRs from untrusted contributors.
Artifact considerations
GitLab artifacts are characterized by:
- in private projects, they are only accessible with credentials, e.g. it is not possible to host a DNF repository on them as required for Beaker and QE (auth)
- artifacts have a maximum size of 1GiB (1GiB)
- artifacts are only provided via HTTPS; in RHEL 5, the TLS support is too old (HTTP)
- GitLab artifacts do not allow for the publishing of incremental test results from those pipelines; this is less of a problem when not a lot of tests are executed (incremental)
- GitLab artifacts do not require credentials for upload (no creds); untrusted pipelines cannot upload to external artifact storage for that reason (as they have no credentials)
The table lists which pipeline types are affected by the different aspects described above:
Pipeline type | auth | 1GiB | HTTP | incremental | no creds |
---|---|---|---|---|---|
internal | yes | yes | no | yes | no |
internal RHEL5 | yes | no | yes | yes | no |
trusted public | no | yes | no | yes | no |
untrusted public | no | yes | no | no | yes |
Approach
Two storage types will be provided with the following features:
storage | protocol | access | incremental results |
---|---|---|---|
GitLab | HTTPS only | public only | no |
GitLab+S3 | HTTP/HTTPS | internal/public | yes |
This means reworking the artifact uploading in the pipeline to unify publish_elsewhere, S3 artifact storage support, KCIDB file uploading and software preparation.
Per pipeline, artifacts will be either stored only on GitLab, or on GitLab and S3 simultaneously.
If artifacts are also stored on S3, jobs whose artifacts exceed 1GiB will be excluded from the GitLab artifacts on a case-by-case basis. For S3, a browsable index page will be provided, and a link to it will be shown at the end of the job output.
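The storage rule above can be illustrated with a minimal sketch. It is not the pipeline's actual implementation: the names `JobArtifacts` and `upload_targets` are hypothetical, and the sketch replaces the case-by-case exclusion described above with a plain size threshold as a simplifying assumption.

```python
from dataclasses import dataclass

# 1GiB limit on GitLab artifacts, as stated in this document.
GITLAB_MAX_ARTIFACT_BYTES = 1024 ** 3


@dataclass
class JobArtifacts:
    """Hypothetical description of the artifacts produced by one job."""
    job_name: str
    size_bytes: int


def upload_targets(job, s3_enabled):
    """Return the storage targets for a job's artifacts.

    Assumption: exclusion from GitLab is decided purely by size here,
    whereas the document describes a case-by-case decision.
    """
    if not s3_enabled:
        # GitLab-only pipelines have no alternative storage.
        return ["gitlab"]
    if job.size_bytes > GITLAB_MAX_ARTIFACT_BYTES:
        # Too large for GitLab: keep the artifacts on S3 only.
        return ["s3"]
    # Default for S3-enabled pipelines: store on both simultaneously.
    return ["gitlab", "s3"]
```

For example, a hypothetical 3GiB debug kernel build in an S3-enabled pipeline would be stored on S3 only, while a small job in the same pipeline would be stored on both GitLab and S3.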
For the different pipelines, the following storage configurations will be used:
pipeline type | storage |
---|---|
internal | GitLab + internal S3/https |
internal RHEL5 | GitLab + internal S3/http |
trusted public | GitLab + public S3/https |
untrusted public | GitLab |
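As a rough illustration, the table above could be expressed as a lookup keyed by pipeline type. This is only a sketch: `StorageConfig`, `STORAGE` and `describe` are hypothetical names and do not refer to existing pipeline code or configuration.

```python
from typing import NamedTuple, Optional


class StorageConfig(NamedTuple):
    """Hypothetical per-pipeline storage settings."""
    gitlab: bool
    s3_visibility: Optional[str]  # "internal", "public", or None for no S3
    s3_protocol: Optional[str]    # "https", "http", or None for no S3


# One entry per row of the table above.
STORAGE = {
    "internal": StorageConfig(True, "internal", "https"),
    "internal RHEL5": StorageConfig(True, "internal", "http"),
    "trusted public": StorageConfig(True, "public", "https"),
    "untrusted public": StorageConfig(True, None, None),
}


def describe(pipeline_type):
    """Render the storage configuration in the notation used by the table."""
    cfg = STORAGE[pipeline_type]
    if cfg.s3_visibility is None:
        return "GitLab"
    return f"GitLab + {cfg.s3_visibility} S3/{cfg.s3_protocol}"
```

Keeping the mapping in a single place like this would make the special cases explicit: plain HTTP only for RHEL 5, and no S3 for untrusted pipelines because they have no credentials.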
Benefits
- One unified way to store artifacts on GitLab, and optionally also on S3, that fulfills all requirements
Drawbacks
- We might find another requirement that will conflict with the proposed solution
Alternatives
- Keep the different ways of uploading artifacts, and add a fourth one when we solve the gcov/debug kernel size problem