CKI-005: Pipeline artifact storage

Consistent approach for pipeline artifact storage
Michael Hofmann – cki-project/documentation!105

Abstract

This document describes the different ways in which access to pipeline artifacts is provided.

Motivation

By default, pipelines store their artifacts on the GitLab servers. The properties of these artifacts sometimes conflict with the requirements of the CKI project.

Current external artifact solutions in the pipeline

Currently, the pipeline supports several approaches for publishing data outside of GitLab:

| Approach            | Purpose                                      | Remarks                           |
|---------------------|----------------------------------------------|-----------------------------------|
| publish_elsewhere   | Access to artifacts for Beaker and QE        | limited to the publish stage      |
| S3 artifact support | GitLab artifacts on GitLab/S3/both           | requires credentials              |
| KCIDB uploading     | Access to artifacts in Data Warehouse/KCIDB  | requires credentials              |
| prepared software   | Access to a prebuilt venv + pipeline scripts | per architecture in prepare stage |

Pipeline types

The following pipeline types can be distinguished and are discussed below:

Internal pipelines in private pipeline projects

These pipelines run in private pipeline projects, e.g. for MRs in private GitLab projects.

Internal RHEL5 pipelines in private pipeline projects

These are internal pipelines that are additionally limited by RHEL 5, which runs in Beaker.

Trusted pipelines in public pipeline projects

These pipelines run in public pipeline projects, e.g. for MRs from trusted contributors in public GitLab projects.

Untrusted pipelines in public pipeline projects

These pipelines run in public pipeline projects for MRs from untrusted contributors.

Artifact considerations

GitLab artifacts have the following properties; the label in parentheses after each item names it in the table below:

  • in private projects, artifacts are only accessible with credentials, so it is not possible, for example, to host a DNF repository on them as required for Beaker and QE (auth)
  • artifacts have a maximum size of 1GiB (1GiB)
  • artifacts are only provided via HTTPS; the TLS support in RHEL 5 is too old for that (HTTP)
  • GitLab artifacts do not allow incremental test results to be published; this is less of a problem when only a few tests are executed (incremental)
  • GitLab artifacts do not require credentials for upload; untrusted pipelines have no credentials and therefore cannot upload to external artifact storage (no creds)

The following table lists which pipeline types are affected by each of the aspects described above:

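| Pipeline type    | auth | 1GiB | HTTP | incremental | no creds |
|------------------|------|------|------|-------------|----------|
| internal         | yes  | yes  | no   | yes         | no       |
| internal RHEL5   | yes  | no   | yes  | yes         | no       |
| trusted public   | no   | yes  | no   | yes         | no       |
| untrusted public | no   | yes  | no   | no          | yes      |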
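
The table can also be read as data. The following is a minimal sketch, not part of the pipeline: the names AFFECTED, GITLAB_LIMITATIONS, wants_external_storage and can_use_external_storage are hypothetical, and the rule "any GitLab limitation applies, and credentials are available" is an assumption inferred from the list above. Under that assumption it shows which pipeline types would benefit from external storage and which can actually upload to it.

```python
# Hypothetical encoding of the table above; keys mirror the column labels.
AFFECTED = {
    "internal": {
        "auth": True, "1GiB": True, "HTTP": False,
        "incremental": True, "no creds": False,
    },
    "internal RHEL5": {
        "auth": True, "1GiB": False, "HTTP": True,
        "incremental": True, "no creds": False,
    },
    "trusted public": {
        "auth": False, "1GiB": True, "HTTP": False,
        "incremental": True, "no creds": False,
    },
    "untrusted public": {
        "auth": False, "1GiB": True, "HTTP": False,
        "incremental": False, "no creds": True,
    },
}

# Aspects where plain GitLab artifacts fall short (assumed reading of the list).
GITLAB_LIMITATIONS = ("auth", "1GiB", "HTTP", "incremental")


def wants_external_storage(pipeline_type: str) -> bool:
    """True if at least one GitLab limitation affects this pipeline type."""
    row = AFFECTED[pipeline_type]
    return any(row[aspect] for aspect in GITLAB_LIMITATIONS)


def can_use_external_storage(pipeline_type: str) -> bool:
    """True if the pipeline type has credentials for uploading externally."""
    return not AFFECTED[pipeline_type]["no creds"]


if __name__ == "__main__":
    for name in AFFECTED:
        print(name, wants_external_storage(name), can_use_external_storage(name))
```

In this reading, untrusted public pipelines are affected by the size limit but have no credentials, so they are the only type that has to stay on GitLab artifacts alone.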

Approach

Two storage types will be provided with the following features:

| storage   | protocol   | access          | incremental results |
|-----------|------------|-----------------|---------------------|
| GitLab    | HTTPS only | public only     | no                  |
| GitLab+S3 | HTTP/HTTPS | internal/public | yes                 |

This requires reworking the artifact uploading in the pipeline to unify publish_elsewhere, S3 artifact storage support, KCIDB file uploading and software preparation.

Per pipeline, artifacts will be either stored only on GitLab, or on GitLab and S3 simultaneously.

If artifacts are also stored on S3, the artifacts of jobs that exceed 1GiB will be excluded from the GitLab artifacts on a case-by-case basis. For S3, a browsable index page will be provided, and a link to it will be shown at the end of the job output.
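
As an illustration of the index page and the link in the job output, here is a minimal sketch. It assumes that the object keys of a job share a common prefix and that the index is stored as index.html under that prefix; neither detail is specified in this document. The function names, the example keys and the base URL are hypothetical.

```python
import html
from urllib.parse import quote


def render_index(prefix: str, keys: list[str]) -> str:
    """Return a minimal HTML index page for the objects stored under prefix."""
    rows = []
    for key in sorted(keys):
        # Link relative to the index page, which is assumed to live at prefix.
        relative = key.removeprefix(prefix)
        rows.append(
            f'<li><a href="{quote(relative)}">{html.escape(relative)}</a></li>'
        )
    title = html.escape(f"Index of {prefix}")
    return (
        f"<html><head><title>{title}</title></head><body>"
        f"<h1>{title}</h1><ul>{''.join(rows)}</ul></body></html>"
    )


def index_link(base_url: str, prefix: str) -> str:
    """Return the URL that a job could print at the end of its output."""
    return f"{base_url.rstrip('/')}/{quote(prefix)}index.html"


if __name__ == "__main__":
    # Hypothetical keys and bucket URL, for illustration only.
    example_keys = ["123/456/kernel.rpm", "123/456/build.log"]
    page = render_index("123/456/", example_keys)
    print(page)
    print("Artifacts:", index_link("https://s3.example.invalid/artifacts", "123/456/"))
```

Generating a static page keeps the bucket browsable without any server-side listing support, at the cost of regenerating the page whenever a job uploads more files.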

For the different pipelines, the following storage configurations will be used:

| pipeline type    | storage                    |
|------------------|----------------------------|
| internal         | GitLab + internal S3/https |
| internal RHEL5   | GitLab + internal S3/http  |
| trusted public   | GitLab + public S3/https   |
| untrusted public | GitLab                     |
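
The mapping above could be expressed as configuration along the following lines. This is only a sketch under assumptions: the StorageConfig class, its field names, the STORAGE dictionary and the storage_for helper are hypothetical and do not describe how the pipeline actually represents this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageConfig:
    """Storage configuration for one pipeline type (hypothetical structure)."""

    gitlab: bool
    s3_access: Optional[str] = None    # "internal", "public", or None without S3
    s3_protocol: Optional[str] = None  # "https", "http", or None without S3

    @property
    def uses_s3(self) -> bool:
        return self.s3_access is not None


# One entry per row of the table above.
STORAGE = {
    "internal": StorageConfig(True, "internal", "https"),
    "internal RHEL5": StorageConfig(True, "internal", "http"),
    "trusted public": StorageConfig(True, "public", "https"),
    "untrusted public": StorageConfig(True),
}


def storage_for(pipeline_type: str) -> StorageConfig:
    """Look up the storage configuration, rejecting unknown pipeline types."""
    try:
        return STORAGE[pipeline_type]
    except KeyError:
        raise ValueError(f"unknown pipeline type: {pipeline_type!r}") from None


if __name__ == "__main__":
    for name, config in STORAGE.items():
        print(name, config)
```

Keeping the mapping in one table-like structure makes it obvious that every pipeline type keeps GitLab artifacts and that only the S3 part varies.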

Benefits

  • One unified way to store artifacts on GitLab, on S3, or on both that fulfills all requirements

Drawbacks

  • We might find another requirement that will conflict with the proposed solution

Alternatives

  • Keep the different ways of uploading artifacts, and add a fourth one when we solve the gcov/debug kernel size problem

Last modified April 20, 2021: Storage RFC (6d6b722)