The CKI setup contains a partial staging environment. To reduce the effort needed for maintenance and maximize usefulness, components are only duplicated into a staging version if doing so provides benefits for stability and testing.
In general, the staging environment is thought of as production-like:
- no additional debug options should be enabled
- features should only be disabled if they would otherwise interfere with the production environment
Staging versions of services running on Kubernetes/OpenShift are deployed into
separate staging namespaces (`preprod`). As far as possible, these services are
configured identically in production and staging. Where necessary, they can use
`is_production_or_staging` from cki-lib to differentiate between the
environments.
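As a rough illustration of this pattern, the sketch below gates debug-only behavior on the deployment environment. It assumes that `is_production_or_staging` is importable from `cki_lib.misc` and returns `True` for production and staging deployments; the exact module path and the surrounding code are placeholders, not the actual service implementation.

```python
"""Sketch: use the deployment environment to gate debug-only behavior.

Assumption: is_production_or_staging() is exposed via cki_lib.misc and
returns True for production and staging deployments.
"""
from cki_lib.misc import is_production_or_staging


def delivery_enabled() -> bool:
    # Production and staging behave identically; only local development
    # (and tests) skip delivery to external services.
    return is_production_or_staging()


if __name__ == '__main__':
    if delivery_enabled():
        print('production/staging: deliver for real')
    else:
        print('development: skip external side effects')
```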
Service monitoring is similar to that in the production namespaces:
- logs can be found in Loki
- the monitoring stack is federated into AppleCrumble
- all alerting rules also apply to staging services
- exceptions are logged in Sentry
The RabbitMQ cluster contains staging versions of the
`/datawarehouse-celery` virtual hosts.
Global resources that are visible across the environments have different names (virtual host, user, password variable names), while resources deployed only within one virtual host keep their names to increase portability across the environments:
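For example, a consumer could build its connection from environment-specific variables while keeping queue names identical in both virtual hosts. This is a hedged sketch of the convention; every concrete name below (environment variables, the staging switch, the queue) is hypothetical.

```python
"""Sketch of the naming convention described above; all concrete names
(environment variables, staging switch, queue) are hypothetical."""
import os

import pika


def connect() -> pika.BlockingConnection:
    # Global resources (virtual host, user, password variable names) differ
    # between production and staging ...
    prefix = 'RABBITMQ_STAGING' if os.environ.get('CKI_IS_STAGING') else 'RABBITMQ'
    return pika.BlockingConnection(pika.ConnectionParameters(
        host=os.environ[f'{prefix}_HOST'],
        virtual_host=os.environ[f'{prefix}_VHOST'],
        credentials=pika.PlainCredentials(
            os.environ[f'{prefix}_USER'], os.environ[f'{prefix}_PASSWORD']),
    ))


# ... while resources inside a virtual host keep their names, so the same
# queue declaration works unchanged against production and staging.
channel = connect().channel()
channel.queue_declare(queue='datawarehouse.results', durable=True)
```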
The following external message sources are available in the environments:
Depending on the pipeline type, retriggered pipelines share most of their
infrastructure with production pipelines. Only some components are split into
production and staging, with the retriggered pipelines using the staging
version.
| DC GitLab runner configurations | shared | shared | shared |
| AWS GitLab runner configurations | split | split | split |
| GitLab runner machines | split | shared | shared |
Infrastructure has been split where necessary to allow for the testing of launch template changes via retriggered pipelines.
### DC GitLab runner configurations
Currently, all Docker-based GitLab runner configurations hosted on static
machines in the data center are shared between retriggered and production
pipelines. In practice, this means that e.g. the
`staging-pipeline-test-runner` tag and its production counterpart are served by
the same GitLab runner configuration.
These configurations could be split to e.g. allow experimentation with the
Docker configuration. This would require additional changes to the
`gitlab-runner-config` script in `deployment-all` to allow separate
deployment of staging and production configurations.
### AWS GitLab runner configurations
All docker-machine-based GitLab runner configurations hosted on AWS EC2
machines are split between retriggered and production pipelines. In practice,
this means that e.g. the `staging-pipeline-createrepo-runner` tag and its
production counterpart are served by different GitLab runner configurations.
The properties of the workers launched by the docker-machine-based GitLab
runner configurations are determined by the associated launch templates. For
internal pipelines, separate launch templates are used for retriggered and
production pipelines. In practice, this means that e.g. the
`pipeline-createrepo-runner` tag will spawn workers based on the
`arr-cki.prod.lt.internal-general-worker` launch template, while the
`staging-pipeline-test-runner` tag will use the
`arr-cki.staging.lt.internal-general-worker` launch template.
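A simplified rendering of this mapping is shown below. The launch template names are the ones quoted above; the boto3 call merely stands in for what the docker-machine-based runner configuration does when spawning a worker, and is not the actual runner implementation.

```python
"""Simplified stand-in for how a runner tag maps to a launch template.

The template names come from the text above; everything else is illustrative.
"""
import boto3

LAUNCH_TEMPLATES = {
    'pipeline-createrepo-runner': 'arr-cki.prod.lt.internal-general-worker',
    'staging-pipeline-test-runner': 'arr-cki.staging.lt.internal-general-worker',
}


def spawn_worker(tag: str) -> str:
    """Launch one EC2 worker from the launch template associated with a tag."""
    ec2 = boto3.client('ec2')
    response = ec2.run_instances(
        MinCount=1,
        MaxCount=1,
        LaunchTemplate={'LaunchTemplateName': LAUNCH_TEMPLATES[tag]},
    )
    return response['Instances'][0]['InstanceId']
```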
The current setup makes it possible to test changes to the launch templates by retriggering internal pipelines. Most of these changes should apply equally well to the other pipeline types. Nevertheless, the launch templates could also be split for the other pipeline types.
### GitLab runner machines
The AWS EC2 machines hosting the docker-machine-based GitLab runner
configurations are only split for internal pipelines. In practice, this means
that the production and staging runner tags for internal pipelines are handled
by GitLab runners on different AWS EC2 machines.
The current setup makes it possible to test changes to the EC2 machine setup by retriggering internal pipelines. Most of these changes should apply equally well to the machines for the other pipeline types.
Nevertheless, the AWS EC2 machines hosting the docker-machine-based GitLab
runners could also be split for the other pipeline types. This would require
two additional service accounts for the VPN connections, as these cannot be
shared across machines.
Currently, the same VPC subnets are used for the dynamically spawned workers of
retriggered and production pipelines. In practice, this means that e.g. the
production and staging runner tags result in workers that share the same VPC
subnets.
The subnets could also be split to further separate the workers for production pipelines from the workers for retriggered pipelines. This would avoid interference e.g. in the case of subnets running out of IP addresses.
Currently, the same S3 buckets are used for retriggered and production
pipelines. In practice, this means that e.g. retriggered pipelines share their
ccache with the production pipelines.
The S3 buckets could also be split to further separate production
pipelines from retriggered pipelines. This might require bot or pipeline
changes to keep short pipelines working as expected.