# Staging environment
The CKI setup contains a partial staging environment. To reduce the effort needed for maintenance and maximize usefulness, components are only duplicated into a staging version if doing so provides benefits for stability and testing.
In general, the staging environment is meant to be production-like:
- no additional debug options should be enabled
- features should only be disabled if they would otherwise interfere with the production environment
## Kubernetes/OpenShift
Staging versions of services running on Kubernetes/OpenShift are deployed into
separate staging namespaces (`cki-staging` or `preprod`). As far as possible,
these services are configured identically in production and staging. Where
necessary, they can use `is_production`, `is_staging` or
`is_production_or_staging` from cki-lib to differentiate between
production/staging/development environments.
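As a minimal sketch of how a service might branch on these helpers, assuming they are exposed as callables in `cki_lib.misc` (the exact import path and the feature gate below are assumptions, not taken from this document):

```python
# Sketch only; import path and guarded behaviour are illustrative assumptions.
from cki_lib import misc


def deployment_environment() -> str:
    """Return a label for the environment the service is running in."""
    if misc.is_production():
        return 'production'
    if misc.is_staging():
        return 'staging'
    return 'development'


def external_notifications_enabled() -> bool:
    # Hypothetical feature gate: only talk to external systems from
    # production or staging deployments, never from local development.
    return misc.is_production_or_staging()
```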
Service monitoring is similar to the production namespaces:
- logs can be found in Loki
- the monitoring stack is federated into AppleCrumble
- all alerting rules also apply to staging services
- exceptions are logged in Sentry
## RabbitMQ
The RabbitMQ cluster contains staging versions of the `/` (a.k.a. `cki`) and
`/datawarehouse-celery` virtual hosts in `/cki-staging` and
`/datawarehouse-celery-staging`, respectively.
Global resources that are visible across the environments are named differently (virtual host, users, password variable names), while resources deployed only within one virtual host keep their names to increase portability across the environments:

| resource | naming | production | staging |
|---|---|---|---|
| virtual host | different | | `-staging` suffix |
| users | different | `cki.` prefix | `cki-staging.` prefix |
| passwords | different | `RABBITMQ_CKI` prefix | `RABBITMQ_CKI_STAGING` prefix |
| exchanges | same | `cki.exchange.` prefix | `cki.exchange.` prefix |
| queues | same | `cki.queue.` prefix | `cki.queue.` prefix |
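Because exchange and queue names are identical across environments, client code only needs to vary the connection parameters. The following sketch assumes the `pika` client and uses the virtual host and naming conventions from the table above; the host name, environment variable names, user and queue names are illustrative only:

```python
import os

import pika

# Illustrative environment switch; the variable name is an assumption.
staging = os.environ.get('CKI_ENVIRONMENT') == 'staging'

params = pika.ConnectionParameters(
    host='rabbitmq.example.com',  # placeholder host name
    virtual_host='/cki-staging' if staging else '/',
    credentials=pika.PlainCredentials(
        # users: cki. prefix in production, cki-staging. prefix in staging
        ('cki-staging.' if staging else 'cki.') + 'example-service',
        # passwords: RABBITMQ_CKI vs RABBITMQ_CKI_STAGING variable prefix
        os.environ['RABBITMQ_CKI_STAGING_EXAMPLE' if staging
                   else 'RABBITMQ_CKI_EXAMPLE'],
    ),
)

connection = pika.BlockingConnection(params)
channel = connection.channel()
# Queue and exchange names are the same in both environments, so the
# consuming code does not need to know which environment it runs in.
channel.queue_declare(queue='cki.queue.example', durable=True)
connection.close()
```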
The following external message sources are available in the environments:

| message type | production | staging |
|---|---|---|
| gitlab | yes | yes |
| amqp-bridge (UMB) | yes | yes |
| amqp-bridge (fedmsg) | yes | yes |
| sentry | yes | no |
| jira | yes | no |
## Retriggered pipelines
Depending on the pipeline type (`internal`, `public`, `ofa`), retriggered
pipelines share most of their infrastructure with production pipelines. Only
some components are split into production and staging, with the retriggered
pipelines using the staging version.

| | internal | public | ofa |
|---|---|---|---|
| DC GitLab runner configurations | shared | shared | shared |
| AWS GitLab runner configurations | split | split | split |
| launch templates | split | split | shared |
| GitLab runner machines | split | split | shared |
| VPC subnets | shared | shared | shared |
| S3 buckets | shared | shared | shared |
Infrastructure has been split where necessary to allow for the testing of launch template changes via retriggered pipelines.
### DC GitLab runner configurations
Currently, all Docker-based GitLab runner configurations hosted on static
machines in the data center are shared between retriggered and production
pipelines. In practice, this means that e.g. the `pipeline-test-runner` and
`staging-pipeline-test-runner` tags are served by the same GitLab runner
configuration.
These configurations could be split to e.g. allow experimentation with the
Docker configuration. This would require additional changes to the
`gitlab-runner-config` script in `deployment-all` to allow separate
deployment of staging and production configurations.
### AWS GitLab runner configurations
All docker-machine-based GitLab runner configurations hosted on AWS EC2
machines are split between retriggered and production pipelines. In practice,
this means that e.g. the `pipeline-createrepo-runner` and
`staging-pipeline-createrepo-runner` tags are served by different GitLab runner
configurations.
### Launch templates
The properties of the workers launched by the docker-machine-based GitLab
runner configurations are determined by the associated launch templates. For
internal pipelines, separate launch templates are used for retriggered and
production pipelines. In practice, this means that e.g. the
`pipeline-createrepo-runner` tag will spawn workers based on the
`arr-cki.prod.lt.internal-general-worker` launch template, while the
`staging-pipeline-test-runner` tag will use the
`arr-cki.staging.lt.internal-general-worker` launch template.
The current setup allows launch template changes to be tested by retriggering internal pipelines. Most of these changes should apply equally well to the other pipeline types. Nevertheless, the launch templates could also be split for the other pipeline types.
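For example, a change to the staging launch template could be compared against the production one before it is promoted. The following is a sketch only, assuming `boto3` with suitable AWS credentials; it uses the launch template names quoted above:

```python
import boto3

ec2 = boto3.client('ec2')


def latest_template_data(name: str) -> dict:
    """Return the launch template data of the newest template version."""
    response = ec2.describe_launch_template_versions(
        LaunchTemplateName=name, Versions=['$Latest'])
    return response['LaunchTemplateVersions'][0]['LaunchTemplateData']


production = latest_template_data('arr-cki.prod.lt.internal-general-worker')
staging = latest_template_data('arr-cki.staging.lt.internal-general-worker')

# Print the top-level keys whose values differ between the two templates.
for key in sorted(set(production) | set(staging)):
    if production.get(key) != staging.get(key):
        print(f'{key}:\n  production: {production.get(key)}\n'
              f'  staging:    {staging.get(key)}')
```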
### GitLab runner machines
The AWS EC2 machines hosting the docker-machine-based GitLab runner
configurations are only split for internal pipelines. In practice, this means
that the `pipeline-createrepo-runner` and `staging-pipeline-createrepo-runner`
tags are handled by GitLab runners on different AWS EC2 machines.
The current setup allows EC2 machine setup changes to be tested by retriggering internal pipelines. Most of these changes should apply equally well to the machines for the other pipeline types.
Nevertheless, the AWS EC2 machines hosting the docker-machine-based GitLab
runners could also be split for the other pipeline types. For `ofa` pipelines,
this would require two additional service accounts for the VPN connections as
these cannot be shared across machines.
### VPC subnets
Currently, the same VPC subnets are used for the dynamically spawned workers of
retriggered and production pipelines. In practice, this means that e.g. the
`pipeline-createrepo-runner` and `staging-pipeline-createrepo-runner` tags
result in workers that share the same VPC subnets.
The subnets could also be split to further separate the workers for production pipelines from the workers for retriggered pipelines. This would avoid interference e.g. in the case of subnets running out of IP addresses.
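One way to keep an eye on that risk is to watch the free addresses in the shared subnets. A sketch, assuming `boto3` and a hypothetical `Name` tag filter for the runner subnets (the real subnets would be selected by whatever tags or IDs the deployment actually uses):

```python
import boto3

ec2 = boto3.client('ec2')

# Placeholder filter; adjust to the tags or subnet IDs actually in use.
response = ec2.describe_subnets(
    Filters=[{'Name': 'tag:Name', 'Values': ['*runner*']}])

for subnet in response['Subnets']:
    print(subnet['SubnetId'], subnet['CidrBlock'],
          'free IPs:', subnet['AvailableIpAddressCount'])
```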
### S3 buckets
Currently, the same S3 buckets are used for retriggered and production
pipelines. In practice, this means that e.g. retriggered pipelines share their
`ccache` with the production pipelines.
The S3 buckets could also be split to further separate production
pipelines from retriggered pipelines. This might require bot or pipeline
changes to keep short pipelines (`tests_only=true`) working.