Fixing GitLab job system failures

How to investigate GitLab CI/CD job runner system failures

Problem

A CKI GitLab CI/CD job fails with a runner system failure, as in the following job log:

[Screenshot: GitLab CI/CD job log with runner system failure]

Steps

  1. Determine which gitlab-runner handled the job. The runner name appears in the job output. In the screenshot above, wf-aws-aws-internal-b-dm-internal-build refers to the internal runner in availability zone (AZ) b.

  2. Log in to the gitlab-runner machine via ansible_ssh.sh.

  3. Inspect the journal of the gitlab-runner unit via

    journalctl --since today --all --unit gitlab-runner
    

    Start by looking for lines containing ERROR and lines highlighted in red.
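
When the journal is long, it can help to narrow it down to the suspicious lines before reading it in full. The following is a minimal sketch, not part of the CKI tooling: filter_runner_errors is a hypothetical helper name, and the assumption that failures are marked by "error" or "system failure" is based only on the job log wording above. Piping the journal through a filter drops the red highlighting, so the text match is the only signal here.

```shell
# Hypothetical helper (not part of the CKI tooling): reads journal text on
# stdin and prints only the lines that look like failures, prefixed with
# their line number in the input.
# Assumption: failures contain "error" or "system failure" (case-insensitive).
filter_runner_errors() {
    grep -n -i -E 'error|system failure'
}

# Example invocation on the gitlab-runner machine (left commented out so the
# snippet can be sourced on any machine):
# journalctl --since today --all --unit gitlab-runner --no-pager | filter_runner_errors
```

The line numbers printed by the filter make it easier to return to the unfiltered journal and read the context around each match.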