Debugging a failing GitLab pipeline job

How to give a kernel developer access to the running job environment to investigate why a GitLab pipeline job failed


A GitLab pipeline job failed, but the error is not reproducible outside the pipeline.


  1. In a pipeline-definition MR, add debugging code that dumps the environment and sleeps if the build fails, e.g. add the following to pipeline/stages/build.yml:

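    if ! rpmbuild --rebuild ...; then
      echo_red "Build failed"
      # assumes $environment_file is set to a writable path
      echo_yellow "  dumping environment to $environment_file"
      export > "$environment_file"
      # keep the job and its container alive for inspection
      echo_yellow "  sleeping"
      sleep infinity
      exit 1
    fi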
  2. In the kernel repository MR, find the ID of the failed pipeline in the pipeline repositories. Then, in the pipeline-definition MR, ask the bot to retrigger that pipeline with the debug code, e.g. with a comment like

    @cki-ci-bot, please test [baseline/12345678]
  3. Wait until the pipeline hits the sleep. In the log of the failing job, identify the gitlab-runner and the spawned EC2 machine from lines like

    Running with ...
    on ...-aws-internal-a-...
    Running on ... via
  4. With credentials from the deployment repository, log into the spawned EC2 machine via the gitlab-runner host, add the developer's SSH key, and determine the machine's IP address via

    cki_secret AWS_INTERNAL_RUNNER_SSH_PRIVATE_KEY | ssh-add -
    cki_variable AWS_INTERNAL_RUNNER_SSH_PUBLIC_KEY > /tmp/internal-runner.pubkey
    ssh -i /tmp/internal-runner.pubkey root@aws-internal-a-gitlab-runner-host-name
    sudo docker-machine ssh
    curl | sudo tee -a /root/.ssh/authorized_keys
    hostname -I
  5. The developer can then enter the Docker container via

    ssh -i developer-account.key root@ip-address-from-above
    sudo docker ps
    sudo docker exec -it container-id /bin/bash
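
For step 3, it can help to pull the identifying lines out of a saved copy of the job log instead of scrolling through it. The following is only a minimal sketch: the function name `runner_lines` and the file name `job.log` are hypothetical, the log is assumed to have been saved as plain text, and the patterns simply mirror the line shapes shown in step 3, so they may need adjusting for other runners.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the pipeline): print the job-log lines
# that identify the gitlab-runner and the spawned machine.
runner_lines() {
  local log_file="$1"
  # The patterns are deliberately unanchored so that leading whitespace or
  # terminal control sequences in a raw log do not prevent a match.
  grep -E 'Running with |aws-internal|Running on .* via' "$log_file"
}
```

Calling `runner_lines job.log` prints only the matching lines. If nothing matches, `grep` exits with a non-zero status, which makes a missing or unexpected log format easy to detect in a script.
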
Last modified August 1, 2022: Add Team talks index (83e5e3a)