Debugging a failing GitLab pipeline job

How to let a kernel developer investigate a failed GitLab pipeline job from inside its container

Problem

A GitLab pipeline job failed, but the error is not reproducible outside the pipeline.

Steps

  1. In a pipeline-definition MR, add debugging code to the pipeline that dumps the environment and sleeps if the job fails, e.g. add the following to pipeline/stages/build.yml:

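    if ! rpmbuild --rebuild ...; then
      environment_file=$(mktemp)
      echo_red "Build failed"
      echo_yellow "  dumping environment to $environment_file"
      # write all exported variables to the file in a form that can be sourced later
      export > "$environment_file"
      echo_yellow "  sleeping"
      # keep the job and its container alive for inspection
      sleep infinity
      exit 1
    fi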
    
  2. In the kernel repository MR, look up the ID of the failed pipeline in the pipeline repositories. Then, in the pipeline-definition MR, ask the bot to retrigger that pipeline with the debugging code with something like

    @cki-ci-bot, please test [baseline/12345678]
    
  3. Wait until the pipeline hits the sleep. In the log of the failing job, determine the names of the gitlab-runner and of the spawned EC2 machine from lines like

    Running with ...
    on ...-aws-internal-a-...
    ...
    Running on ... via runner-abcdef-arr-cki.prod.general.1234567-123abcdef...
    
  4. With credentials from the deployment repository, log into the gitlab-runner host and from there into the spawned EC2 machine, add the developer's SSH key, and determine the machine's IP address via

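    # load the private key into the SSH agent; -i with the public key selects it there
    cki_secret AWS_INTERNAL_RUNNER_SSH_PRIVATE_KEY | ssh-add -
    cki_variable AWS_INTERNAL_RUNNER_SSH_PUBLIC_KEY > /tmp/internal-runner.pubkey
    ssh -i /tmp/internal-runner.pubkey root@aws-internal-a-gitlab-runner-host-name
    # on the gitlab-runner host: log into the spawned EC2 machine
    sudo docker-machine ssh runner-abcdef-arr-cki.prod.general.1234567-123abcdef
    # on the EC2 machine: authorize the developer's public keys and print the IP addresses
    curl https://gitlab.com/developer-account.keys | sudo tee -a /root/.ssh/authorized_keys
    hostname -I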
    
  5. The developer can then enter the Docker container via

    ssh -i developer-account.key root@ip-address-from-above
    sudo docker ps
    sudo docker exec -it container-id /bin/bash
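
Once inside the container, the environment file written by the debugging code in step 1 can be loaded into the interactive shell, so that commands can be rerun with the same variables as the failing job. The following is a minimal sketch, assuming bash 4.2 or later and a dump produced with `export > file` as in step 1; the function name `load_job_environment` is hypothetical and not part of the pipeline.

```shell
#!/bin/bash
# Hypothetical helper: load an environment dump created with `export > file`.
# bash writes such a dump as `declare -x NAME="value"` lines. Inside a
# function, a plain `declare` would create local variables, so the lines are
# rewritten to `declare -gx` to keep the variables global and exported.
load_job_environment() {
  local environment_file=$1
  source <(sed 's/^declare -x /declare -gx /' "$environment_file")
}
```

It could be called as `load_job_environment /path/to/environment-file`, with the path taken from the "dumping environment to" line in the job log. Loading the dump also overwrites variables such as PATH and HOME in the current shell, so a fresh `bash` started just for this purpose is the safer place to run it.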
    
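The names needed in step 4 can also be extracted from a saved copy of the job log instead of being read off by eye. The following is a rough sketch, assuming the log was saved as plain text without timestamps or color codes and that its lines have the shape shown in step 3; the function names `runner_from_log` and `machine_from_log` are hypothetical.

```shell
#!/bin/bash
# Hypothetical helpers for a job log saved as plain text.

# Print the first word after "on" from the first line that starts with "on",
# which is assumed to be the gitlab-runner name.
runner_from_log() {
  sed -n 's/^ *on \([^ ]*\).*$/\1/p' "$1" | head -n 1
}

# Print the name after "via" on the "Running on ... via ..." line, without
# the trailing dots, which is assumed to be the spawned EC2 machine.
machine_from_log() {
  sed -n 's/^Running on .* via \(.*[^.]\)\.*$/\1/p' "$1" | head -n 1
}
```

With the log saved as `job.log`, `machine_from_log job.log` would print the name to pass to `docker-machine ssh` in step 4. The host name of the gitlab-runner still has to be looked up from the runner name printed by `runner_from_log job.log`.
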
Last modified November 2, 2021: Document debugging a pipeline job (565f9b3)