Debugging a failing GitLab pipeline job

How to let a kernel developer investigate, from inside the job environment, why a GitLab pipeline job failed


A GitLab pipeline job failed, but the error is not reproducible outside the pipeline.


  1. In a pipeline-definition MR, add debugging code to the pipeline that dumps the environment and sleeps when the job fails, e.g. add the following to pipeline/stages/build.yml:

    if ! rpmbuild --rebuild ...; then
      cki_echo_error "Build failed"
      cki_echo_notify "dumping environment to $environment_file"
      export > "$environment_file"
      cki_echo_notify "sleeping"
      sleep infinity
      exit 1
    fi
  2. From the kernel repository MR, get the ID of the failed pipeline in the pipeline repositories. Then, in the pipeline-definition MR, ask the bot to retrigger that pipeline with the debug code, using a comment like

    @cki-ci-bot, please test [rhel/12345678]
  3. Wait until the pipeline hits the sleep. In the failing job log, determine the gitlab-runner and spawned EC2 machine from lines like

    Running with ...
    on ...-aws-internal-a-...
    Running on ... via
  4. Use the script from the deployment-all repository checkout to access the appropriate EC2 instance. Select the target according to whether the pipeline runs in ...-aws-internal-a-... or in ...-aws-internal-b-....

  5. Log into the runner

    sudo docker-machine ssh
  6. The docker container can then be entered via

    sudo docker ps
    sudo docker exec -it container-id /bin/bash
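
The choice in steps 3 and 4 between the ...-aws-internal-a-... and ...-aws-internal-b-... runners can be made less error-prone with a small helper. The following is only a sketch: the function name runner_pool is hypothetical, and it assumes the job log has been saved as text and contains the runner name in the form shown in step 3.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the CKI tooling.
# Reads job log text from stdin and prints which runner pool ("a" or "b")
# the job ran in, based on the runner name pattern from step 3.
runner_pool() {
  local log
  log=$(cat)
  case "$log" in
    # If both patterns appear in the log, "a" wins because it is checked first.
    *-aws-internal-a-*) echo "a" ;;
    *-aws-internal-b-*) echo "b" ;;
    *) echo "unknown"; return 1 ;;
  esac
}
```

It could be used as `runner_pool < job.log`, where job.log is a locally saved copy of the failing job's log. A non-zero exit status means neither runner name pattern was found, in which case the log should be checked by hand.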