Debugging a failing GitLab pipeline job
Problem
A GitLab pipeline job failed, but the error is not reproducible outside the pipeline.
Steps
- Add debugging code to the pipeline in a `pipeline-definition` MR that dumps the environment and sleeps in case of failure. For example, add the following to `pipeline/stages/build.yml`:

  ```shell
  if ! rpmbuild --rebuild ...; then
      environment_file=$(mktemp)
      echo_red "Build failed"
      echo_yellow " dumping environment to $environment_file"
      # save all exported variables for later inspection
      export > "$environment_file"
      echo_yellow " sleeping"
      # keep the job, and its container, alive for debugging
      sleep infinity
      exit 1
  fi
  ```
- From the kernel repository MR, get the ID of the failed pipeline in the pipeline repositories. Then, in the `pipeline-definition` MR, ask the bot to retrigger that pipeline with the debug code by commenting something like `@cki-ci-bot, please test [baseline/12345678]`.
- Wait until the job reaches the `sleep`. In the failing job log, identify the gitlab-runner and the spawned EC2 machine from lines like:

  ```
  Running with ... on ...-aws-internal-a-... ...
  Running on ... via runner-abcdef-arr-cki.prod.general.1234567-123abcdef...
  ```
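- Using credentials from the deployment repository, log into the gitlab-runner host and from there into the spawned EC2 machine. Then add the developer's SSH key and determine the machine's IP address:

  ```shell
  # On your machine: load the runner SSH key into the agent
  cki_secret AWS_INTERNAL_RUNNER_SSH_PRIVATE_KEY | ssh-add -
  cki_variable AWS_INTERNAL_RUNNER_SSH_PUBLIC_KEY > /tmp/internal-runner.pubkey
  # -i with the public key selects the matching private key held by the agent
  ssh -i /tmp/internal-runner.pubkey root@aws-internal-a-gitlab-runner-host-name
  # On the gitlab-runner host: hop to the spawned EC2 machine
  sudo docker-machine ssh runner-abcdef-arr-cki.prod.general.1234567-123abcdef
  # On the spawned EC2 machine: authorize the developer's GitLab SSH keys
  curl https://gitlab.com/developer-account.keys | sudo tee -a /root/.ssh/authorized_keys
  hostname -I
  ```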
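- The job's Docker container can then be entered with:

  ```shell
  # On your machine: log into the spawned EC2 machine with the developer's private key
  ssh -i developer-account.key root@ip-address-from-above
  # On the spawned EC2 machine: find the job container and open a shell in it
  sudo docker ps
  sudo docker exec -it container-id /bin/bash
  ```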
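The dump-and-sleep snippet in the first step is tied to `rpmbuild`. The same idea could be wrapped around any command with a small shell function, sketched below. The function name `run_or_debug` and the `DEBUG_SLEEP` variable are hypothetical and not part of the pipeline. `DEBUG_SLEEP` defaults to `infinity`, as in the original snippet. Plain `echo` to stderr stands in for the `echo_red`/`echo_yellow` helpers.

```shell
# Hypothetical helper (not part of the pipeline): run a command; on failure,
# dump the exported environment to a temporary file and keep the job alive.
run_or_debug() {
    local environment_file
    if ! "$@"; then
        environment_file=$(mktemp)
        echo "Command failed: $*" >&2
        echo "  dumping environment to $environment_file" >&2
        # 'export -p' lists all exported variables in a re-sourceable form
        export -p > "$environment_file"
        echo "  sleeping" >&2
        # DEBUG_SLEEP is a hypothetical knob; 'infinity' matches the original snippet
        sleep "${DEBUG_SLEEP:-infinity}"
        return 1
    fi
}
```

In the build stage this might be used as `run_or_debug rpmbuild --rebuild ... || exit 1`. POSIX requires `export -p` output to be suitable for re-input to the shell. The dump can therefore be loaded with `. /path/to/dump` inside the container to reproduce the job's environment.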
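The bot comment in the second step could also be assembled from a pipeline URL copied from the kernel MR. This is a minimal sketch. The function name `bot_test_comment` is hypothetical. It assumes the bracketed part of the comment has the form `<pipeline project>/<pipeline ID>`, as the `baseline/12345678` example suggests. It also assumes pipeline URLs follow GitLab's usual `.../-/pipelines/<ID>` shape.

```shell
# Hypothetical helper: build the retrigger comment for the bot from the
# pipeline project name and a GitLab pipeline URL (or a bare pipeline ID).
bot_test_comment() {
    local project="$1" pipeline="$2" pipeline_id
    # Keep only what follows the last '/', e.g. '.../-/pipelines/12345678' -> '12345678'
    pipeline_id="${pipeline##*/}"
    case "$pipeline_id" in
        ''|*[!0-9]*) echo "not a pipeline ID: $pipeline" >&2; return 1 ;;
    esac
    printf '@cki-ci-bot, please test [%s/%s]\n' "$project" "$pipeline_id"
}
```

For instance, `bot_test_comment baseline 12345678` prints the comment from the second step. A pipeline URL ending in `/-/pipelines/12345678` gives the same result.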
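Finding the two log lines from the third step in a long job log can be scripted. The sketch below assumes a plain-text copy of the log without terminal color codes. The function name is hypothetical. It takes the runner to be the first word after ` on `, and the machine name to be whatever follows ` via ` up to GitLab's trailing `...`. Some GitLab Runner versions may print the `on ...` part on the line after `Running with`, so both layouts are handled.

```shell
# Hypothetical helper: read a plain-text GitLab job log on stdin and print
# the runner name and the docker-machine name of the spawned EC2 instance.
runner_and_machine_from_log() {
    awk '
        # "Running with gitlab-runner VERSION (REV) on RUNNER TOKEN"
        /Running with / && !runner {
            if (match($0, / on [^ ]+/)) runner = substr($0, RSTART + 4, RLENGTH - 4)
            else expect_on = 1
            next
        }
        # Layout where "  on RUNNER TOKEN" follows on the next line
        expect_on {
            expect_on = 0
            if ($1 == "on") runner = $2
        }
        # "Running on CONTAINER via MACHINE..." with a literal trailing "..."
        /Running on .* via / && !machine {
            machine = $0
            sub(/.* via /, "", machine)
            sub(/\.\.\..*/, "", machine)
        }
        END {
            print "runner=" runner
            print "machine=" machine
        }
    '
}
```

For example, `runner_and_machine_from_log < job.log` prints a `runner=` and a `machine=` line. The machine name is the argument for `docker-machine ssh` in the fourth step. The runner name is the runner's description, which may not be literally the host name used for `ssh`.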
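The `curl ... | sudo tee -a` line in the fourth step appends the keys again every time it runs, and it would also append an error page if the request failed. A slightly more defensive sketch follows, assuming the same `https://gitlab.com/<username>.keys` endpoint the step already uses. The function name `append_missing_keys` is hypothetical. It appends only non-empty key lines that are not already present in the target file.

```shell
# Hypothetical helper: append the public keys read from stdin to an
# authorized_keys file, skipping blank lines and keys that are already present.
append_missing_keys() {
    local authorized_keys="$1" key
    touch "$authorized_keys"
    # Make sure an existing last line is newline-terminated before appending
    if [ -s "$authorized_keys" ] && [ -n "$(tail -c 1 "$authorized_keys")" ]; then
        printf '\n' >> "$authorized_keys"
    fi
    # "|| [ -n ... ]" also handles a final input line without a trailing newline
    while IFS= read -r key || [ -n "$key" ]; do
        [ -n "$key" ] || continue
        # -x: match whole lines, -F: fixed strings (keys contain "+" and "/")
        if ! grep -qxF -- "$key" "$authorized_keys"; then
            printf '%s\n' "$key" >> "$authorized_keys"
        fi
    done
}
```

For example, from a root shell on the spawned EC2 machine: `curl -fsS https://gitlab.com/developer-account.keys | append_missing_keys /root/.ssh/authorized_keys`. With `-f`, curl outputs nothing on an HTTP error, so nothing is appended.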
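In the last step, `docker ps` may list more than one container, for example helper containers next to the job's build container. As a sketch, the ID can be picked out of `docker ps --format '{{.ID}} {{.Names}}'` output. The function name is hypothetical. It assumes the build container's name ends in `-build` (or `-build-<n>`). That reflects how GitLab Runner's Docker executor usually names it, not anything stated above, so check the actual names first.

```shell
# Hypothetical helper: read "ID NAME" lines, as printed by
#   sudo docker ps --format '{{.ID}} {{.Names}}'
# and print the ID of the first container whose name looks like a GitLab
# Runner build container (assumed to end in "-build" or "-build-<n>").
build_container_id() {
    awk '$2 ~ /-build(-[0-9]+)?$/ { print $1; found = 1; exit }
         END { exit (found ? 0 : 1) }'
}
```

On the spawned EC2 machine this might be used as `sudo docker exec -it "$(sudo docker ps --format '{{.ID}} {{.Names}}' | build_container_id)" /bin/bash`.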