Fixing missing Koji/Brew builds

How to investigate a report of a missing Koji/Brew build

Problem

You get a report of a missing Koji/Brew build, e.g.

Subject: CKI hasn't started for 8.4 Beta build

I built kernel-4.18.0-291.el8 in the am, but (now at 7pm)
there are still no CKI jobs for 8.4 Beta build. Is there an outage?

Steps

  1. Check the AMQP UMB bridge logs in Grafana via {deployment="amqp-bridge-umb"}. If there are any error messages, continue with investigating UMB problems.

  2. Find the relevant build job for the kernel on Koji/Brew. On the build page, click on the Task link for the build (shown in green further down the page) and note down the task ID.

  3. Check the pipeline trigger logs in Grafana for the task ID via something like {deployment="pipeline-trigger"} |= "12345". Make sure to understand why the pipeline was not triggered before continuing.

  4. Retrigger the missing pipeline with the following command.

    CKI_DEPLOYMENT_ENVIRONMENT=production \
        GITLAB_PARENT_PROJECT=redhat/red-hat-ci-tools/kernel/cki-internal-pipelines \
        PIPELINE_DEFINITION_URL=https://gitlab.com/cki-project/pipeline-definition \
        python3 -m cki_tools.koji_trigger \
        --gitlab-url https://gitlab.com \
        --config-path ../internal-pipeline-data/brew.yaml \
        --task-id <task-id>
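
When several task IDs need the same treatment, steps 3 and 4 can be scripted. The following is a minimal sketch, not part of cki_tools: the helper names `trigger_log_query` and `retrigger_argv` are hypothetical, and the environment variables, flags and paths are copied from the command above. It only builds the LogQL query and the command line; it does not run anything.

```python
from __future__ import annotations

import shlex

# Environment variables copied verbatim from the retrigger command in step 4.
TRIGGER_ENV = {
    "CKI_DEPLOYMENT_ENVIRONMENT": "production",
    "GITLAB_PARENT_PROJECT": "redhat/red-hat-ci-tools/kernel/cki-internal-pipelines",
    "PIPELINE_DEFINITION_URL": "https://gitlab.com/cki-project/pipeline-definition",
}


def trigger_log_query(task_id: int) -> str:
    """Return the LogQL query from step 3 for one task ID (hypothetical helper)."""
    return f'{{deployment="pipeline-trigger"}} |= "{task_id}"'


def retrigger_argv(
    task_id: int, config_path: str = "../internal-pipeline-data/brew.yaml"
) -> list[str]:
    """Return the argument vector of the step 4 command (hypothetical helper)."""
    return [
        "python3", "-m", "cki_tools.koji_trigger",
        "--gitlab-url", "https://gitlab.com",
        "--config-path", config_path,
        "--task-id", str(task_id),
    ]


if __name__ == "__main__":
    task_id = 12345  # placeholder, replace with the task ID noted in step 2
    print(trigger_log_query(task_id))
    env_prefix = " ".join(f"{k}={shlex.quote(v)}" for k, v in TRIGGER_ENV.items())
    print(env_prefix, shlex.join(retrigger_argv(task_id)))
```

Printing the command instead of executing it keeps the "understand why the pipeline was not triggered" check from step 3 as a manual gate before anything is retriggered.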
    

Additional steps for scratch builds

  1. The binaries of scratch builds are deleted after a few days, and the Koji/Brew web interface does not make it easy to find the tasks. Here is one scratch build example: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=35043446

    Datagrepper can be used to find those builds, e.g. via https://datagrepper.engineering.redhat.com/raw?topic=/topic/VirtualTopic.eng.brew.task.closed&delta=720000&contains=kernel. It may take a few retries before the page loads instead of returning a 502 error 🙈.

  2. Look for build task events (as opposed to buildArch) in the correct timeframe. The numbers right next to them are the task IDs required for retriggering.

  3. Open the JSON for each build task and verify that the NVR is something that should be tested instead of blindly submitting all of them (although the trigger should be smart enough to skip builds that should not be tested).
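
The datagrepper lookup and the build/buildArch filtering can be sketched in Python. This is an illustration under assumptions, not a verified client: the helper names are hypothetical, and the message layout (`msg` -> `info` -> `id`/`method`) is an assumed shape that should be checked against a real datagrepper response before relying on it. The second sample message is made up for the example.

```python
from urllib.parse import urlencode

DATAGREPPER_RAW = "https://datagrepper.engineering.redhat.com/raw"


def datagrepper_url(delta_seconds: int = 720000, contains: str = "kernel") -> str:
    """Rebuild the datagrepper URL from step 1 with adjustable parameters."""
    query = urlencode(
        {
            "topic": "/topic/VirtualTopic.eng.brew.task.closed",
            "delta": delta_seconds,
            "contains": contains,
        },
        safe="/",  # keep the slashes in the topic readable
    )
    return f"{DATAGREPPER_RAW}?{query}"


def build_task_ids(messages: list) -> list:
    """Return the IDs of 'build' tasks, skipping 'buildArch' and other methods.

    ASSUMPTION: every message looks like
    {"msg": {"info": {"id": <task id>, "method": <task method>}}}.
    """
    ids = []
    for message in messages:
        info = message.get("msg", {}).get("info", {})
        if info.get("method") == "build" and "id" in info:
            ids.append(info["id"])
    return ids


# Hypothetical sample data in the assumed layout; the buildArch entry is invented.
sample_messages = [
    {"msg": {"info": {"id": 35043446, "method": "build"}}},
    {"msg": {"info": {"id": 35043450, "method": "buildArch"}}},
]

if __name__ == "__main__":
    print(datagrepper_url())
    print(build_task_ids(sample_messages))
```

The filter only narrows the list of candidates. The NVR check in step 3 still has to be done by reading the JSON of each remaining task.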