Adding an expression to the pipeline-herder

How to get the pipeline-herder to retry GitLab jobs with certain characteristics

Problem

A GitLab pipeline job failed with an error that a retry would fix. The pipeline-herder can (and should!) perform such retries automatically.

Steps

  1. Determine the job output that is typical for this kind of failure.

    As an example, job 2612657346 contains the following output:

    Downloading artifacts 00:01
    Downloading artifacts for prepare builder (2490531250)...
    ERROR: Downloading artifacts from coordinator... forbidden  id=2490531250 responseStatus=403 Forbidden status=GET https://gitlab.com/api/v4/jobs/2490531250/artifacts: 403 Forbidden token=MUVtKQWe
    FATAL: permission denied
    Uploading artifacts for failed job 00:01
    Uploading artifacts...
    
  2. Check the raw logs for the relevant line, as the raw text is what the matcher will see.

    For the example above, the relevant line is:

    ERROR: Downloading artifacts from coordinator... forbidden  id=2490531250 responseStatus=403 Forbidden status=GET https://gitlab.com/api/v4/jobs/2490531250/artifacts: 403 Forbidden token=MUVtKQWe
    
  3. Clone the pipeline-herder repository as described in the getting started documentation. This will clone the repository, create a fork if necessary, set up direnv for Python development, and install all necessary packages.

    If the repository was cloned already, manually set up a direnv Python environment and install the dependencies via

    echo 'layout python3' > .envrc
    echo "export GITLAB_TOKENS='{\"gitlab.com\":\"COM_GITLAB_TOKEN_PERSONAL\"}'" >> .envrc
    echo 'export COM_GITLAB_TOKEN_PERSONAL="your-secret-token-from-gitlab-com"' >> .envrc
    direnv allow
    pip install -e .
    
  4. Create a matcher in the pipeline-herder repository in pipeline_herder/matchers.py.

    For the example above, such a matcher could look like this:

    utils.Matcher(
        name='artifacts-error',
        description='Problem while downloading artifacts',
        messages=re.compile(r'ERROR: Downloading artifacts from coordinator.*forbidden.*403 Forbidden'),
    )
    
  5. Check that the matcher works by running the pipeline-herder locally on the job URL via something like

    $ python3 -m pipeline_herder.main --job-url https://gitlab.com/.../-/jobs/2612657346
    Problem while downloading artifacts
    
  6. Save the tail of the log, including the relevant lines, to a file in tests/asserts/traces/.

  7. Add a test case to tests/test_matchers_traces.py, for example:

    def test_artifacts_error_forbidden(self):
        """Test artifacts-error with 403 forbidden error."""
        self._test('artifacts-error', 'artifact_error_forbidden.txt')
    
  8. File a merge request to get the new matcher deployed.
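
Before wiring the expression into a matcher (step 4), it can help to try the regular expression on its own against the raw log line from step 2. The following is a minimal sketch that uses only Python's standard re module. The helper name matches and the constants are hypothetical and not part of the pipeline-herder. It also assumes that the pattern may match anywhere in a line (re.search); how utils.Matcher actually applies the expression is not shown here, so the local run in step 5 remains the authoritative check.

```python
import re

# Pattern copied verbatim from the example matcher in step 4.
PATTERN = re.compile(
    r'ERROR: Downloading artifacts from coordinator.*forbidden.*403 Forbidden'
)

# Raw log line from step 2 (note the two spaces after "forbidden").
FAILING_LINE = (
    'ERROR: Downloading artifacts from coordinator... forbidden  '
    'id=2490531250 responseStatus=403 Forbidden '
    'status=GET https://gitlab.com/api/v4/jobs/2490531250/artifacts: '
    '403 Forbidden token=MUVtKQWe'
)

# Another line from the same job output that should not match.
UNRELATED_LINE = 'Uploading artifacts for failed job 00:01'


def matches(line: str) -> bool:
    """Hypothetical helper: True if the pattern occurs anywhere in the line.

    Assumption: search semantics; the real matcher may apply the
    expression differently.
    """
    return PATTERN.search(line) is not None


if __name__ == '__main__':
    print(matches(FAILING_LINE))
    print(matches(UNRELATED_LINE))
```

Checking one line that should match and one that should not gives a quick signal on whether the expression is too narrow or too broad before running the herder against a real job.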

Last modified June 22, 2022: Document updating the herder (90c0672)