cki_tools.pipeline_herder
Configuration via environment variables
Name | Type | Secret | Required | Description |
---|---|---|---|---|
PIPELINE_HERDER_CONFIG | yaml | no | no | Configuration in YAML. If not present, falls back to PIPELINE_HERDER_CONFIG_PATH. |
PIPELINE_HERDER_CONFIG_PATH | path | no | no | Path to the configuration YAML file |
GITLAB_TOKENS | json | no | yes | URL/environment variable pairs of GitLab instances and private tokens as a JSON object |
GITLAB_TOKEN | string | yes | yes | GitLab private tokens as configured in GITLAB_TOKENS above |
CHATBOT_URL | url | no | no | Chat bot endpoint |
CKI_LOGGING_LEVEL | enum | no | no | Python logging level for CKI modules, defaults to WARN |
CKI_METRICS_ENABLED | bool | no | no | Enable Prometheus metrics. Default: false |
CKI_METRICS_PORT | int | no | no | Port where Prometheus metrics are exposed. Default: 8000 |
SENTRY_DSN | url | yes | no | Sentry DSN |
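The fallback between the two configuration variables and the indirection from GITLAB_TOKENS to the token variables can be sketched as follows. This is a minimal illustration, not the herder's actual code: the function names `load_config_text` and `resolve_gitlab_tokens` are hypothetical, and the assumed shape of GITLAB_TOKENS (instance URL or host mapped to the name of the variable holding the token) is inferred from the table above.

```python
import json
import os
from pathlib import Path
from urllib.parse import urlsplit


def load_config_text(environ=os.environ):
    """Return the raw YAML configuration text, or None if nothing is set."""
    inline = environ.get("PIPELINE_HERDER_CONFIG")
    if inline:
        return inline
    # Fall back to the file path only when no inline configuration is given.
    path = environ.get("PIPELINE_HERDER_CONFIG_PATH")
    if path:
        return Path(path).read_text(encoding="utf-8")
    return None


def resolve_gitlab_tokens(environ=os.environ):
    """Map each GitLab host to its private token.

    Assumes GITLAB_TOKENS looks like {"https://gitlab.example": "GITLAB_TOKEN"},
    i.e. instance URL (or bare host) -> name of the variable holding the token.
    """
    mapping = json.loads(environ.get("GITLAB_TOKENS", "{}"))
    tokens = {}
    for url, variable in mapping.items():
        host = urlsplit(url).netloc or url  # bare hosts have no netloc
        tokens[host] = environ[variable]    # KeyError if the token is missing
    return tokens
```

Passing a plain dict instead of `os.environ` makes both helpers easy to try out without touching the real environment.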
Configuration file
Job matching is configured in the shipped configuration file.
matchers:
  - name: image-pull
    description: Failure during image pull
    messages:
      - 'ERROR: Job failed: image pull failed'
      - 'ERROR: Job failed: failed to pull image'
  - name: integrity
    description: Job failed with data_integrity_failure
    failure_reason: data_integrity_failure
  - action: report
    matchers:
      - name: no-trace
        description: Job has no trace
        builtin: no_trace
      - name: tests-not-run
        description: Test job has tests that did not run
        job_name: test
        job_status: []
        builtin: missed_tests
  - name: no-trace
    description: Job has no trace
    builtin: no_trace
Field | Type | Default | Description |
---|---|---|---|
name | string | empty | matcher name |
description | string | empty | matcher description |
action | string | retry | retry, report or alert |
maximum_artifact_size | int | 1_000_000 | maximum artifact size to process |
retry_delays | list[string] | [5m] | delay between successive retries |
retry_limit | int | 3 | maximum number of retries, 0 to disable |
web_url | list[url] | empty | job URL prefixes |
job_status | list[string] | [failed] | success, failed |
job_name | string | empty | job name prefix |
variables | dict[str,list[regex or None]] | empty | allowed trigger variable values |
exemplars | list[url] | empty | job URLs that should be matched by this node |
failure_reason | string | empty | data_integrity_failure, stuck_or_timeout_failure, … |
builtin | string | empty | no_trace, missed_tests |
messages | list[str or /regex/] | [] | pattern to look for in config files |
file_name | string | empty | log file name, uses console log if empty |
tail_lines | int | 300 | number of lines to check |
matchers | object | empty | further sub-matchers |
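As an illustration of how the job-level filters in the table could combine, here is a minimal sketch. It is not taken from the herder's code: the function `job_matches` and the `job` dict with `name`, `status` and `web_url` keys are hypothetical, and treating an empty filter as "no restriction" is an assumption suggested by the `job_status: []` entry in the example configuration.

```python
def job_matches(job, matcher):
    """Check the job-level filters of a matcher against a job.

    `job` is a hypothetical dict with `name`, `status` and `web_url` keys.
    An empty filter is assumed to match every job.
    """
    # job_status defaults to [failed] according to the field table.
    statuses = matcher.get("job_status", ["failed"])
    if statuses and job["status"] not in statuses:
        return False
    # job_name is a prefix of the job name.
    name_prefix = matcher.get("job_name", "")
    if name_prefix and not job["name"].startswith(name_prefix):
        return False
    # web_url is a list of job URL prefixes; any one of them may match.
    url_prefixes = matcher.get("web_url", [])
    if url_prefixes and not any(job["web_url"].startswith(p) for p in url_prefixes):
        return False
    return True
```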
In general, matchers are recursively processed depth-first via the matchers
field, with field values getting overwritten if redefined; if the matchers
field is not set, the actual matching takes place with all collected fields.
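The depth-first processing with field overriding can be sketched as a small recursive generator. This is an illustration under assumptions, not the herder's implementation: `flatten_matchers` is a hypothetical name, and the configuration is assumed to be already parsed into nested dicts and lists.

```python
def flatten_matchers(node, inherited=None):
    """Yield one merged field dict per leaf matcher, depth-first.

    Fields set on a node overwrite the values collected from its ancestors.
    A node without (or with an empty) `matchers` field is treated as a leaf.
    """
    fields = dict(inherited or {})
    fields.update({k: v for k, v in node.items() if k != "matchers"})
    children = node.get("matchers")
    if not children:
        yield fields
        return
    for child in children:
        yield from flatten_matchers(child, fields)
```

Applied to the example configuration, the `no-trace` and `tests-not-run` leaves under `action: report` would both carry `action: report`, while `image-pull` would keep whatever action it inherits from the top level.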
For message matching via regular expressions, regex modifiers/flags cannot be
appended to the trailing slash. They have to be provided inline
via (?aiLmsux).
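A minimal sketch of what inline flags look like in practice, using Python's `re` module. The helpers `compile_message` and `find_message` are hypothetical, and the handling of the `/regex/` delimiters and of `tail_lines` is an assumption based on the field table, not the herder's actual matching code.

```python
import re


def compile_message(message):
    """Turn a `messages` entry into a compiled pattern.

    Entries wrapped in slashes are treated as regular expressions, anything
    else is matched literally (assumed delimiter handling).
    """
    if len(message) >= 2 and message.startswith("/") and message.endswith("/"):
        return re.compile(message[1:-1])
    return re.compile(re.escape(message))


def find_message(log_text, messages, tail_lines=300):
    """Return the first entry found in the last `tail_lines` lines, else None."""
    tail = "\n".join(log_text.splitlines()[-tail_lines:])
    for message in messages:
        if compile_message(message).search(tail):
            return message
    return None
```

With this sketch, `'/(?i)job failed: image pull failed/'` matches regardless of case, whereas a trailing flag as in `/.../i` would not be understood.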
RabbitMQ setup
The herder delays the restart of jobs via RabbitMQ dead-letter queues. These need to be set up as described in the resilient message queue documentation.
Checking a single job
It is possible to run all matchers against a single job to see whether anything matches by specifying the job URL via
python3 -m cki_tools.pipeline_herder.main \
--job-url https://instance/project/-/jobs/012345
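The job URL in the command above follows GitLab's `/-/jobs/` layout. How the herder parses it internally is not shown here; the following sketch, with the hypothetical helper `parse_job_url`, only illustrates which pieces such a URL carries.

```python
from urllib.parse import urlsplit


def parse_job_url(job_url):
    """Split a GitLab job URL into (instance URL, project path, job id)."""
    parts = urlsplit(job_url)
    project, separator, job_id = parts.path.strip("/").partition("/-/jobs/")
    if not separator or not job_id.isdigit():
        raise ValueError(f"not a GitLab job URL: {job_url}")
    return f"{parts.scheme}://{parts.netloc}", project, int(job_id)
```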
Validating the configuration
All embedded job exemplars can be checked via
python3 -m cki_tools.pipeline_herder.main --validate
Prometheus Metrics
If CKI_METRICS_ENABLED is true, Prometheus metrics are exposed on the CKI_METRICS_PORT port.
The exposed data is the following:
Name | Type | Labels | Description |
---|---|---|---|
cki_message_delayed | Counter | no | Number of queued messages delayed via retry queue |
cki_herder_problem_detected | Counter | gitlab_stage, gitlab_job, matcher | Number of jobs processed where a problem was found |
cki_herder_problem_retries | Histogram | gitlab_stage, gitlab_job, matcher | Number of retries for a job with a problem |
cki_herder_no_problem_detected | Counter | gitlab_stage, gitlab_job | Number of jobs processed where no problem was found |
cki_herder_problem_reported | Counter | gitlab_stage, gitlab_job, matcher | Number of jobs reported (and not retried) after finding a problem |
cki_herder_problem_retried | Counter | gitlab_stage, gitlab_job, matcher | Number of jobs retried after finding a problem |
cki_herder_process_time_seconds | Histogram | no | Time spent matching a job |
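To get a quick per-matcher overview from the exposed data, the text returned by the metrics endpoint can be aggregated with a few lines of Python. This is a rough sketch: `sum_by_label` is a hypothetical helper, the parser only handles simple sample lines of the Prometheus text format, and the exact sample names (for example a `_total` suffix on counters) depend on the client library and are an assumption.

```python
import re
from collections import defaultdict

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+\d+)?$'
)
LABEL = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(exposition_text, metric, label):
    """Sum all samples of `metric`, grouped by the value of `label`."""
    totals = defaultdict(float)
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE.match(line)
        if not match or match["name"] != metric:
            continue
        labels = dict(LABEL.findall(match["labels"] or ""))
        totals[labels.get(label, "")] += float(match["value"])
    return dict(totals)
```

For example, `sum_by_label(text, "cki_herder_problem_retried_total", "matcher")` would return the number of retried jobs per matcher across all stages and jobs.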