Debugging DataWarehouse triager and issue regexes

How to investigate why a certain test was not triaged in DataWarehouse

Problem

A test was not tagged correctly or an issue regex is not working as expected.

Steps

Install and configure cki-tools:

```
git clone https://gitlab.com/cki-project/cki-tools
cd cki-tools
python3 -m pip install --user -e '.[triager]'
export DATAWAREHOUSE_URL='https://datawarehouse.cki-project.org'
export CKI_LOGGING_LEVEL=DEBUG
```

If you need to access internal tests or issues in DataWarehouse, obtain and cache a Kerberos ticket with kinit username@REALM, and set DATAWAREHOUSE_TOKEN_TRIAGER to the literal string "oidc" to initialize an authenticated session:

```
export DATAWAREHOUSE_TOKEN_TRIAGER='oidc'  # leave empty if you don't need to authenticate
```

Check whether a local run of the triager would tag the test correctly:

```
$ python3 -m cki.triager.main --test-id "redhat:394452540_s390x_upt_9"
2021-10-25T11:10:26.471 - [INFO] - cki.triager.checkers -  running: check_logs_with_regex
2021-10-25T11:10:41.105 - [INFO] - cki.triager.checkers -   result: []
2021-10-25T11:10:41.106 - [INFO] - cki.triager.checkers -  running: check_kickstart_error
2021-10-25T11:10:41.106 - [INFO] - cki.triager.checkers -   result: []
2021-10-25T11:10:41.106 - [INFO] - cki.triager.checkers -  overall result: []
```

If the local run does tag the test correctly (the overall result lists the expected issue), the problem is most likely in the communication between the triager and DataWarehouse.
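When the debug output is long, a small script can pull out the overall result for you. The sketch below assumes the output of the local run was saved to a file (for example by appending 2>&1 | tee triage.log to the command) and that the "overall result" lines keep the format shown above; overall_results is a hypothetical helper, not part of cki-tools.

```python
import ast
import re

# Matches the "overall result" line printed by cki.triager.checkers, e.g.
#   ... - cki.triager.checkers -  overall result: [{'name': ..., 'id': 691}]
# Assumption: the log format shown in this document stays stable.
OVERALL_RE = re.compile(r"cki\.triager\.checkers -\s+overall result: (\[.*\])\s*$")


def overall_results(log_text):
    """Return one list of issues per "overall result" line in log_text."""
    results = []
    for line in log_text.splitlines():
        match = OVERALL_RE.search(line)
        if match:
            # The result is printed as a Python literal (a list of dicts),
            # so ast.literal_eval turns it back into data without eval().
            results.append(ast.literal_eval(match.group(1)))
    return results
```

For the run shown above, overall_results would return [[]], a single empty result.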

Check the execution of the triager-as-a-service and look for outstanding problems. If a test was not triaged, first make sure the service processed it at all by searching the logs for the test ID.

All logs from the DataWarehouse triager are accessible in Applecrumble. Use its Explore feature to search them with a LogQL query such as the following:
```
{deployment="datawarehouse-triager"}
```
You need to be logged in to access the Explore page; once there, select the Loki data source.

You can narrow down the results by adding a line filter with details of what you are looking for, such as the test ID or the issue name:
```
{deployment="datawarehouse-triager"} |= "redhat:1234"
{deployment="datawarehouse-triager"} |= "Storage blktests"
```
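If you build these queries often, generating them in code avoids quoting mistakes when the search text itself contains quotes or backslashes. This is a minimal sketch: triager_query is a hypothetical helper, and it assumes single-line plain-text needles placed inside LogQL double-quoted strings, where backslashes and double quotes are the characters that need escaping. Chained |= filters must all match, so each extra needle narrows the result further.

```python
# Stream selector for the triager deployment, taken from the queries above.
SELECTOR = '{deployment="datawarehouse-triager"}'


def escape_logql(text):
    """Escape backslashes and double quotes for a double-quoted LogQL string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def triager_query(*needles):
    """Build a LogQL query keeping only lines that contain every needle."""
    query = SELECTOR
    for needle in needles:
        # Each needle becomes one |= line filter; all filters must match.
        query += f' |= "{escape_logql(needle)}"'
    return query


print(triager_query("redhat:1234"))
print(triager_query("Storage blktests", "srp/005"))
```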
Select a time range in the top right corner that covers the moment the test should have been processed.

When an issue is identified, the log output should look similar to these lines:
```
2021-10-22T21:32:22.132 - [INFO] - cki.triager.checkers -  running: check_logs_with_regex
2021-10-22T21:32:24.060 - [INFO] - cki.triager.checkers -   result: [{'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691}]
2021-10-22T21:32:24.060 - [INFO] - cki.triager.checkers -  running: check_kickstart_error
2021-10-22T21:32:24.061 - [INFO] - cki.triager.checkers -   result: []
2021-10-22T21:32:24.061 - [INFO] - cki.triager.checkers -  overall result: [{'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691}]
2021-10-22T21:32:24.128 - [INFO] - cki.triager.triager - Linking issue id={'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691} to id=redhat:133661518
```
Because multiple pods run concurrently, these lines will probably be interleaved with output from other runs.
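Since output from different pods is interleaved, looking for the final "Linking issue" line is a quick way to confirm whether a particular test was linked. The sketch below assumes the log lines were copied or exported from Explore into plain text and that the message format matches the example above; linked_issues is a hypothetical helper.

```python
import ast
import re

# Matches lines such as:
#   ... - cki.triager.triager - Linking issue id={'name': ..., 'id': 691} to id=redhat:133661518
# Assumption: the message format shown in this document does not change.
LINK_RE = re.compile(r"Linking issue id=(\{.*\}) to id=(\S+)\s*$")


def linked_issues(log_text, test_id):
    """Return the issues the triager linked to test_id in exported log text.

    Lines from other pods may be interleaved, so only "Linking issue" lines
    whose target id equals test_id are kept.
    """
    issues = []
    for line in log_text.splitlines():
        match = LINK_RE.search(line)
        if match and match.group(2) == test_id:
            # The issue is printed as a Python dict literal.
            issues.append(ast.literal_eval(match.group(1)))
    return issues
```

An empty list for a test ID that does appear elsewhere in the logs suggests the test was processed but no issue matched.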
If the test was processed but the failure was not detected, check whether the regex actually matches the failure.

The following Python script checks the submitted regex against the log file that should contain the failure. If the pattern needs flags such as re.DOTALL, set them inline as a prefix, e.g. (?s)single.*line.
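```
import re

import requests

LOG_URL = 'https://url-to-log-file'  # URL of the log that should contain the failure
REGEX = r'regex content'  # the issue regex to check

response = requests.get(LOG_URL, timeout=60)
response.raise_for_status()  # fail on 403/404 instead of searching an error page
log_content = response.content.decode(errors='ignore')

match = re.compile(REGEX).search(log_content)
# Print the matched text, or a clear message when nothing matched.
print(match.group(0) if match else 'No match')
```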
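To see why the inline prefix matters, the sketch below runs a few patterns against a hypothetical two-line log. An inline flag such as (?s) or (?m) at the start of the pattern is part of the pattern string itself, so with Python's re module it takes effect without any flags argument to re.compile or re.search; first_match is a hypothetical helper.

```python
import re


def first_match(log_content, pattern):
    """Return the text of the first match of pattern in log_content, or None."""
    match = re.search(pattern, log_content)
    return match.group(0) if match else None


# Hypothetical log excerpt spanning two lines.
log = "single\nline"

# Without DOTALL, "." does not match the newline: prints None.
print(repr(first_match(log, r"single.*line")))
# The inline (?s) prefix enables DOTALL: prints 'single\nline'.
print(repr(first_match(log, r"(?s)single.*line")))
# Without MULTILINE, "^" only matches at the start of the string: prints None.
print(repr(first_match(log, r"^line$")))
# The inline (?m) prefix makes "^" and "$" match at line boundaries: prints 'line'.
print(repr(first_match(log, r"(?m)^line$")))
```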
If the regex is defined correctly and the script finds the expected match, the remaining step is to debug the triager execution itself.

Additional steps for manually tagging issues

To let a local run of the triager tag issues in DataWarehouse, add --to-dw, for example:

```
python3 -m cki.triager.main --test-id redhat:394452540_s390x_upt_9 --to-dw
```
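When several tests need re-checking, a small wrapper can run the triager for each of them. The sketch below is hypothetical and uses only the --test-id and --to-dw options shown in this document; it calls the interpreter running the script (sys.executable), assuming that is the environment where cki-tools was installed.

```python
import subprocess
import sys


def triager_command(test_id, to_dw=False):
    """Build the argument list for a local triager run on one test."""
    cmd = [sys.executable, "-m", "cki.triager.main", "--test-id", test_id]
    if to_dw:
        # --to-dw lets the local run tag the issues in DataWarehouse.
        cmd.append("--to-dw")
    return cmd


def run_triager(test_ids, to_dw=False):
    """Run the triager once per test id and return {test_id: exit code}."""
    return {
        test_id: subprocess.run(triager_command(test_id, to_dw)).returncode
        for test_id in test_ids
    }
```

A cautious pattern is to call run_triager(ids) first, review the logged results, and only then call run_triager(ids, to_dw=True), so that only results you have checked are written to DataWarehouse.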