Debugging DataWarehouse triager and issue regexes
Problem
A test was not tagged correctly or an issue regex is not working as expected.
Steps
-
Install and configure cki-tools via

    git clone https://gitlab.com/cki-project/cki-tools
    cd cki-tools
    python3 -m pip install --user -e .[triager]
    export DATAWAREHOUSE_URL='https://datawarehouse.cki-project.org'
    export CKI_LOGGING_LEVEL=DEBUG
Get a DataWarehouse token if you need to access internal tests or issues in the DataWarehouse, and export it via
export DATAWAREHOUSE_TOKEN_TRIAGER='token or empty'
-
Check whether a local run of the triager would correctly tag the test via

    $ python3 -m cki.triager.main --test-id "redhat:394452540_s390x_upt_9"
    2021-10-25T11:10:26.471 - [INFO] - cki.triager.checkers - running: check_logs_with_regex
    2021-10-25T11:10:41.105 - [INFO] - cki.triager.checkers - result: []
    2021-10-25T11:10:41.106 - [INFO] - cki.triager.checkers - running: check_kickstart_error
    2021-10-25T11:10:41.106 - [INFO] - cki.triager.checkers - result: []
    2021-10-25T11:10:41.106 - [INFO] - cki.triager.checkers - overall result: []
-
If the local run tags the test correctly, the problem is most likely related to the communication between the triager and DataWarehouse.
Check the execution of the triager as a service and look for outstanding problems. If a test was not triaged, first make sure the service processed it by searching the logs for the test ID.
All logs from the DataWarehouse Triager are accessible using Applecrumble. Use the Explore feature to search the logs with a LogQL query like the following.
{deployment="datawarehouse-triager"}
Make sure you are logged in to access the Explore page, and select the Loki data source.
To narrow down the results, filter the query with details about what you are looking for, such as the test ID or issue name.

    {deployment="datawarehouse-triager"} |= "redhat:1234"
    {deployment="datawarehouse-triager"} |= "Storage blktests"

Make sure to select a time span in the top right corner that covers the moment the test should have been processed.
When an issue is identified, the log output should look similar to these lines:
    2021-10-22T21:32:22.132 - [INFO] - cki.triager.checkers - running: check_logs_with_regex
    2021-10-22T21:32:24.060 - [INFO] - cki.triager.checkers - result: [{'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691}]
    2021-10-22T21:32:24.060 - [INFO] - cki.triager.checkers - running: check_kickstart_error
    2021-10-22T21:32:24.061 - [INFO] - cki.triager.checkers - result: []
    2021-10-22T21:32:24.061 - [INFO] - cki.triager.checkers - overall result: [{'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691}]
    2021-10-22T21:32:24.128 - [INFO] - cki.triager.triager - Linking issue id={'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691} to id=redhat:133661518
Because multiple pods run at the same time, these lines will probably be interleaved with lines from other runs.
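When lines from several pods are mixed together, it can help to export the log lines and pick out only the "Linking issue" entries for one test. The following is a minimal sketch, not part of cki-tools: the helper name `linked_issues` is hypothetical, and the pattern is modelled only on the sample line shown above, so the real log format may differ between triager versions.

```python
import ast
import re

# Pattern modelled on the "Linking issue" sample line; the exact log
# format is an assumption and may need adjusting.
LINK_RE = re.compile(
    r"Linking issue id=(?P<issue>\{.*\}) to id=(?P<test_id>\S+)"
)


def linked_issues(log_lines, test_id):
    """Return the issue dicts that the log reports as linked to test_id."""
    issues = []
    for line in log_lines:
        match = LINK_RE.search(line)
        if match and match.group("test_id") == test_id:
            # The issue is logged as a Python dict literal, so it can be
            # parsed safely with ast.literal_eval.
            issues.append(ast.literal_eval(match.group("issue")))
    return issues


# Two of the sample lines from above, used as stand-in input.
sample = [
    "2021-10-22T21:32:24.061 - [INFO] - cki.triager.checkers - overall result: "
    "[{'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691}]",
    "2021-10-22T21:32:24.128 - [INFO] - cki.triager.triager - Linking issue "
    "id={'name': 'Storage blktests - srp: stuck on srp/005', 'id': 691} "
    "to id=redhat:133661518",
]

print(linked_issues(sample, "redhat:133661518"))
```

An empty list for a test ID that should have been triaged suggests that the service either never processed the test or found no matching issue.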
-
In case the test was processed but the failure was not detected, check whether the regex is correctly detecting the issue.
The following Python script helps you validate the submitted regex against the file where the failure should be present. If your regex pattern requires flags like re.DOTALL, set them inline as a prefix, e.g. (?s)single.*line.

    import re

    import requests

    LOG_URL = 'https://url-to-log-file'
    REGEX = r'regex content'

    log_content = requests.get(LOG_URL).content.decode(errors='ignore')
    regex = re.compile(REGEX)
    print(regex.search(log_content))
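To illustrate the inline-flag advice, the sketch below compares a pattern with and without the (?s) prefix. The log excerpt is made up for the example. That the issue regex can only carry its flags inline is inferred from the advice above, not something this page confirms.

```python
import re

# Hypothetical log excerpt in which the failure spans two lines.
log_content = "srp/005 started\nsrp/005 timed out\n"

# Without DOTALL, "." does not match a newline, so this finds nothing.
plain = re.search(r"started.*timed out", log_content)

# The inline (?s) prefix enables DOTALL for the whole pattern.
inline = re.search(r"(?s)started.*timed out", log_content)

# Same search using the flag argument, for comparison only.
flagged = re.search(r"started.*timed out", log_content, re.DOTALL)

print(plain)                                # None
print(inline.group(0) == flagged.group(0))  # True
```

Other flags have inline forms as well, for example (?i) for re.IGNORECASE and (?m) for re.MULTILINE, and they can be combined as in (?si).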
-
If the regex is correctly defined and the script above finds a match in the log, the last step is to debug the triager execution.
Additional steps for manually tagging issues
-
To enable tagging issues for local runs of the triager, add --to-dw, similar to

    python3 -m cki.triager.main --test-id redhat:394452540_s390x_upt_9 --to-dw