Issues

DataWarehouse provides a way to tag failures on Checkouts, Builds and Tests.

Glossary

Issue

A specific anomaly is represented in DataWarehouse by an Issue entry. Each Issue represents a problem that can appear across different pipelines.

Example: ‘Timeout while doing X task’

Issue Kind

Issues are grouped into Issue Kinds. Each kind represents a different category of failures.

Example: ‘Infrastructure Issue’, ‘Kernel Bug’

Issue Occurrence

The link between an Issue and a KCIDB object (a checkout, build or test) is called an Issue Occurrence.
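As an illustration only, the three glossary terms can be modeled as plain Python dataclasses. The class and field names below (IssueKind, Issue, IssueOccurrence, ObjectType, object_id, occurrences_of) are hypothetical and are not DataWarehouse's actual schema. The sketch only shows that an Issue belongs to one Issue Kind and is linked to KCIDB objects through Issue Occurrences.

```python
from dataclasses import dataclass
from enum import Enum


class ObjectType(Enum):
    """KCIDB object types an Issue can be tagged on."""
    CHECKOUT = "checkout"
    BUILD = "build"
    TEST = "test"


@dataclass
class IssueKind:
    """A category of failures, e.g. 'Infrastructure Issue' or 'Kernel Bug'."""
    name: str


@dataclass
class Issue:
    """A known problem that can appear across different pipelines."""
    description: str
    kind: IssueKind


@dataclass
class IssueOccurrence:
    """Links one Issue to one KCIDB object (checkout, build or test)."""
    issue: Issue
    object_type: ObjectType
    object_id: str  # hypothetical identifier of the tagged object


def occurrences_of(issue, occurrences):
    """Return every occurrence that points at the given Issue object."""
    return [o for o in occurrences if o.issue is issue]


# Example: one Issue tagged on a build and on a test (IDs are placeholders).
infra = IssueKind("Infrastructure Issue")
timeout = Issue("Timeout while doing X task", infra)
tagged = [
    IssueOccurrence(timeout, ObjectType.BUILD, "build-1"),
    IssueOccurrence(timeout, ObjectType.TEST, "test-1"),
]
print([o.object_type.value for o in occurrences_of(timeout, tagged)])  # ['build', 'test']
```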

Issue Design

Issue Analysis

DataWarehouse can be used to view and track issues, as well as recent testing trends per kernel build.

Issues List

View Known Issues

Known issues are tracked in DataWarehouse. They are grouped into categories such as Kernel Bug, Unstable Test, Infrastructure Problem, etc. An issue can be marked as resolved, meaning it is no longer present. In addition, Kernel Bug issues have an Origin Tree property that points to the first tree where the failure was found.

On the Issues List page (left sidebar: Issues → Issues List) you can find the list of failures, grouped by resolution status, together with the checkouts they are tagged against.
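The grouping on the Issues List page can be pictured with a small sketch. KnownIssue, its fields and group_by_resolution() are hypothetical stand-ins for what the page shows (kind, resolution status, Origin Tree, tagged checkouts), not a DataWarehouse API, and the entries are placeholder data.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KnownIssue:
    """Hypothetical record for one entry of the Issues List."""
    description: str
    kind: str
    resolved: bool = False
    origin_tree: Optional[str] = None  # only set for Kernel Bug issues
    checkouts: list[str] = field(default_factory=list)  # checkouts tagged with this issue


def group_by_resolution(issues: list[KnownIssue]) -> dict[str, list[KnownIssue]]:
    """Split issues into 'unresolved' and 'resolved' buckets, keeping input order."""
    groups: dict[str, list[KnownIssue]] = {"unresolved": [], "resolved": []}
    for issue in issues:
        groups["resolved" if issue.resolved else "unresolved"].append(issue)
    return groups


# Placeholder data for illustration only.
issues = [
    KnownIssue("Timeout while doing X task", "Infrastructure Issue", checkouts=["checkout-1"]),
    KnownIssue("Example kernel bug", "Kernel Bug", resolved=True,
               origin_tree="example-tree", checkouts=["checkout-2", "checkout-3"]),
]
for status, members in group_by_resolution(issues).items():
    print(status, [(i.description, i.checkouts) for i in members])
```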

Edit Known Issues

Issues can be modified from the Issues List page:

  • To update Issue resolution: Select Actions → Mark as Resolved/Unresolved

  • To update the Issue kind, URL, description or origin: Select Actions → Edit Issue

Tag a new Issue

Each checkout that includes failures should be triaged and tagged with the appropriate Issue. To tag a new Issue, follow these steps:

  1. Link a new failure by adding an Issue Occurrence.
    1. View the Failed Checkouts by selecting Checkouts → Failures on the sidebar. You can filter by Checkout, Build or Test failures by selecting the tabs on top of the list.
    2. View/open failed builds or tests to debug the issue.
    3. Select Actions → Associate Issue.
    4. Select the Issue linked to the failure. The textbox can be used to filter the dropdown options.
    5. If there is no Issue registered for the failure you found, create a new one by clicking on ‘Create new issue’.
    6. Select the affected checkout, build or test and select ‘Save’ near the bottom.
  2. Add a regex pattern to have our bot triager tag the known issue for new checkouts.
    1. Open the Issues Regexes page by selecting Issues → Issues Regexes on the sidebar.
    2. Select Actions → New Issue Regex.
    3. Select the Issue you just filed from the dropdown menu.
    4. Fill out the details for the regex pattern, including Text Match (the regex pattern), Test Name Match and File Name Match.
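The bot triager's internals are not described here, but the idea behind step 2 can be sketched: an Issue Regex tags a log when its populated fields match. In this hypothetical Python version, IssueRegex, matches() and triage() are invented names, an empty Test Name Match or File Name Match is assumed to match anything, the test and file names are placeholders, and Python's re module stands in for whatever engine DataWarehouse actually uses.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class IssueRegex:
    """Hypothetical mirror of the Issue Regex form fields."""
    issue: str                             # Issue the regex tags
    text_match: str                        # regex searched in the log content
    test_name_match: Optional[str] = None  # regex on the test name (None = any test)
    file_name_match: Optional[str] = None  # regex on the log file name (None = any file)


def matches(rule: IssueRegex, test_name: str, file_name: str, log_text: str) -> bool:
    """Return True if every populated field of the rule matches."""
    # Cheap filters on short strings first; the log body is searched last.
    if rule.test_name_match and not re.search(rule.test_name_match, test_name):
        return False
    if rule.file_name_match and not re.search(rule.file_name_match, file_name):
        return False
    return re.search(rule.text_match, log_text) is not None


def triage(rules, test_name, file_name, log_text):
    """Return the Issues whose regexes match this log."""
    return [r.issue for r in rules if matches(r, test_name, file_name, log_text)]


rules = [
    IssueRegex(
        issue="NVMe queue timeout",
        text_match=r"nvme nvme\d+: queue \d+: timeout request 0x[0-9a-f]+ type",
        file_name_match=r"console\.log$",
    )
]
LOG = "[  804.236007] nvme nvme0: queue 114: timeout request 0xf type 4\n"
print(triage(rules, "example-test", "console.log", LOG))  # ['NVMe queue timeout']
```

The name filters are checked before text_match because they run on short strings, while text_match has to scan the whole log.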

Creating an Issue Regex

Keep in mind that each Issue Regex is checked against every log of every test, so the cost of a single pattern is multiplied across all logs. Consider the performance implications before adding one.

Here are some best practices you should follow before adding new regexes:

Avoid Wildcards

Keep .* wildcards to a minimum, as they are expensive to compute. Use specific search patterns wherever possible. For example, consider this log line:

[  804.236007] nvme nvme0: queue 114: timeout request 0xf type 4

Instead of adding wildcards for every piece of the string that can vary:

nvme nvme.*queue.*timeout request.*type

Use a specific search pattern for each one. In this case, we want to ignore the device ID (decimal value), queue ID (decimal value) and request ID (hexadecimal value). Using \d+ for decimal numbers and [0-9a-f]+ for hexadecimal values, the search time can be reduced to 1/100th of the original regex.

nvme nvme\d+: queue \d+: timeout request 0x[0-9a-f]+ type

If you need to test the regex before submitting it, regex101 is a good place to do it. Just paste a block of the log file containing the line you are trying to find, and improve the regex until you are happy with it.
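Besides regex101, a pattern can be checked locally with Python's standard re module before submitting it. This sketch only confirms that both the wildcard pattern and the specific pattern match the sample line. It makes no timing claim, and DataWarehouse's own regex engine may behave differently from Python's.

```python
import re

LOG_LINE = "[  804.236007] nvme nvme0: queue 114: timeout request 0xf type 4"

# Broad pattern: every variable part replaced by .*
broad = re.compile(r"nvme nvme.*queue.*timeout request.*type")
# Specific pattern: \d+ for the decimal IDs, [0-9a-f]+ for the hex request ID
specific = re.compile(r"nvme nvme\d+: queue \d+: timeout request 0x[0-9a-f]+ type")

for name, pattern in (("broad", broad), ("specific", specific)):
    match = pattern.search(LOG_LINE)
    print(name, "->", match.group(0) if match else None)
```

To compare the cost of two patterns, the standard timeit module can run pattern.search over a real block of log output. Timings depend on the engine and the input, so treat the numbers as relative.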

Multiline checks

It is possible to match multiple lines using the newline token (\n).

If you would like to match both of the following lines:

[  804.236007] nvme nvme0: queue 114: timeout request 0xf type 4
[  804.241756] nvme nvme0: queue 114: timeout request 0x10 type 4

A good regex could be:

\[\s+\d+\.\d+\] nvme nvme\d+: queue \d+: timeout request 0x[0-9a-f]+ type \d\n\[\s+\d+\.\d+\] nvme nvme\d+: queue \d+: timeout request 0x[0-9a-f]+ type \d
  • \[\s+\d+\.\d+\] matches the timestamp

  • \d+ matches one or more digits (a decimal number)

  • [0-9a-f]+ matches the hexadecimal value

  • \n matches the new line

If you need a more detailed explanation of what each piece of the regex is doing, regex101 provides a great analysis.
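The multiline pattern can be checked locally in the same way. This Python sketch builds the regex from two copies of the single-line pattern joined by \n, which gives exactly the pattern shown above. It then confirms that the regex matches the two consecutive lines but not the first line on its own. Python's re module is an assumption here, and DataWarehouse's engine may treat newlines differently.

```python
import re

# The two log lines from the example, joined with a newline as in the log file.
LOG = (
    "[  804.236007] nvme nvme0: queue 114: timeout request 0xf type 4\n"
    "[  804.241756] nvme nvme0: queue 114: timeout request 0x10 type 4\n"
)

LINE = r"\[\s+\d+\.\d+\] nvme nvme\d+: queue \d+: timeout request 0x[0-9a-f]+ type \d"
# Two copies of the single-line pattern joined by \n must match consecutive lines.
TWO_LINES = re.compile(LINE + r"\n" + LINE)

print(TWO_LINES.search(LOG) is not None)                  # True
print(TWO_LINES.search(LOG.splitlines()[0]) is not None)  # False: only one line
```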

DataWarehouse also provides testing trends: test results per kernel build, the machines the tests ran on, and a confidence rating (test stability) for each test.

Tests Confidence

Confidence graphs include an overall summary of the test results (Pass|Fail|Error|Skip). This helps determine how often a test fails in the pipeline and whether any follow-up action is needed. The Tests Confidence page can be found under Confidence → Tests on the sidebar.
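The summary behind a confidence graph comes down to counting statuses. The sketch below counts each status in a made-up run history. As an assumption, it also reports the pass rate over runs that were not skipped. DataWarehouse's actual confidence formula is not described in this section, so summarize() and pass_rate are hypothetical.

```python
from collections import Counter

STATUSES = ("PASS", "FAIL", "ERROR", "SKIP")


def summarize(results):
    """Count each status and compute the share of non-skipped runs that passed."""
    counts = Counter(results)
    summary = {status: counts[status] for status in STATUSES}
    executed = summary["PASS"] + summary["FAIL"] + summary["ERROR"]
    # Hypothetical metric: pass rate among runs that actually executed.
    summary["pass_rate"] = summary["PASS"] / executed if executed else None
    return summary


# Made-up history for one test, oldest run first.
history = ["PASS", "PASS", "FAIL", "PASS", "SKIP", "ERROR", "PASS", "PASS"]
print(summarize(history))
```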

Test Results

To drill down further, select a test on the Tests Confidence page to view its results for all kernels and the machines it ran against.

You can filter the results by status at the top of the page. This helps determine whether a test also failed on a previous kernel build or whether the failure is related to the kernel patch/build under test.
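Filtering by status and comparing against earlier builds can be expressed as two small functions. Result, filter_by_status() and failed_on_earlier_build() are hypothetical. The kernel and machine names are placeholders, and the results are assumed to be ordered with the oldest build first.

```python
from dataclasses import dataclass


@dataclass
class Result:
    """Hypothetical row of the Test Results page."""
    kernel: str   # kernel build identifier
    machine: str  # machine the test ran on
    status: str   # e.g. "PASS", "FAIL", "ERROR", "SKIP"


def filter_by_status(results, *statuses):
    """Keep only results whose status is one of `statuses`."""
    wanted = set(statuses)
    return [r for r in results if r.status in wanted]


def failed_on_earlier_build(results, kernel):
    """Assumes `results` is ordered oldest first; checks runs before `kernel`."""
    kernels = [r.kernel for r in results]
    cutoff = kernels.index(kernel)  # raises ValueError if `kernel` is absent
    return any(r.status == "FAIL" for r in results[:cutoff])


results = [
    Result("kernel-1", "machine-a", "PASS"),
    Result("kernel-2", "machine-a", "FAIL"),
    Result("kernel-3", "machine-b", "FAIL"),
]
print([r.kernel for r in filter_by_status(results, "FAIL")])  # ['kernel-2', 'kernel-3']
print(failed_on_earlier_build(results, "kernel-3"))            # True
```

If the test already failed on an earlier build, the failure is less likely to be caused by the patch under test alone.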