Types of Failures
Kernel Bug
Build Failures
A kernel fails to build in the pipeline. This category applies to commits which have already been pushed to git.
Bug in Patch Series
A kernel patch can fail for the following reasons:
- The patch does not apply cleanly
- The kernel fails to build
- The patch introduces a kernel bug
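The three reasons above correspond to successive pipeline stages, so a triage helper can report the first stage that failed. The following is a minimal sketch only; the `PatchFailure` names and the boolean inputs are hypothetical and not part of any CKI tooling.

```python
from enum import Enum
from typing import Optional


class PatchFailure(Enum):
    """Hypothetical labels for the three patch failure reasons."""
    APPLY = "patch does not apply cleanly"
    BUILD = "kernel fails to build"
    KERNEL_BUG = "patch introduces a kernel bug"


def classify_patch_failure(applied: bool, built: bool,
                           tests_passed: bool) -> Optional[PatchFailure]:
    """Return the first failing stage, or None if every stage succeeded."""
    if not applied:
        return PatchFailure.APPLY
    if not built:
        return PatchFailure.BUILD
    if not tests_passed:
        # A failing test only suggests a kernel bug: unstable tests and
        # infrastructure problems still have to be ruled out by a human.
        return PatchFailure.KERNEL_BUG
    return None
```

The stages are checked in order because a patch that does not apply can never reach the build, and a kernel that does not build can never be tested.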
Test Failures
A regression is found in the kernel source when testing a kernel patch or build.
Unstable Tests
Tests can become unstable for various reasons, including:
- New kernel features which are not yet supported by the test
- A kernel feature or bug fix which changes the behavior expected by the test
- A test bug related to a recent update
Infrastructure
Test Timeout
This can happen if a test runs longer than the expected timeout or a machine loses connection with the lab controller. You may see either an LWD (Local Watchdog) or an EWD (External Watchdog) triggered, which will abort the test. In many cases this is caused by running on a slow machine or on an overcommitted VM. The test logs should still be inspected in case an underlying kernel bug is causing the timeout.
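When triaging a timeout, it can help to note which watchdog fired before reading the rest of the log. The following is a minimal sketch, assuming the log mentions the watchdog by name or abbreviation; the matched phrases are an assumption about the log wording, not a documented Beaker or Restraint format, and `find_watchdog` is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed wording; adjust the patterns to whatever your logs actually contain.
_WATCHDOG_PATTERNS = {
    "LWD": re.compile(r"\b(LWD|local\s*watchdog)\b", re.IGNORECASE),
    "EWD": re.compile(r"\b(EWD|external\s*watchdog)\b", re.IGNORECASE),
}


def find_watchdog(log_text: str) -> Optional[str]:
    """Return "LWD" or "EWD" for the first watchdog mentioned, else None."""
    hits = []
    for name, pattern in _WATCHDOG_PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            hits.append((match.start(), name))
    # The earliest match position decides which watchdog is reported.
    return min(hits)[1] if hits else None
```

A result of `None` does not rule out a timeout; it only means neither phrase appeared in the text that was scanned.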
Internal CKI GitLab Outage
We host a private instance of GitLab which is used for pipeline sources and triggering. If an outage occurs, it will cause delays in testing. Once services are restored, the backlog of patches and git changes will automatically be tested. For brew builds, we need to restart the jobs manually, since pipelines are triggered as soon as the build is created.
Internal Network Outage
An internal network outage affecting any service on which CKI depends.
GitLab.com Outage
We use GitLab both internally and externally to host the development repositories used in the pipeline. If an outage occurs, it can lead to pipeline failures.
GitHub.com Outage
GitHub.com is used to host our public test repository. Periodically GitHub.com will be offline, which leaves the Restraint test harness unable to extract the repository and leads to test aborts.
Beaker Performance Issues
Beaker is used to provision machines and schedule jobs, in conjunction with the Restraint test harness. Periodically we may experience Beaker performance issues, which can lead to extended test durations and watchdog timeouts. You may see either a Local Watchdog (LWD) or an External Watchdog (EWD) event in the logs.
Restraint Bug
Restraint is the test harness used in Beaker. Occasionally we experience Restraint bugs which can lead to test failures.
GitLab Runner Bug
A GitLab runner bug can result in a pipeline failure. Jobs are manually retried in this case.
Workflow
Development Workflow Quirk
Caused by an error in the development workflow when submitting a patch.
RT versus RHEL Kernel Differences
The realtime kernel tree is separate from the RHEL kernel tree, but closely follows its development. The differences between the trees can cause RHEL patches to fail to apply or build. This is not a bug in the kernel patch, merely the result of realtime functionality not being fully integrated yet.
Overwritten Git Repository
The commit under test was overwritten by a force push as testing was starting. The commit disappeared from the repository, so it could not be tested.
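One way to confirm this failure mode is to check whether the tested commit is still reachable from the branch. The following is a minimal sketch using `git merge-base --is-ancestor`, which exits with status 0 when the first commit is an ancestor of the second. The function name, the injectable `run` parameter, and the example arguments are hypothetical, and the clone is assumed to have been fetched recently.

```python
import subprocess


def commit_still_on_branch(repo_dir: str, sha: str, branch: str,
                           run=subprocess.run) -> bool:
    """Return True if `sha` is reachable from `branch` in the local clone.

    Fetch the branch first so the local ref reflects the remote state.
    `run` is injectable so the check can be exercised without a real
    repository.
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", sha, branch],
        capture_output=True,
    )
    # Exit status 0: ancestor; 1: not an ancestor; other values: git error
    # (for example, the commit object is unknown to this clone).
    return result.returncode == 0
```

For example, `commit_still_on_branch(".", sha, "origin/main")` returning `False` after a fresh fetch is consistent with the commit having been removed by a force push, although a git error produces the same result and should be checked separately.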