Advancing CI system cooperation with KernelCI

In 2019 the CKI Project held a hackfest, inviting the maintainers of all CI systems for the Linux kernel to discuss how we could cooperate.

The hackfest went well, and among other things we agreed that parsing all the different report e-mails, navigating the various dashboards, and correlating results manually takes too much developer and maintainer time, and lowers the effectiveness of CI and testing. Building and maintaining all these systems separately is also a lot of work, which we can minimize if we join forces and produce something everyone can use instead.

So we started an effort to design a schema and a protocol unifying CI result reporting. To back it up, we started building a system to aggregate the various testing results, and to provide unified email notifications and a dashboard. We threw together some PoC code right there and then, named the system and the schema “KCIDB” (for “Kernel CI database”), and the whole effort was given a home within the Linux Foundation’s KernelCI project. Since then I have been a technical representative of the CKI Project at KernelCI, and the main driver of the KCIDB effort.

KCIDB’s guiding principles are:

Fire and forget - a reporting CI system should be able to participate with minimal involvement:
- get credentials,
- generate report data,
- feed it to the endpoint in a single transaction, regardless of the amount or type of data (e.g. just a revision and a build, or a few tests on somebody else’s build, or a bunch of revisions and thousands of tests),
- walk away - no need to keep a daemon running.
Minimum upfront costs - a reporting CI system is only required to provide the minimal structural data, and almost everything else is optional. This is especially important for acquiring initial participants. At the same time, reporting more data, and more accurate data, is incentivized through improved reports, statistics and correlation. As we understand more about what data we need, and as our need to contain system complexity grows, we will start requiring more data, but only as necessary.
Accumulate, not modify - the report database is only added to, never modified (except for maintenance), similarly to a log. This greatly simplifies design, implementation, and understanding of the system.
Extensibility - the schema provides a way to attach arbitrary data to every report object, both as a way for CI systems to report extra information they deem important, and as a foothold for promoting schema extensions that formalize that information.
Testing never finishes - the system never considers testing as finished for any particular revision or build. However, the system is able to produce a live summary of results received so far.
Developers are in control - developers decide what results they consider worthy of paying attention to, express that in “subscriptions”, and receive notifications only when subscription conditions are met.
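
Several of these principles can be illustrated with a toy model. What follows is a minimal sketch, not KCIDB's actual code or schema: the names `make_report`, `ReportLog` and `matching_subscriptions`, as well as the fields `origin`, `build_id`, `status` and `misc`, are all hypothetical and chosen only for illustration. It shows a report submitted in one go, an append-only store, a live summary of results received so far, and subscriptions expressed as conditions on that summary.

```python
from collections import Counter


def make_report(origin, builds=(), tests=()):
    """Bundle any mix of objects into one submission (hypothetical layout)."""
    return {"origin": origin, "builds": list(builds), "tests": list(tests)}


class ReportLog:
    """Append-only store: objects are added, never modified."""

    def __init__(self):
        self._tests = []

    def submit(self, report):
        # One call per report, regardless of how many objects it carries.
        self._tests.extend(report.get("tests", []))

    def summary(self, build_id):
        """Count test statuses received so far for a build; never 'final'."""
        return Counter(
            t["status"] for t in self._tests if t["build_id"] == build_id
        )


def matching_subscriptions(log, build_id, subscriptions):
    """Return names of subscriptions whose condition holds for the summary."""
    current = log.summary(build_id)
    return [name for name, cond in subscriptions.items() if cond(current)]


# Two CI systems report tests on the same build, at different times.
log = ReportLog()
log.submit(make_report("ci_a", tests=[
    # "misc" stands in for the arbitrary extra data a CI system may attach.
    {"build_id": "ci_a:b1", "status": "PASS", "misc": {"runner": "x"}},
]))
before = matching_subscriptions(
    log, "ci_a:b1", {"any_failure": lambda s: s["FAIL"] > 0}
)
log.submit(make_report("ci_b", tests=[
    {"build_id": "ci_a:b1", "status": "FAIL"},
]))
after = matching_subscriptions(
    log, "ci_a:b1", {"any_failure": lambda s: s["FAIL"] > 0}
)
```

In this sketch the summary simply changes as more results arrive, and the `any_failure` subscription starts matching only after the second system reports a failure.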

The first two CI systems to start submitting their results were, naturally, the CKI Project and KernelCI, but we were soon joined by ARM, Google’s Syzbot, Linaro’s Tuxsuite, and Gentoo’s GKernelCI, with more systems considering participation. You can follow our engagement progress on the KernelCI mailing list. Nowadays we’re getting up to 100,000 report objects (revisions/builds/tests) per day from all the CI systems combined.

The currently deployed KCIDB schema is at version 3, introduced last year. We host our database in Google BigQuery, and the ingestion and notification services run in Google Cloud as well. We have a Grafana dashboard, and are sending PoC notification emails to a development mailing list.

We’re still working on making the subscription and notification system good enough for maintainers and developers, but expect to start reaching out to them in autumn, after the next release of the KCIDB schema and system is done and deployed.

At the same time, we’ve been transitioning communication within the CKI Project to the KCIDB schema. When a GitLab CI job finishes in a CKI pipeline, it exposes the resulting checkout/build/test data as its artifacts, in KCIDB format. That data also includes whatever extra information the CKI stack needs, thanks to the schema’s extensibility.

From there, the reports are pulled into our Data Warehouse portal and made available to developers and test maintainers. Then, with the extra data stripped, the reports are forwarded to the upstream KCIDB database.
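
The stripping step can be pictured roughly as follows. This is only a sketch under assumptions: the function `strip_extra` is hypothetical, and so is the idea that the extra data lives under a single key (called `misc` here) on each object. The actual CKI tooling may work differently.

```python
def strip_extra(report, extra_key="misc"):
    """Return a copy of the report with extra_key removed at every level.

    The input is left untouched, so the full report can stay in the
    internal store while the stripped copy is forwarded upstream.
    """
    def clean(node):
        if isinstance(node, dict):
            return {k: clean(v) for k, v in node.items() if k != extra_key}
        if isinstance(node, list):
            return [clean(v) for v in node]
        return node  # scalars are returned as-is

    return clean(report)


# Hypothetical internal report carrying CKI-specific extras.
internal = {
    "builds": [{"id": "ci_a:b1", "misc": {"pipeline_id": 123}}],
    "tests": [
        {"id": "ci_a:t1", "build_id": "ci_a:b1", "status": "PASS",
         "misc": {"waived": False}},
    ],
}
public = strip_extra(internal)
```

Returning a new structure rather than editing in place keeps the internal copy intact for the features that depend on the extra data.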

We’re working on setting up a public instance of the Data Warehouse (available later this year), so that both upstream and Red Hat kernel contributors can conveniently access our testing results, including the extra data not yet in the KCIDB schema, along with extra features such as issue tracking and waiving.