Planning and incident handling

How the CKI team is using GitLab epics and issues for planning

General idea

GitLab epics and issues are used to handle planned, unplanned and incident-related work in a Kanban-like process.

For incidents, issues cover not only the immediate incident response but also follow-up work such as documentation improvements or root cause remediation to prevent similar incidents.

Consistent application of the labels is enforced by the sprinter webhook.

Teams

Plannable work items should be tagged with CWF::Type::Team. Any issue created in a project of the cki-project group on GitLab is automatically assigned a type if none was set during creation:

  • CWF::Type::ARK for kernel-ark, handled by the ARK team
  • CWF::Type::KWF for kernel-workflow, handled by the Kernel Workflow team
  • CWF::Type::CKI for all the rest

For epics, this needs to be done manually.

To change the responsible team, change the CWF::Type label as appropriate.
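The default assignment described above can be sketched as a small lookup. This is only an illustration, not the actual webhook implementation: the function names are hypothetical, and the project paths assume that kernel-ark and kernel-workflow sit directly inside the cki-project group.

```python
# Hypothetical mapping from project path to default team label; the paths
# assume the projects live directly in the cki-project group.
DEFAULT_TYPE_LABELS = {
    "cki-project/kernel-ark": "CWF::Type::ARK",
    "cki-project/kernel-workflow": "CWF::Type::KWF",
}
FALLBACK_TYPE_LABEL = "CWF::Type::CKI"


def has_type_label(labels):
    """Return True if any CWF::Type::* label is already present."""
    return any(label.startswith("CWF::Type::") for label in labels)


def default_type_label(project_path, labels):
    """Return the type label to add, or None if the issue already has one."""
    if has_type_label(labels):
        return None
    return DEFAULT_TYPE_LABELS.get(project_path, FALLBACK_TYPE_LABEL)
```

An issue that already carries a type label, including CWF::Type::Incident, is left untouched in this sketch.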

Structure

Topical grouping

High-level topics are tracked as Epics tagged with CWF::Type::User Story. These topics can be open-ended, and should consist of plannable epics.

High-level planning

Planning is based on Epics with CWF::Type::Team labels as described above. Epics represent a single deliverable feature. They contain issues tracking the actual individual work items.

Considerations:

  • At least two developers should work in parallel on one epic.
  • Epics should be sized so that they can be finished within a month, depending on the seniority/velocity of the developers.

Low-level planning

Individual planned work items are tracked as Issues with CWF::Type::Team labels. Issues represent the smallest deliverable change.

Considerations:

  • Issues create visibility, which in turn enables discussion and collaboration.
  • Issues make it possible to connect all the individual pieces required for a given work item.

Incidents

Incidents, i.e. service outages or degradations, are tracked as Issues with CWF::Type::Incident labels.

Life cycle of epics and issues

Epics and issues go through the following phases visible on the respective epic and issue boards:

  • Open: The work item has been created and is sitting peacefully in the backlog.
  • CWF::Stage::On Deck: The team has agreed to work on the work item in the next planning cycle. For epics, this means that the work is considered critical for stakeholders. Individual issues should be fleshed out enough to be implementable and have clear acceptance criteria to limit scope creep. If necessary, bigger issues should be split into the smallest deliverable changes.
  • CWF::Stage::In Progress: The work item is actively worked on.
  • Closed: The work item is completed and all acceptance criteria are fulfilled. If some acceptance criteria remain but no further work is planned for now, the outstanding acceptance criteria can be moved into a new work item. Any remaining CWF::Stage::* labels are removed.
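The effect of these transitions on an item's labels can be sketched as follows. This is a minimal model, assuming that the CWF::Stage::* labels behave like GitLab scoped labels, where at most one label per scope is present at a time. The helper names are hypothetical.

```python
STAGE_SCOPE = "CWF::Stage"


def scope_of(label):
    """Return the text before the last '::' of a label, or None if unscoped."""
    scope, separator, _ = label.rpartition("::")
    return scope if separator else None


def move_to_stage(labels, stage):
    """Return the labels with any CWF::Stage::* label replaced by the new stage."""
    kept = [label for label in labels if scope_of(label) != STAGE_SCOPE]
    return kept + [f"{STAGE_SCOPE}::{stage}"]


def close_work_item(labels):
    """Return the labels of a closed item, without any CWF::Stage::* label."""
    return [label for label in labels if scope_of(label) != STAGE_SCOPE]
```

For example, moving an item labeled CWF::Type::CKI and CWF::Stage::On Deck to "In Progress" keeps the type label and swaps the stage label, while closing it leaves only the type label.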

Life cycle of incident issues

Incidents go through the following phases visible on the incident issue board:

  • CWF::Incident::Active: The incident is significantly affecting the production environment.
  • CWF::Incident::Mitigated: The incident is affecting the production environment in a limited way.
  • CWF::Incident::Resolved: The incident is no longer affecting the production environment. Some work remains to be done, e.g. further monitoring, documentation or root cause remediation.
  • Closed: The issue is closed when all outstanding work items are completed. Any remaining CWF::Incident::* labels are removed.

Planning

Planning cycle

Work is planned on a quarterly basis. This tries to satisfy the following constraints:

  • balance between feature development and maintenance work
  • long enough cycle to allow delivering multiple features
  • short enough cycle to deliver features to stakeholders within a reasonable time frame

Quarterly planning

In the following, planning means figuring out what to work on during the next quarter.

Most likely, this also involves cleaning up the existing epics and standalone issues, and adding new plannable epics.

Various epic/issue boards are available to help with this:

  • The epic types board can be used to inspect and sort the currently available epics:

    1. Epics should be sorted into the correct column; only the CWF::Type::CKI column is relevant for the planning at hand.
    2. Epics in the CWF::Type::CKI column should be sorted in order of decreasing importance.
  • The quarterly planning epic board can be used to further refine the “candidate epics”:

    1. For the next quarter, user-requested epics should be prioritized as much as possible.
    2. Move still-relevant epics from previous quarters forward, or drop them in the Open column if no further work is planned for now.
  • The current quarter epic board can be used to do fine-grained planning for the current quarter. Make sure that the board filter is correctly configured for the current quarter.

    1. Make sure the CWF::Stage::In Progress column accurately reflects the work items currently being worked on. There should never be more than 3 to 4 epics in this column.
    2. The CWF::Stage::On Deck column should contain the next epics to work on after that.
    3. The Open column should not contain anything that would be just “nice to have”.
  • The CKI Kanban issue board with an Epic = None filter can be used to inspect standalone issues. While it is OK to have such issues in the CWF::Stage::On Deck and CWF::Stage::In Progress columns (and the board can be used to track them across their life cycle), consider attaching them to an epic for the current quarter to make them easier to keep track of.

  • The incident board shows tickets related to incidents.

    1. Aggressively purge the CWF::Incident::Active and CWF::Incident::Mitigated columns. It doesn’t make any sense to have an incident sitting in these columns for a long time. If that is the case, most likely the ticket is just referring to a “normal” bug, and should be relabeled as CWF::Type::CKI.

    2. To a lesser extent, the CWF::Incident::Resolved column should be purged as well, e.g. by attaching tickets to a planned epic and converting them to normal issues. This ensures that the incident board shows meaningful actionable information.
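The guideline for the current quarter epic board, that the CWF::Stage::In Progress column should never hold more than 3 to 4 epics, can be checked with a small helper. This is only a sketch: the epic dictionaries with "title" and "labels" keys are a hypothetical data shape, and the limit of 4 is taken as the upper end of the guideline.

```python
IN_PROGRESS = "CWF::Stage::In Progress"
MAX_IN_PROGRESS = 4  # assumption: upper end of the "3 to 4" guideline


def in_progress_epics(epics):
    """Return the titles of all epics carrying the In Progress label."""
    return [epic["title"] for epic in epics if IN_PROGRESS in epic["labels"]]


def exceeds_wip_limit(epics, limit=MAX_IN_PROGRESS):
    """Return True if more epics are in progress than the limit allows."""
    return len(in_progress_epics(epics)) > limit
```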

After the planning is done, add the In Progress and On Deck epics to the Canvas of the #team-kernel-cki Slack channel.

Handling incidents

In addition to short-term components such as the immediate mitigation and resolution of the incident itself, an incident response must also include strategic improvements to prevent recurrence.

When handling incidents, focus on the following aspects depending on the incident state:

  1. CWF::Incident::Active: reduce the impact on the production environment
  2. CWF::Incident::Mitigated: resolve the direct cause of the incident
  3. CWF::Incident::Resolved: improve on the root cause of the incident, e.g. by improving
    • monitoring/alerting of the conditions that led to the incident
    • monitoring/alerting for a similar incident
    • logging to aid in faster detection/recovery for a similar incident
    • documentation
    • the underlying code and/or architecture
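The relation between incident state and focus listed above can be written down as a lookup table. This is an illustrative sketch only; the function name is hypothetical and the focus texts are copied from the list.

```python
# Focus per incident phase, in the order the phases are passed through.
INCIDENT_FOCUS = {
    "CWF::Incident::Active": "reduce the impact on the production environment",
    "CWF::Incident::Mitigated": "resolve the direct cause of the incident",
    "CWF::Incident::Resolved": "improve on the root cause of the incident",
}


def incident_focus(labels):
    """Return the focus for the first matching phase label, or None."""
    for label, focus in INCIDENT_FOCUS.items():
        if label in labels:
            return focus
    return None
```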

Working with GitLab issues

Creating a new issue

Create a new GitLab issue, e.g.

  • on the top bar on a project page, select the plus sign (+) and then, under This project, select New issue
  • on the left sidebar on a project page, select Issues and then, in the upper-right corner, select New issue
  • on a project page, press the i shortcut
  • on any of the GitLab issue boards, select the appropriate list menu and then Create new issue

Transitioning an issue between the different stages

Transition an issue to a different stage (<STAGE>), e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then select the appropriate CWF::Stage::<STAGE> label
  • in the comment box on an issue page (e shortcut), write /label ~"CWF::Stage::<STAGE>" and submit the comment
  • on any of the GitLab issue boards, drag the issue card to the appropriate list
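The quick action text used in the comment box can be generated with a small helper, which avoids typos in the quoting when stage names contain spaces. This is a minimal sketch with a hypothetical function name. It only builds the comment text and does not talk to GitLab.

```python
def label_quick_action(scope, value, remove=False):
    """Build a /label or /unlabel quick action for a scoped label."""
    command = "/unlabel" if remove else "/label"
    # The label is quoted because values such as "On Deck" contain spaces.
    return f'{command} ~"{scope}::{value}"'


# Text to paste into the comment box to move an issue to On Deck.
on_deck = label_quick_action("CWF::Stage", "On Deck")
```

Here, on_deck holds the text /label ~"CWF::Stage::On Deck", matching the form shown in the list above.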

Closing an issue

Close the GitLab issue, e.g.

  • at the top of an issue page, select Close issue
  • in the comment box on an issue page (e shortcut), write /close and submit the comment
  • on any of the GitLab issue boards, drag the issue card to the Closed list

Working with incident issues

Creating a new incident issue

Create a new GitLab issue, e.g.

  • on the top bar on a project page, select the plus sign (+) and then, under This project, select New issue
  • on the left sidebar on a project page, select Issues and then, in the upper-right corner, select New issue
  • on a project page, press the i shortcut
  • on the incident issue board, select the appropriate list menu and then Create new issue

Unless the issue was created through the incident issue board, make sure to tag the issue with at least CWF::Type::Incident.

Converting between an incident issue and a normal issue

To convert a normal issue into an incident issue, add the CWF::Type::Incident label to it, e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then select the CWF::Type::Incident label
  • in the comment box on an issue page (e shortcut), write /label ~"CWF::Type::Incident" and submit the comment

To convert an incident issue into a normal issue, remove the CWF::Type::Incident label from it, e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then deselect the CWF::Type::Incident label
  • in the comment box on an issue page (e shortcut), write /unlabel ~"CWF::Type::Incident" and submit the comment

Transitioning an incident issue between the different phases

Transition an incident issue to a different phase (<PHASE>), e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then select the appropriate CWF::Incident::<PHASE> label
  • in the comment box on an issue page (e shortcut), write /label ~"CWF::Incident::<PHASE>" and submit the comment
  • on the incident issue board, drag the issue card to the appropriate list

Closing an incident issue

Close the GitLab issue, e.g.

  • at the top of an issue page, select Close issue
  • in the comment box on an issue page (e shortcut), write /close and submit the comment
  • on the incident issue board, drag the issue card to the Closed list