Planning and incident handling

How the CKI team is using GitLab epics and issues for planning

General idea

GitLab epics and issues are used to handle planned, unplanned and incident-related work in a multi-level Agile process.

With a planning horizon of one quarter, epics are used to track strategic long-term work items planned for the next quarter and beyond.

With a planning horizon of two weeks, issues are used to track short-term work items planned for the next Scrum sprint.

Long-term quarterly planning

Long-term work is planned on a quarterly basis. This tries to satisfy the following constraints:

  • long enough cycle to deliver complete features and to work on strategic initiatives
  • short enough cycle to deliver features quickly to stakeholders

Using epics for planning

Long-term planning is based on epics.

Epics can be either open-ended or plannable.

Open-ended epics are used to track high-level topics that are not yet ready for planning. These epics can either contain individual issues or further epics.

To be able to plan an epic, its scope needs to be refined so that the epic represents a single deliverable feature. After refinement, the epic should contain the minimum set of issues tracking the actual individual work items.

Considerations:

  • At least two developers should work in parallel on one epic.
  • Epics should be sized so that they can be finished within a month, depending on the seniority/velocity of the developers.

Long-term planning process

  1. Board configuration: Inspect the currently existing columns on the quarterly planning epic board. Make sure that the columns are correctly configured for the current and next quarter based on the appropriate CWF::Planning::* labels. If necessary, create new labels in the cki-project GitLab group.

  2. Cleanup: Inspect the currently existing epics in the left-most column of the quarterly planning epic board. Make sure that all epics in that column are either open-ended, or on the way to be plannable in the future. Close epics that are abandoned, or that have no realistic chance to be implemented in the medium-term future.

  3. Backlog sorting: Sort the epics in the left-most column of the quarterly planning epic board by priority. User-requested epics should be prioritized as much as possible and should be at the top of the list.

  4. Selecting work for the current quarter: At the beginning of a new quarter, use the quarterly planning epic board to move still-relevant epics from previous quarters forward, or drop them in the open column if no further work is planned for now.

    During the quarterly review and planning sessions, agree with the team on the epics to work on during the next quarter. Move these epics into the column corresponding to the current quarter on the quarterly planning epic board. The selected epics must have a clear scope and definition of done.

    After the planning is done, add the selected epics to the Canvas of the #team-kernel-cki Slack channel.

  5. Tracking work for the current quarter: The current quarter epic board allows fine-grained planning for the current quarter. Make sure that the board filter is correctly configured for the current quarter.

    Epics selected for the current quarter go through the following phases:

    1. Open: The epic is planned for the current quarter, but not yet ready for implementation. This might be because other work needs to happen first, or because the work covered by the epic is still blocked by external factors.

    2. CWF::Stage::Refined: The epic is ready to be worked on.

    3. CWF::Stage::In Progress: The epic is actively worked on. Make sure that the number of epics in this column stays realistic. There should never be more than 3 to 4 epics in this column.

    4. Closed: The work covered by the epic is completed and all acceptance criteria are fulfilled. If some acceptance criteria are remaining, but no further work is planned for now, the outstanding acceptance criteria should be moved into a new epic in the backlog. Any remaining CWF::Stage::* labels are removed.

    Continuously check that no column contains anything that is just “nice to have”. There is no fault in dropping epics that are no longer relevant or no longer aligned with current priorities.
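
The phases above can be read off an epic's state and labels. The following is a minimal sketch in Python, assuming epics are represented as plain dictionaries with `state` and `labels` keys (a hypothetical shape chosen for illustration, not the GitLab API response format). The limit of 4 is the upper end of the "3 to 4 epics" mentioned above.

```python
from collections import Counter

REFINED = "CWF::Stage::Refined"
IN_PROGRESS = "CWF::Stage::In Progress"
MAX_IN_PROGRESS = 4  # upper end of the "3 to 4 epics" guideline


def epic_phase(epic):
    """Return the phase of an epic based on its state and labels."""
    if epic["state"] == "closed":
        return "Closed"
    if IN_PROGRESS in epic["labels"]:
        return IN_PROGRESS
    if REFINED in epic["labels"]:
        return REFINED
    return "Open"


def too_many_in_progress(epics):
    """Check whether the In Progress column exceeds the guideline."""
    counts = Counter(epic_phase(epic) for epic in epics)
    return counts[IN_PROGRESS] > MAX_IN_PROGRESS
```

A check like `too_many_in_progress` could be run during the quarterly review to spot an overloaded column early.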

Short-term sprint-based planning

Short-term work is planned on a two-week basis following a Scrum-aligned process. This tries to satisfy the following constraints:

  • balance between feature development and maintenance work
  • long enough cycle to complete individual tickets within the sprint
  • short enough cycle to be able to deliver features tracked by epics within a reasonable time by chipping away at the individual tickets one at a time

Using issues for planning

Short-term planning is based on issues.

To be able to add an issue to a sprint, its scope needs to be refined so that the issue represents a single deliverable change.

Considerations:

  • The issue should be implementable by a single developer within one sprint.
  • The description should contain enough information for somebody other than the original author to be able to implement the change. This includes the problem statement and a suggested solution.
  • To limit scope creep, the issue should contain a list of acceptance criteria.

Short-term planning Scrum process

  1. Board configuration: Inspect the currently existing columns on the sprint planning review and sprint planning issue boards. Make sure that the columns are correctly configured for the current and next sprint.

  2. Reviewing completed work for the current sprint: During the review meeting at the end of a sprint, review the work completed during the sprint on the sprint review board.

    If necessary, adjust the status of issues by moving them to the appropriate column. Closing recurring issues tagged with CWF::Type::Recurring will prevent them from being recreated for the next sprint.

  3. Pruning outstanding work from the current sprint: During the review meeting at the end of a sprint, check that the outstanding planned work listed in the column for the current sprint on the sprint planning review board is still relevant. Issues that are no longer relevant should be moved to the open column. Issues that are still relevant and should be worked on in the next sprint should not be touched and will automatically be rolled over to the next sprint once the current sprint closes.

    Open recurring issues tagged with CWF::Type::Recurring will automatically be closed and duplicated for the next sprint at the end of the current sprint.

  4. Selecting work for the next sprint: Before and during the planning meeting at the beginning of a sprint, use the sprint planning board to move issues from the open column into the column for the next sprint.

    The sprint planning board only shows a subset of open issues based on the CWF::Planning::Current label. These issues are either

    • incident issues based on the CWF::Type::Incident label
    • part of actively worked-on epics of the current quarter based on the CWF::Stage::Refined or CWF::Stage::In Progress labels

    Avoid adding other issues to the sprint planning.

    Open recurring issues tagged with CWF::Type::Recurring will automatically be recreated for the next sprint and do not need to be manually added to the sprint planning.

  5. Tracking work for the current sprint: The current sprint issue board allows tracking the progress of planned work for the current sprint.

    Issues planned for the current sprint go through the following phases:

    1. Open: The issue is planned for the current sprint, but is not yet ready for implementation. This might be because other work needs to happen first, or because the work covered by the issue is still blocked by external factors.

    2. CWF::Stage::Refined: The issue is ready to be worked on.

    3. CWF::Stage::In Progress: The issue is actively worked on.

    4. Closed: The issue is completed and all acceptance criteria are fulfilled. If some acceptance criteria are remaining, but no further work is planned for now, the outstanding acceptance criteria can also be moved into a new issue. Any remaining CWF::Stage::* labels are removed.

    Continuously check that no column contains anything that is just “nice to have”. There is no fault in dropping issues that are no longer relevant or no longer aligned with current priorities.

    When clicking through to the current CKI sprint on the GitLab iteration cadences page, the following charts are available:

    • The burn-down chart contains the issue count/weight of the remaining planned work in the current sprint over time. Ideally, the lines should be going down following the Guideline, with two little bumps in the middle corresponding to the two weekends.

      The offset of the line at the end of the sprint corresponds to the amount of planned work not completed during the sprint that would roll over to the next sprint.

    • The burn-up chart contains the issue count/weight of completed planned and unplanned work in the current sprint over time. The difference between the two lines corresponds to the burn-down chart.
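
The relationship between the two charts can be illustrated with a small calculation. The sketch below is not how GitLab computes its charts. It assumes a hypothetical list of daily (total, completed) issue counts for the two burn-up lines and only shows that the remaining work equals the gap between them.

```python
def burn_down(burn_up):
    """Derive the remaining work per day from (total, completed) pairs."""
    return [total - completed for total, completed in burn_up]


# Hypothetical sprint excerpt: one unplanned issue is added on the third day.
burn_up = [(10, 0), (10, 2), (11, 3), (11, 6)]
remaining = burn_down(burn_up)

# Work left on the last day is what would roll over to the next sprint.
rollover = remaining[-1]
```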

Handling incidents

In addition to short-term components such as the immediate mitigation and resolution of the incident itself, an incident response also has to include strategic improvements to prevent recurrence.

When handling incidents, focus on the following aspects depending on the incident state:

  1. CWF::Incident::Active: reduce the impact on the production environment
  2. CWF::Incident::Mitigated: resolve the direct cause of the incident, and improve on the root cause of the incident, e.g. by improving
    • monitoring/alerting of the conditions that led to the incident
    • monitoring/alerting for a similar incident
    • logging to aid in faster detection/recovery for a similar incident
    • documentation
    • the underlying code and/or architecture

Incidents

Incidents, i.e. significant user-visible service outages or degradations, are tracked as issues with CWF::Type::Incident labels.

For incidents, issues cover not only the immediate incident response, but also further work such as documentation improvements or root cause remediation to prevent similar incidents.

Incident handling process

  • The incident board shows tickets related to incidents.

    1. Aggressively purge the CWF::Incident::Active and CWF::Incident::Mitigated columns. It doesn’t make any sense to have an incident sitting in these columns for an extended period. If that is the case, the ticket most likely just refers to a “normal” bug and should be relabeled as CWF::Type::Bug. Optionally, the ticket can be attached to a planned epic.

Life cycle of incident issues

Incidents go through the following phases visible on the incident issue board:

  • CWF::Incident::Active: The incident is significantly affecting the production environment.
  • CWF::Incident::Mitigated: The incident is not affecting the production environment anymore, or only in a limited way. Some work remains to be done, e.g. further monitoring, documentation or root cause remediation.
  • Closed: The issue is closed when all outstanding work items are completed. Any remaining CWF::Incident::* labels are removed.

Mitigated incidents can also be converted into normal issues if the mitigation is considered “good enough” for now, and the remaining work is not considered critical.
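
The label changes behind these phases can be sketched as operations on a set of labels. This is an illustration only, not the team's tooling: the function names are hypothetical, and the conversion helper removes only the CWF::Type::Incident label.

```python
INCIDENT_TYPE = "CWF::Type::Incident"
INCIDENT_PHASES = {"CWF::Incident::Active", "CWF::Incident::Mitigated"}


def set_phase(labels, phase):
    """Move an incident to a phase, replacing any other phase label."""
    if phase not in INCIDENT_PHASES:
        raise ValueError(f"unknown incident phase: {phase}")
    return (set(labels) - INCIDENT_PHASES) | {phase}


def close_incident(labels):
    """Drop any remaining CWF::Incident::* labels when closing."""
    return set(labels) - INCIDENT_PHASES


def convert_to_normal_issue(labels):
    """Turn an incident into a normal issue by removing the type label."""
    return set(labels) - {INCIDENT_TYPE}
```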

Working with GitLab issues

Creating a new issue

Create a new GitLab issue, e.g.

  • on the top bar on a project page, select the plus sign (+) and then, under This project, select New issue
  • on the left sidebar on a project page, select Issues and then, in the upper-right corner, select New issue
  • on a project page, press the i shortcut
  • on any of the GitLab issue boards, select the appropriate list menu and then Create new issue

Transitioning an issue between the different stages

To transition an issue to a different stage (<STAGE>), e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then select the appropriate CWF::Stage::<STAGE> label
  • in the comment box on an issue page (e shortcut), write /label ~"CWF::Stage::<STAGE>" and submit the comment
  • on any of the GitLab issue boards, drag the issue card to the appropriate list
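
For scripted transitions, the quick action from the second option can also be generated programmatically. The sketch below builds the comment text and the request for the GitLab REST API notes endpoint without sending it. It assumes that quick actions submitted through that endpoint are applied in the same way as in the web UI. The host `gitlab.example.com` and the project path `group/project` are placeholders.

```python
from urllib.parse import quote


def stage_quick_action(stage):
    """Return the quick action that applies a CWF::Stage::<STAGE> label."""
    return f'/label ~"CWF::Stage::{stage}"'


def note_request(base_url, project_path, issue_iid, body):
    """Build (url, payload) for creating a comment on an issue."""
    project = quote(project_path, safe="")  # URL-encode the project path
    url = f"{base_url}/api/v4/projects/{project}/issues/{issue_iid}/notes"
    return url, {"body": body}


# Placeholder host, project and issue number for illustration.
url, payload = note_request(
    "https://gitlab.example.com", "group/project", 42,
    stage_quick_action("Refined"),
)
```

The resulting `url` and `payload` could then be sent as an authenticated POST request with any HTTP client.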

Closing an issue

Close the GitLab issue, e.g.

  • at the top of an issue page, select Close issue
  • in the comment box on an issue page (e shortcut), write /close and submit the comment
  • on any of the GitLab issue boards, drag the issue card to the Closed list

Working with incident issues

Creating a new incident issue

Create a new GitLab issue, e.g.

  • on the top bar on a project page, select the plus sign (+) and then, under This project, select New issue
  • on the left sidebar on a project page, select Issues and then, in the upper-right corner, select New issue
  • on a project page, press the i shortcut
  • on the incident issue board, select the appropriate list menu and then Create new issue

Unless the issue was created through the incident issue board, make sure to tag the issue with at least CWF::Type::Incident.

Converting between an incident issue and a normal issue

To convert a normal issue into an incident issue, add the CWF::Type::Incident label to it, e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then select the CWF::Type::Incident label
  • in the comment box on an issue page (e shortcut), write /label ~"CWF::Type::Incident" and submit the comment

To convert an incident issue into a normal issue, remove the CWF::Type::Incident label from it, e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then deselect the CWF::Type::Incident label
  • in the comment box on an issue page (e shortcut), write /unlabel ~"CWF::Type::Incident" and submit the comment

Transitioning an incident issue between the different phases

To transition an incident issue to a different phase (<PHASE>), e.g.

  • on the right sidebar on an issue page, select Edit next to Labels, and then select the appropriate CWF::Incident::<PHASE> label
  • in the comment box on an issue page (e shortcut), write /label ~"CWF::Incident::<PHASE>" and submit the comment
  • on the incident issue board, drag the issue card to the appropriate list

Closing an incident issue

Close the GitLab issue, e.g.

  • at the top of an issue page, select Close issue
  • in the comment box on an issue page (e shortcut), write /close and submit the comment
  • on the incident issue board, drag the issue card to the Closed list

Team tagging

Plannable work items are tagged with a CWF::Team::* label. Any issue created in any of the projects in the cki-project group on GitLab is automatically assigned a team if none was set during creation:

  • CWF::Team::ARK for kernel-ark, handled by the ARK team
  • CWF::Team::KWF for kernel-workflow, handled by the Kernel Workflow team
  • CWF::Team::CKI for all the rest

To change the responsible team, change the CWF::Team label as appropriate.
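
The default team assignment described above can be sketched as a simple lookup. This is an illustration of the rule, not the actual code of any CKI service. Matching on the bare project name is an assumption, and `some-other-project` in the usage note is a made-up name.

```python
TEAM_PREFIX = "CWF::Team::"
PROJECT_TEAMS = {"kernel-ark": "ARK", "kernel-workflow": "KWF"}
DEFAULT_TEAM = "CKI"


def team_label(project_name, labels):
    """Return the team label to add, or None if a team is already set."""
    if any(label.startswith(TEAM_PREFIX) for label in labels):
        return None  # keep the team chosen during creation
    return TEAM_PREFIX + PROJECT_TEAMS.get(project_name, DEFAULT_TEAM)
```

For example, `team_label("some-other-project", [])` yields `CWF::Team::CKI`, while an issue that already carries a CWF::Team::* label is left untouched.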

Implementation

Consistent application of the labels is enforced by the sprinter service.