Planning and incident handling
General idea
GitLab epics and issues are used to handle planned, unplanned and incident-related work in a Kanban-like process.
For incidents, issues do not only cover the immediate incident response, but also further work such as documentation improvements or root cause remediation to prevent similar incidents.
Consistent application of the labels is enforced by the sprinter webhook.
Teams
Plannable work items should be tagged with CWF::Type::Team. Any issue created in any of the projects in the cki-project group on GitLab will automatically be assigned a type if none was set during creation:
- CWF::Type::ARK for kernel-ark, handled by the ARK team
- CWF::Type::KWF for kernel-workflow, handled by the Kernel Workflow team
- CWF::Type::CKI for all the rest
For epics, this needs to be done manually.
To change the responsible team, change the CWF::Type label as appropriate.
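The defaulting rule for issues can be summarized in a few lines of Python. This is only an illustrative sketch: the function name, the lookup table and the matching on the bare project name are assumptions made for the example, not the actual webhook implementation.

```python
# Hypothetical sketch of the default type rule described above.
TYPE_PREFIX = "CWF::Type::"
PROJECT_TYPES = {
    "kernel-ark": "CWF::Type::ARK",
    "kernel-workflow": "CWF::Type::KWF",
}
FALLBACK_TYPE = "CWF::Type::CKI"


def default_type_label(project, labels):
    """Return the type label to add, or None if the issue already has one."""
    if any(label.startswith(TYPE_PREFIX) for label in labels):
        # A type was chosen during creation, so nothing is assigned.
        return None
    return PROJECT_TYPES.get(project, FALLBACK_TYPE)


print(default_type_label("kernel-ark", []))
print(default_type_label("some-other-project", ["bug"]))
```

An issue that already carries any CWF::Type label, including CWF::Type::Incident, is left untouched in this sketch.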
Structure
Topical grouping
High-level topics are tracked as Epics tagged with CWF::Type::User Story.
These topics can be open-ended, and should consist of plannable epics.
High-level planning
Planning is based on Epics with CWF::Type::Team labels as described above.
Epics represent a single deliverable feature. They contain issues tracking
the actual individual work items.
Considerations:
- At least two developers should work in parallel on one epic.
- Epics should be sized so that they can be finished within a month, depending on the seniority/velocity of the developers.
Low-level planning
Individual planned work items are tracked as Issues with CWF::Type::Team labels. Issues represent the smallest deliverable change.
Considerations:
- Issues create visibility, which in turn enables discussion and collaboration.
- Issues make it possible to connect all the individual pieces required for a certain work item.
Incidents
Incidents, i.e. service outages or degradations, are tracked as Issues with CWF::Type::Incident labels.
Life cycle of epics and issues
Epics and issues go through the following phases visible on the respective epic and issue boards:
- Open: The work item has been created and is sitting peacefully in the backlog.
- CWF::Stage::On Deck: The team has agreed on working on the work item in the next planning cycle. For epics, this means that the work is considered critical for stakeholders. Individual issues should be fleshed out enough to be implementable and have clear acceptance criteria to limit scope creep. If necessary, bigger issues should be split into the smallest deliverable changes.
- CWF::Stage::In Progress: The work item is actively worked on.
- Closed: The work item is completed and all acceptance criteria are fulfilled. If some acceptance criteria remain, but no further work is planned for now, the outstanding acceptance criteria can also be moved into a new work item. Any remaining CWF::Stage::* labels are removed.
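The effect of these phases on an item's labels can be sketched as follows. The helper name and the list-of-strings representation are assumptions for illustration. The sketch relies on the fact that Open and Closed are item states rather than labels, and that at most one CWF::Stage label should be present at a time.

```python
# Hypothetical sketch: labels of a work item after moving it to a phase.
STAGE_PREFIX = "CWF::Stage::"
PHASES = ("Open", "On Deck", "In Progress", "Closed")


def labels_for_phase(labels, phase):
    """Return the label list after moving a work item to the given phase."""
    if phase not in PHASES:
        raise ValueError(f"unknown phase: {phase}")
    # Drop any previous stage label; Open and Closed carry none.
    kept = [label for label in labels if not label.startswith(STAGE_PREFIX)]
    if phase in ("On Deck", "In Progress"):
        kept.append(STAGE_PREFIX + phase)
    return kept


labels = ["CWF::Type::CKI", "CWF::Stage::On Deck"]
print(labels_for_phase(labels, "In Progress"))
print(labels_for_phase(labels, "Closed"))
```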
Life cycle of incident issues
Incidents go through the following phases visible on the incident issue board:
- CWF::Incident::Active: The incident is significantly affecting the production environment.
- CWF::Incident::Mitigated: The incident is affecting the production environment in a limited way.
- CWF::Incident::Resolved: The incident is no longer affecting the production environment. Some work remains to be done, e.g. further monitoring, documentation or root cause remediation.
- Closed: The issue is closed when all outstanding work items are completed. Any remaining CWF::Incident::* labels are removed.
Planning
Planning cycle
Work is planned on a quarterly basis. This tries to satisfy the following constraints:
- balance between feature development and maintenance work
- a cycle long enough to deliver multiple features
- a cycle short enough to deliver features to stakeholders within a reasonable time frame
Quarterly planning
In the following, planning means figuring out what to work on during the next quarter.
Most likely, this also involves cleaning up the existing epics and stand-alone issues, and adding new plannable epics.
Various epic/issue boards are available to help with this:
- The epic types board can be used to inspect and sort the currently available epics:
  - Epics should be sorted into the correct column, with only the CWF::Type::CKI column being interesting for the acute planning.
  - Epics in the CWF::Type::CKI column should be sorted with decreasing importance.
- The quarterly planning epic board can be used to further refine the “candidate epics”:
  - For the next quarter, user-requested epics should be prioritized as much as possible.
  - Move epics from previous quarters that are still relevant forward, or drop them in the open column if no further work is planned for now.
- The current quarter epic board can be used to do fine-grained planning for the current quarter. Make sure that the board filter is correctly configured for the current quarter.
  - Make sure the CWF::Stage::In Progress column accurately reflects the currently worked-on work items. There should never be more than 3 to 4 epics in this column.
  - The CWF::Stage::On Deck column should contain the next epics to work on after that.
  - The Open column should not contain anything that would be just “nice to have”.
- The CKI Kanban issue board with an Epic = None filter can be used to inspect standalone issues. While it is ok to have such issues in the CWF::Stage::On Deck and CWF::Stage::In Progress columns (and the board can be used to track them across their life cycle), consider attaching them to an epic for the current quarter to make it easier to keep track of them.
- The incident board shows tickets related to incidents.
  - Aggressively purge the CWF::Incident::Active and CWF::Incident::Mitigated columns. It doesn’t make any sense to have an incident sitting in these columns for an extended period. If that is the case, most likely the ticket is just referring to a “normal” bug, and should be relabeled as CWF::Type::CKI.
  - To a lesser extent, the CWF::Incident::Resolved column should be purged as well, e.g. by attaching tickets to a planned epic and converting them to normal issues. This ensures that the incident board shows meaningful actionable information.
- After the planning is done, add the In Progress and On Deck epics to the Canvas of the #team-kernel-cki Slack channel.
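The limit on the number of epics in progress lends itself to a quick sanity check. The following is a minimal sketch, assuming epics are available as dictionaries with hypothetical "title" and "labels" keys and taking 4 as the upper bound of "3 to 4". How the epics are fetched from GitLab is left out.

```python
# Hypothetical sketch of a work-in-progress check for the epic board.
IN_PROGRESS = "CWF::Stage::In Progress"
MAX_IN_PROGRESS = 4  # assumed upper bound of "3 to 4"


def in_progress_titles(epics):
    """Return the titles of all epics carrying the In Progress label."""
    return [epic["title"] for epic in epics if IN_PROGRESS in epic["labels"]]


def over_wip_limit(epics, limit=MAX_IN_PROGRESS):
    """Return True if more epics are in progress than the limit allows."""
    return len(in_progress_titles(epics)) > limit


epics = [
    {"title": "Example epic A", "labels": ["CWF::Type::CKI", IN_PROGRESS]},
    {"title": "Example epic B", "labels": ["CWF::Type::CKI"]},
]
print(in_progress_titles(epics), over_wip_limit(epics))
```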
Handling incidents
In addition to the short-term components, such as the immediate mitigation and resolution of the incident itself, an incident response also has to contain strategic improvements to prevent recurrence.
When handling incidents, focus on the following aspects depending on the incident state:
- CWF::Incident::Active: reduce the impact on the production environment
- CWF::Incident::Mitigated: resolve the direct cause of the incident
- CWF::Incident::Resolved: improve on the root cause of the incident, e.g. by improving
  - monitoring/alerting of the conditions that led to the incident
  - monitoring/alerting for a similar incident
  - logging to aid in faster detection/recovery for a similar incident
  - documentation
  - the underlying code and/or architecture
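As a small illustration, the mapping from incident phase to focus can be expressed as a lookup on the incident label. The function name and the label-list input are assumptions for this sketch.

```python
# Hypothetical sketch: what to focus on for a given incident issue.
INCIDENT_PREFIX = "CWF::Incident::"
FOCUS = {
    "Active": "reduce the impact on the production environment",
    "Mitigated": "resolve the direct cause of the incident",
    "Resolved": "improve on the root cause of the incident",
}


def incident_focus(labels):
    """Return the focus for the first incident phase label, or None."""
    for label in labels:
        if label.startswith(INCIDENT_PREFIX):
            return FOCUS.get(label[len(INCIDENT_PREFIX):])
    return None


print(incident_focus(["CWF::Type::Incident", "CWF::Incident::Mitigated"]))
```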
Working with GitLab issues
Creating a new issue
Create a new GitLab issue, e.g.
- on the top bar on a project page, select the plus sign (+) and then, under This project, select New issue
- on the left sidebar on a project page, select Issues and then, in the upper-right corner, select New issue
- on a project page, press the i shortcut
- on any of the GitLab issue boards, select the appropriate list menu (⋮) and then Create new issue
Transitioning an issue between the different stages
Transition an issue to a different stage (<STAGE>), e.g.
- on the right sidebar on an issue page, select Edit next to Labels, and then select the appropriate CWF::Stage::<STAGE> label
- in the comment box on an issue page (e shortcut), write /label ~"CWF::Stage::<STAGE>" and submit the comment
- on any of the GitLab issue boards, drag the issue card to the appropriate list
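When the quick action is generated by a script, the quoting matters because label names such as On Deck contain spaces. A minimal sketch of a helper that builds the comment text (the helper name is hypothetical):

```python
# Hypothetical helper that builds a label quick action for a comment.
def label_quick_action(scope, value, remove=False):
    """Return a /label or /unlabel quick action for a scoped label."""
    command = "/unlabel" if remove else "/label"
    # The label name is quoted so that spaces are handled correctly.
    return f'{command} ~"{scope}::{value}"'


print(label_quick_action("CWF::Stage", "On Deck"))
print(label_quick_action("CWF::Type", "Incident", remove=True))
```

The resulting text is what would be pasted or submitted as the comment body.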
Closing an issue
Close the GitLab issue, e.g.
- at the top of an issue page, select Close issue
- in the comment box on an issue page (e shortcut), write /close and submit the comment
- on any of the GitLab issue boards, drag the issue card to the Closed list
Working with incident issues
Creating a new incident issue
Create a new GitLab issue, e.g.
- on the top bar on a project page, select the plus sign (+) and then, under This project, select New issue
- on the left sidebar on a project page, select Issues and then, in the upper-right corner, select New issue
- on a project page, press the i shortcut
- on the incident issue board, select the appropriate list menu (⋮) and then Create new issue
Unless the issue was created through the incident issue board, make sure to tag the issue with at least CWF::Type::Incident.
Converting between an incident issue and a normal issue
To convert a normal issue into an incident issue, add the CWF::Type::Incident label to it, e.g.
- on the right sidebar on an issue page, select Edit next to Labels, and then select the CWF::Type::Incident label
- in the comment box on an issue page (e shortcut), write /label ~"CWF::Type::Incident" and submit the comment
To convert an incident issue into a normal issue, remove the CWF::Type::Incident label from it, e.g.
- on the right sidebar on an issue page, select Edit next to Labels, and then deselect the CWF::Type::Incident label
- in the comment box on an issue page (e shortcut), write /unlabel ~"CWF::Type::Incident" and submit the comment
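The same conversion could also be scripted against the GitLab REST API, which accepts add_labels and remove_labels parameters when editing an issue. The following is only a sketch that builds the request without sending it. The base URL, the project path, the issue number and the token are placeholders, and the function name is hypothetical.

```python
import urllib.parse
import urllib.request

INCIDENT_LABEL = "CWF::Type::Incident"


def incident_label_request(project_path, issue_iid, token, incident=True,
                           base_url="https://gitlab.com/api/v4"):
    """Build (but do not send) a request that adds or removes the label."""
    # The project path must be URL-encoded, including the slash.
    project = urllib.parse.quote(project_path, safe="")
    field = "add_labels" if incident else "remove_labels"
    data = urllib.parse.urlencode({field: INCIDENT_LABEL}).encode()
    return urllib.request.Request(
        f"{base_url}/projects/{project}/issues/{issue_iid}",
        data=data,
        headers={"PRIVATE-TOKEN": token},
        method="PUT",
    )


# Placeholder values; nothing is sent over the network here.
request = incident_label_request("cki-project/example", 42, "<token>")
print(request.get_method(), request.full_url)
```

Passing the request to urllib.request.urlopen would actually perform the change, so the sketch stops before that step.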
Transitioning an incident issue between the different phases
Transition an incident issue to a different phase (<PHASE>), e.g.
- on the right sidebar on an issue page, select Edit next to Labels, and then select the appropriate CWF::Incident::<PHASE> label
- in the comment box on an issue page (e shortcut), write /label ~"CWF::Incident::<PHASE>" and submit the comment
- on the incident issue board, drag the issue card to the appropriate list
Closing an incident issue
Close the GitLab issue, e.g.
- at the top of an issue page, select Close issue
- in the comment box on an issue page (e shortcut), write /close and submit the comment
- on the incident issue board, drag the issue card to the Closed list