PSI escalation procedure
When PSI infrastructure fails, escalate the problem in a structured way as follows.
File a ServiceNow (SNOW) ticket:
- Search for ‘PnT report an issue’
- Impact: 3 - Affects multiple teams
- Urgency: 2 - No workaround; blocks business-critical processes
- Application: DevOps - PSI-OCP (or the appropriate application)
- Support group: Openshift PNT (or the appropriate group)
If there is no response on the SNOW ticket after an hour:
- Ping someone on the exd-infra-escalation Google Chat channel
Use this flow only for real, persistent problems, not one-off failures. Before filing the initial ticket or pinging people on the Google Chat channel, verify that the problem has been occurring consistently for about 15 minutes and that it is really caused by PSI OpenShift.
Also add the outage to the outage spreadsheet.

Last modified December 2, 2021: Add reproducer docs for builds (5ffe38d)