Sustained workspace performance degradation for April 11-21
Between April 11th and 22nd, Gitpod experienced a series of incidents that degraded workspace performance in all regions. Some users were unable to start workspaces, and some experienced data loss. Several factors contributed, including slower-than-anticipated disk IO, more accessible CPU that was not governed as expected, and an inability to limit disk IO and latency in workspace networking. To prevent similar incidents, Gitpod is introducing risk assessment for changesets, improved rollback capabilities, better external and internal communication during incidents, and redesigned on-call processes.
Company
Gitpod
Date published
May 5, 2022
Author(s)
Kyle Brennan
Word count
3153
Language
English
Hacker News points
1