GitHub Availability Report: February 2024
In February, we experienced two incidents that resulted in degraded performance across GitHub services.
February 26 18:34 UTC (lasting 53 minutes)
February 29 09:32 UTC (lasting 142 minutes)
On February 26 and February 29, we had two incidents related to a background job service that caused processing delays to GitHub services. The incident on February 26 lasted for 63 minutes, while the incident on February 29 lasted for 142 minutes.
The incident on February 26 was related to capacity constraints with our job queuing service and a failure of our automated failover system. Users experienced delays in Webhooks, GitHub Actions, and UI updates (for example, a delay in UI updates on pull requests). We mitigated the incident by manually failing over to our secondary cluster. No data was lost in the process.
The incident on February 29 also caused processing delays to Webhooks, GitHub Actions, and GitHub Issues services, with 95% of the delays occurring in a 22-minute window between 11:05 and 11:27 UTC. At 09:32 UTC, our automated failover successfully routed traffic, but an improper restoration to the primary at 10:32 UTC caused a significant increase in queued jobs. A correction was made at 11:21 UTC, after which healthy services burned down the backlog until full restoration at 11:27 UTC.
To prevent recurrence in the short term, we have completed three significant improvements based on these two incidents: better automation, a more reliable fallback process, and expanded capacity for our background job queuing services. For the longer term, we have a more significant effort already in progress to improve the overall scalability and reliability of our job processing platform.
Please follow our status page for real-time updates on status changes and post-incident recaps. To learn more about what we’re working on, check out the GitHub Engineering Blog.