GitHub Availability Report: October 2023
In October, we experienced two incidents that resulted in degraded performance across GitHub services.
October 17 10:59 UTC (lasting 2 hours and 49 minutes)
From 10:59 UTC to 13:48 UTC on October 17, the GitHub Codespaces service was degraded due to an authentication outage. The issue affected 67% of users during this period, who saw failures when creating and starting their Codespaces. The regional authentication layer was throttled by a global third-party dependency because of increased load from onboarding a new Codespaces region. The Codespaces team mitigated the issue manually by reducing load on the external dependency. Following the incident, the Codespaces team is actively evaluating and implementing scaling improvements to make the service more resilient to increasing demand. These include regional-level caching to minimize calls to the dependency and measures to keep the authentication service healthy in the event of errors.
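To illustrate how regional-level caching can reduce calls to a throttled dependency, here is a minimal sketch in Python. It is not the Codespaces implementation: the class name `RegionalTokenCache`, the `fetch` callable, and the TTL value are all hypothetical, and serving a stale entry when the upstream call fails is just one possible way to stay healthy during errors.

```python
import time
from typing import Callable, Dict, Tuple


class RegionalTokenCache:
    """Hypothetical per-region cache in front of a rate-limited dependency."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch          # assumed call to the external dependency
        self._ttl = ttl_seconds      # how long an entry counts as fresh
        self._clock = clock          # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> str:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # fresh hit: no call to the dependency
        try:
            value = self._fetch(key)
        except Exception:
            if cached is not None:
                return cached[1]  # dependency failed: fall back to stale entry
            raise
        self._entries[key] = (now, value)
        return value
```

With a cache like this, repeated requests for the same key within the TTL reach the dependency only once per region, which is the effect the caching improvement aims for. Whether stale entries are acceptable for authentication data depends on the security requirements of the service.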
October 25 09:13 UTC (lasting 3 hours and 27 minutes cumulatively)
From October 25 through 26, GitHub Copilot experienced multiple short, partial outages that affected code completions.
GitHub Copilot completions are currently hosted in multiple regions globally. Users are typically routed to the nearest geographic region, but may be routed to other regions when the nearest region is unhealthy. Beginning at 09:13 UTC on October 25, individual regions experienced partial outages lasting approximately 12 minutes per region. These outages occurred because an automated process was upgrading the nodes hosting the completion model, and a subset of GitHub Copilot users experienced completion errors during this timeframe. The issue was fully resolved at 02:40 UTC on October 26.
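The routing behavior described above, nearest region first with fallback when that region is unhealthy, can be sketched roughly as follows. This is an illustrative assumption, not GitHub's load balancer: the function `pick_region`, the region names, and the health map are hypothetical.

```python
from typing import Dict, List, Optional


def pick_region(
    regions_by_distance: List[str],
    healthy: Dict[str, bool],
) -> Optional[str]:
    """Return the nearest healthy region, or None if every region is down.

    `regions_by_distance` is assumed to be ordered nearest-first for the user.
    Regions missing from `healthy` are treated as unhealthy.
    """
    for region in regions_by_distance:
        if healthy.get(region, False):
            return region
    return None


# Hypothetical example: the nearest region is mid-upgrade, so the user
# is routed to the next-nearest healthy region instead.
preferred = ["europe-west", "us-east", "asia-east"]
health = {"europe-west": False, "us-east": True, "asia-east": True}
chosen = pick_region(preferred, health)
```

In this sketch a user only sees errors if the health map lags behind reality, for example when a region starts upgrading before it is marked unhealthy.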
To prevent similar outages in the future, we have taken steps to disable the automated upgrade behavior that we identified as the root cause, and we are prioritizing improvements to our global load balancing during regional outages.
Please follow our status page for real-time updates on status changes. To learn more about what we’re working on, check out the GitHub Engineering Blog.