Today’s Outage
A few hours ago I was upgrading our continuous integration setup when a configuration error caused it to run against our production environment rather than our testing environment. Before every…
A few hours ago I was upgrading our continuous integration setup when a configuration error caused it to run against our production environment rather than our testing environment.
Before every run of our test suite we destroy then re-create the database so that we have a known, clean starting point. This also allows us to continuously integrate topic branches with potentially different database schemas. Due to the configuration error GitHub’s production database was destroyed then re-created. Not good.
We immediately began restoring the database from our most recent backup. Unfortunately, while most tables in the GitHub database are small, our “events” table is large. This significantly slowed the restoration process.
Eventually the decision was made to skip the events table in order to speed up the restoration process. As a result, your dashboard and profile might currently be blank – rather annoying, but hopefully only temporary. We will be restoring the events table bit by bit over the next few days in an attempt to minimize downtime.
Worse, however, is that we may have lost some data from between the last good database backup and the time of the deletion. Newly created users and repositories are being restored, but pull request state changes and similar might be gone.
Obviously, this should have never had happened. It should be very difficult to cause a database failure like this and very easy to recover from it.
Our plan moving forward:
- Completely isolate the test environment from the production environment, i.e. make production hosts unreachable from the testing VM.
- Reduce the size and growth rate of our events table. This is already well underway but is now one of our top priorities.
- Begin storing binlogs to reduce data loss in the event of a future db restoration. The completion of #2 will help make this much easier.
We’re very sorry about this, especially if we ruined your work day or Sunday afternoon. Please email support@github.com if you are still having problems or need to discuss the outage further.
Written by
Related posts
Apply now for GitHub Universe 2023 micro-mentoring
As part of our ongoing commitment to accelerate human progress through Social Impact initiatives, we’re offering students 30-minute, 1:1 micro-mentoring sessions with GitHub employees ahead of Universe.
The 2023 Open Source Program Office (OSPO) Survey is live!
Help quantify the state of enterprise open source by taking the 2023 OSPO survey.
Godot 4.0 Release Party 🎉
We are delighted to host the Godot 4.0 Release Party at GitHub HQ on Wednesday, March 22 from 6:30 pm to 9:30 pm. And you’re invited!