Performance at GitHub: deferring stats with rack.after_reply
How we sped up GitHub.com by moving slow, non-critical code into rack.after_reply.
Performance is an essential aspect of any production application, and GitHub is no exception. In the last year, we’ve been making significant investments to improve the performance of some of our most critical workflows. There’s a lot to share, but today we’re going to focus on a little-known feature of some Rack-powered web servers called rack.after_reply
that saved us 30ms p50 and 50ms+ p99 on every request across the site.
The problem
First, let me talk about how we found ourselves looking at this Rack web server functionality. When diving into performance, we often look at flamegraphs that reveal where time is being spent for a given request. Eventually, we started to notice a pattern. It turned out we were spending a significant amount of time sending statsd metrics throughout a request. In fact, it could be up to 65ms per request! While telemetry enables us to improve GitHub, it doesn’t help users directly, especially when it impacts the performance of our page loads.
There’s no need to send telemetry immediately, especially if it’s expensive, so we started discussing ways to remove telemetry network calls out of the critical path. This resulted in the question, “Can we batch all of our telemetry calls and send them at once?” There’s far too much data to use a background job, so we had to look at other potential solutions. We came across Rack::Events and spiked out a proof of concept. Unfortunately, this approach caused any work performed in Rack::Events
callbacks to block the response from closing. This resulted in two issues. First, users would see the browser loading indicator until the deferred code was complete. Second, this impacted our metrics since response times still included the time it took to run our deferred code.
A potential path forward
Recognizing the potential, we kept digging. Eventually, we came across a feature that seemed to do exactly what we wanted, rack.after_reply
. The Puma source code describes rack.after_reply
as: “A rack extension. If the app writes #call’ables to this array, we will invoke them when the request is done.” In other words, this would allow us to collect telemetry metrics during a request and flush them in a single batch after the browser received the full response, avoiding the issue of network calls slowing down response times. There was a problem though. We don’t use Puma at GitHub, we use Unicorn. 😱
We had a clear path forward. All we needed to do was implement rack.after_reply
in Unicorn. This functionality has a lot of use cases outside of sending telemetry metrics, so we contributed rack.after_reply
back to Unicorn. This allowed us to write a small wrapper class around rack.after_reply
, making it easier to use in the application:
class AfterResponse
def initialize(env)
env[“rack.after_reply”] ||= []
env[“rack.after_reply”] << -> do
self.call
end
@to_perform = []
end
# Calls each callable defined via #perform
def call
@to_perform.each do |block|
begin
block.call(self)
rescue Object => e
Rails.logger.error(e)
end
end
end
# Adds given block to the array of callables that will
# be called after the user has received the response.
def perform(name, &block)
@to_perform << block
end
end
With the hard part completed, we wrote a small wrapper around our telemetry class to buffer all of our stats data. Combining that with a middleware that performs roughly the following, users now receive responses 30ms faster P50 across all pages. P99 response times improved too, with a 50ms+ reduction across the site.
GitHub.after_response.perform do
GitHub.statsd.flush!
end
In conclusion
There are a few drawbacks to this approach that are worth mentioning. The primary disadvantage is that a Unicorn worker executes the rack.after_reply
code, making it unable to serve another HTTP request until your callables finish running. If not kept in check, this can potentially increase HTTP queueing, the time spent waiting for a web server worker to serve a request. We mitigated this by adding timeout behavior to prevent any code from running too long.
It’s a little early to talk about it, but there is progress toward making this functionality an official part of the Rack spec. A recently released gem also implements additional functionality around this feature called maybe_later that’s worth checking out.
Overall, we’re pleased with the results. Unicorn-powered applications now have a new tool at their disposal, and our customers reap the benefits of this performance enhancement!
Tags:
Written by
Related posts
Unlocking the power of unstructured data with RAG
Unstructured data holds valuable information about codebases, organizational best practices, and customer feedback. Here are some ways you can leverage it with RAG, or retrieval-augmented generation.
GitHub Availability Report: May 2024
In May, we experienced one incident that resulted in degraded performance across GitHub services.
How we improved push processing on GitHub
Pushing code to GitHub is one of the most fundamental interactions that developers have with GitHub every day. Read how we have significantly improved the ability of our monolith to correctly and fully process pushes from our users.