Meet Nodeload, the new Download server
I recently launched the first GitHub node.js project: nodeload. Nodeload replaces the ruby git archive download server. Basically, any time you download a tarball or zip file of any repository…
I recently launched the first GitHub node.js project: nodeload. Nodeload replaces the ruby git archive download server. Basically, any time you download a tarball or zip file of any repository branch or tag, you’re going through nodeload.
This is a challenging issue because we can’t just store downloads anywhere. Either we pre-build archives for every single branch and tag after every repo commit, or we generate each one fresh. Of course, there is a healthy hot cache of built archives to deal with popular requests.
In the old system, requests to download an archive would create a Resque job. This Resque job actually runs a git archive
command on the archive server, pulling data from one of the file servers. Then, the original request sends you to a small ruby Sinatra app to wait for the job. It would essentially just check for the existence of a memcache flag before redirecting to the final download address. At the time, we were running about 3 Sinatra instances and 3 Resque workers.
This is a pragmatic solution when time and manpower are your constraints. I saw it as a great opportunity for Node.js. This is a simple feature that doesn’t depend on much of the site. In fact, the extra moving parts make it seem more complicated than it really is. The very nature of serving fresh git archives highlights the strengths of an event driven environment like Node.js, and the weaknesses of a blocking environment like Ruby.
However, I ran into a big roadblock right away: How do I even get the right file server for an archive? How can I authenticate whether a user has access to a private repo? Rather than reimplementing internal application logic, I went with a simple JSON API call. It returns the proper HTTP codes for missing or unauthorized repos, or a JSON object with the filesystem information. From there, nodeload spawns a git archive
process, and redirects the user when it’s finished. This way I was able to use the existing logic for user permissions and repository paths without duplicating logic or loading the GitHub app in ruby.
This is just the first iteration of the archive server. My ultimate goal is to get an address like “/mxcl/homebrew/tarball/master” to serve archives without any redirects. But, this first step runs stable, using less memory than a single Sinatra instance. I’m not at the point where I want to write entire applications with Node.js, but it feels like a great fit for certain cases like this.
One of the other bonuses of using Node.js: a socket.io web socket server for monitoring:
Written by
Related posts
Apply now for GitHub Universe 2023 micro-mentoring
As part of our ongoing commitment to accelerate human progress through Social Impact initiatives, we’re offering students 30-minute, 1:1 micro-mentoring sessions with GitHub employees ahead of Universe.
The 2023 Open Source Program Office (OSPO) Survey is live!
Help quantify the state of enterprise open source by taking the 2023 OSPO survey.
Godot 4.0 Release Party 🎉
We are delighted to host the Godot 4.0 Release Party at GitHub HQ on Wednesday, March 22 from 6:30 pm to 9:30 pm. And you’re invited!