Behind GitHub’s new authentication token formats
We’re excited to share a deep dive into how our new authentication token formats are built and how these improvements are keeping your tokens more secure. As we continue to focus on the security of our platform and services across the web, this update shows how big an impact simple changes can have.
Many of our old authentication token formats are hex-encoded 40-character strings that are indistinguishable from other encoded data like SHA hashes. This has several drawbacks, such as inefficient or even inaccurate detection of compromised tokens by our secret scanning feature. We continually strive for security excellence, so we knew that token detection was something we wanted to improve. How could we make our tokens easier to identify and more secure?
Without further ado, here are the design decisions behind our new authentication token formats that let us meet both goals.
Identifiable prefixes
As we see across the industry from companies like Slack and Stripe, token prefixes are a clear way to make tokens identifiable. We are including specific 3-letter prefixes to represent each token, starting with a company signifier, gh, and the first letter of the token type. The results are:
ghp for GitHub personal access tokens
gho for OAuth access tokens
ghu for GitHub user-to-server tokens
ghs for GitHub server-to-server tokens
ghr for refresh tokens
Additionally, we want to make these prefixes clearly distinguishable within the token to improve readability. Thus, we are adding a separator: _. An underscore is not a Base64 character, which helps ensure that our tokens cannot be accidentally duplicated by randomly generated strings like SHAs.
One other neat thing about _ is that it will reliably select the whole token when you double-click it. Other characters we considered are sometimes treated as word separators by applications, so highlighting stops at that character. Try double-clicking this-random-text versus this_random_text!
With this prefix alone, we anticipate the false positive rate for secret scanning will be down to 0.5%.⚡
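To make the idea concrete, here is a minimal sketch of how a scanner might pick out candidate tokens by prefix. It is an illustration, not our secret scanning implementation: the names TOKEN_PATTERN, TOKEN_TYPES, and find_candidate_tokens are hypothetical, and the body length of 36 characters is inferred from the numbers in this post (a 40-character token minus the 3-letter prefix and the underscore).

```ruby
# Hypothetical pattern: "gh", one token-type letter, "_", then 36 characters
# from [A-Za-z0-9]. The length 36 is an assumption inferred from this post.
TOKEN_PATTERN = /\bgh[pousr]_[A-Za-z0-9]{36}\b/

# Maps the third character of the prefix to the token type it stands for.
TOKEN_TYPES = {
  "p" => "personal access token",
  "o" => "OAuth access token",
  "u" => "user-to-server token",
  "s" => "server-to-server token",
  "r" => "refresh token"
}.freeze

# Returns [token, type] pairs for every candidate found in the text.
def find_candidate_tokens(text)
  text.scan(TOKEN_PATTERN).map { |token| [token, TOKEN_TYPES.fetch(token[2])] }
end
```

A 40-character hex string such as a SHA-1 hash has no gh prefix and no underscore, so it never matches this pattern.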
Checksum
Identifiable prefixes are great, but let’s go one step further. A checksum virtually eliminates false positives for secret scanning offline. We can check the token input matches the checksum and eliminate fake tokens without having to hit our database.
A 32 bit checksum in the last 6 digits of each token strikes the optimal balance between keeping the random token portion at a consistent entropy and enough confidence in the checksum. We start the implementation with a CRC32 algorithm, a standard checksum algorithm. We then encode the result with a Base62 implementation, using leading zeros for padding as needed.
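The steps above can be sketched roughly as follows. This is a sketch under stated assumptions, not our production code: the Base62 alphabet order, the choice to checksum only the 30 random characters, and the helper names (base62_encode, checksum_for, build_token, checksum_valid?) are all assumptions made for illustration.

```ruby
require "zlib"
require "securerandom"

# Assumed alphabet order; the post only says "Base62".
BASE62 = (("0".."9").to_a + ("A".."Z").to_a + ("a".."z").to_a).freeze

# Encodes a non-negative integer in Base62, left-padded with "0" to `width`.
def base62_encode(number, width)
  digits = []
  while number > 0
    number, remainder = number.divmod(62)
    digits.unshift(BASE62[remainder])
  end
  digits.join.rjust(width, "0")
end

# Assumption: the checksum covers only the random part of the token.
def checksum_for(random_part)
  base62_encode(Zlib.crc32(random_part), 6)
end

# Builds prefix + "_" + 30 random Base62 characters + 6 checksum characters.
def build_token(prefix)
  random_part = Array.new(30) { BASE62[SecureRandom.random_number(62)] }.join
  "#{prefix}_#{random_part}#{checksum_for(random_part)}"
end

# Offline check: recompute the checksum and compare, with no database lookup.
def checksum_valid?(token)
  match = /\Agh[pousr]_([A-Za-z0-9]{30})([A-Za-z0-9]{6})\z/.match(token)
  return false unless match
  checksum_for(match[1]) == match[2]
end
```

Six Base62 characters are enough for any 32-bit value because 62^6 is larger than 2^32, while 62^5 is not. A string that merely looks like a token will almost never carry a matching checksum, so it can be discarded before any lookup.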
Token entropy
We of course can’t forget about token entropy. Entropy is a logarithmic measure of the information or uncertainty inherent in the possible token combinations. We use it as a representation of uniqueness for a given pattern, and it’s important to maintain given the vast number of tokens we generate every day. For personal access tokens alone, we create over 10k on a slow day and upwards of 18k on peak days. With our new formats, not only did we maintain our previous levels, we increased them!
Previously, our implementation for OAuth access tokens had an entropy of 160 bits (40 characters drawn from 16 possible hex digits):
Math.log((("a".."f").to_a + (0..9).to_a).length) / Math.log(2) * 40  # ≈ 160
The entropy of our OAuth access tokens is now 178 bits (30 characters drawn from 62 possible Base62 characters):
Math.log((("a".."z").to_a + ("A".."Z").to_a + (0..9).to_a).length) / Math.log(2) * 30  # ≈ 178
As we continue to grow and move forward, we will increase this entropy even more. But for now, we are thrilled that our tokens have increased identifiability, security, and entropy, all without changing the token length.
What does this mean for you?
As a GitHub user…
We strongly encourage you to reset any personal access tokens and OAuth tokens you have. These improvements help secret scanning detection and will help you mitigate any risk from compromised tokens. You can reset your personal access tokens by going to developer settings, and your OAuth tokens with our API.
As a service provider…
If you issue tokens as part of your platform and aren’t part of our secret scanning feature, we encourage you to follow the guidelines we outline here for your own tokens and join our secret scanning program so we can keep your tokens secure too.
We thank you for helping us make our platform and services the best and most secure they can be.✨