Why and how GitHub encrypts sensitive database columns using ActiveRecord::Encryption
You may know that GitHub encrypts your source code at rest, but you may not have known that we encrypt sensitive database columns as well. Read about our column encryption strategy and our decision to adopt the Rails column encryption standard.
You may know that GitHub encrypts your source code at rest, but you may not have known that we also encrypt sensitive database columns in our Ruby on Rails monolith. We do this to provide an additional layer of defense in depth to mitigate concerns, such as:
- Reading or tampering with sensitive fields if a database is inappropriately accessed
- Accidentally exposing sensitive data in logs
Motivation
Until recently, we used an internal library called Encrypted Attributes. GitHub developers would declare that a column should be encrypted using an API that might look familiar if you have used ActiveRecord::Encryption:
```ruby
class TotpAppRegistration < ApplicationRecord
  # GitHub's internal Encrypted Attributes API
  encrypted_attribute :encrypted_otp_secret, :plaintext_otp_secret
end
```
Given that we had an existing implementation, you may be wondering why we chose to take on the work of converting our columns to ActiveRecord::Encryption. Our main motivation was to ensure that developers did not have to learn a GitHub-specific pattern to encrypt their sensitive data.
We believe strongly that using familiar, intuitive patterns results in better adoption of security tools and, by extension, better security for our users.
In addition to exposing some of the implementation details of the underlying encryption, the Encrypted Attributes API did not provide an easy way for developers to encrypt existing columns. Our internal library also required a separate encryption key to be generated and stored in our secure environment variable configuration for each new database column. This created a bottleneck, as most developers don’t work with encryption every day and needed support from the security team to make changes.
When assessing ActiveRecord::Encryption, we were particularly interested in its ease of use for developers. We wanted a developer to be able to write one line of code and, whether their column was previously plaintext or used our previous solution, have that column magically start using ActiveRecord::Encryption. The final API looks something like this:
```ruby
class TotpAppRegistration < ApplicationRecord
  encrypts :encrypted_otp_secret
end
```
This is the same API that standard ActiveRecord::Encryption uses, while hiding all the complexity of making it work at GitHub scale.
How we implemented this
As part of implementing ActiveRecord::Encryption into our monolith, we worked with our architecture and infrastructure teams to make sure the solution met GitHub’s scalability and security requirements. Below is a brief list of some of the customizations we made to fit the implementation to our infrastructure.
As always, there are specific nuances that must be considered when modifying existing encryption implementations, and it is always a good practice to review any new cryptography code with a security team.
Secure primary key storage
By default, Rails uses its built-in credentials.yml.enc file to securely store the primary key and static salt used for deriving the column encryption key in ActiveRecord::Encryption.
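For contrast with the per-column scheme described below, here is a rough, Rails-free approximation of that default derivation. It assumes PBKDF2-HMAC-SHA256 with 2**16 iterations, which is how ActiveSupport::KeyGenerator derives keys (the digest class is configurable, and its default has changed between Rails versions); the key and salt values are placeholders. Nothing about the table or column enters the derivation, so one derived key covers every encrypted column.

```ruby
require "openssl"

# Rough approximation of the Rails default: PBKDF2 over the primary key with a
# single, application-wide salt. Assumes SHA-256 and 2**16 iterations.
def default_derived_key(primary_key, key_derivation_salt)
  OpenSSL::PKCS5.pbkdf2_hmac(primary_key, key_derivation_salt, 2**16, 32,
                             OpenSSL::Digest.new("SHA256"))
end

primary_key = "example-primary-key"         # placeholder value
static_salt = "example-key-derivation-salt" # placeholder value

# No table or attribute name is an input, so this one 32-byte key
# would be used for every encrypted column.
key = default_derived_key(primary_key, static_salt)
puts key.bytesize # => 32
```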
GitHub’s key management strategy for ActiveRecord::Encryption differs from the Rails default in two key ways: deriving a separate key per column and storing the key in our centralized secret management system.
Deriving per-column keys from a single primary key
As explained above, one of the goals of this transition was to stop bottlenecking teams on manual key management. We did, however, want to maintain the security properties of separate keys. Thankfully, cryptography experts have created a primitive known as a Key Derivation Function (KDF) for this purpose. These functions take (roughly) three important parameters: the primary key, a unique salt, and a string termed “info” by the spec.
Our salt is simply the table name, an underscore, and the attribute name. So for TotpAppRegistration#encrypted_otp_secret, the salt would be totp_app_registrations_encrypted_otp_secret. This ensures the key is different per column.
Due to the specifics of the ActiveRecord::Encryption algorithm (AES-256-GCM), we need to be careful not to encrypt too many values using the same key, to avoid nonce reuse: each encryption uses a randomly generated nonce, so the more values a single key encrypts, the higher the chance that two of them share a nonce. We use the “info” string parameter to ensure the key for each column changes automatically at least once per year. To do this, we populate the info input with the current year during key derivation.
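The salt and info inputs can be sketched with Ruby's OpenSSL::KDF.hkdf. This assumes the KDF is HKDF (RFC 5869), which is where the salt and “info” terminology comes from; the post does not name the exact function, digest, or info format, so SHA-256, the bare year string, and the second example column are illustrative assumptions.

```ruby
require "openssl"

# Salt as described above: table name, an underscore, then the attribute name.
def column_salt(table, attribute)
  "#{table}_#{attribute}"
end

# Per-column derivation, assuming HKDF-SHA256. The year is passed as the
# HKDF "info" input, so the derived key changes when the year does.
def derive_column_key(primary_key, table:, attribute:, year: Time.now.utc.year)
  OpenSSL::KDF.hkdf(primary_key,
                    salt: column_salt(table, attribute),
                    info: year.to_s,
                    length: 32, # 256-bit key for AES-256-GCM
                    hash: "SHA256")
end

# Random stand-in for the real primary key.
primary_key = OpenSSL::Random.random_bytes(32)

otp_2024 = derive_column_key(primary_key, table: :totp_app_registrations,
                             attribute: :encrypted_otp_secret, year: 2024)
otp_2025 = derive_column_key(primary_key, table: :totp_app_registrations,
                             attribute: :encrypted_otp_secret, year: 2025)
# hypothetical_tokens.encrypted_value is an illustrative column, not a real one.
other_2024 = derive_column_key(primary_key, table: :hypothetical_tokens,
                               attribute: :encrypted_value, year: 2024)

puts column_salt(:totp_app_registrations, :encrypted_otp_secret)
# => totp_app_registrations_encrypted_otp_secret
puts otp_2024 != otp_2025   # => true: a new key each year
puts otp_2024 != other_2024 # => true: a different key per column
```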
The applications that make up GitHub store secrets in HashiCorp Vault. To conform with this pre-existing pattern, we wanted to pull our primary key from Vault instead of the credentials.yml.enc file. To accommodate this, we wrote a custom key provider that behaves similarly to the default DerivedSecretKeyProvider, retrieving the key from Vault and deriving the key with our KDF (see Diagram 1).
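A Rails-free sketch of such a provider follows. In Rails, a key provider responds to encryption_key and decryption_keys(encrypted_message) and returns ActiveRecord::Encryption::Key objects; here a plain Struct stands in for that class and a lambda stands in for the Vault client. The class name, the HKDF-SHA256 choice, and the decision to offer every key since a hypothetical first_year for decryption are assumptions for illustration, not GitHub's implementation.

```ruby
require "openssl"

# Stand-in for ActiveRecord::Encryption::Key, which exposes the raw #secret.
StandInKey = Struct.new(:secret)

# Hypothetical provider mirroring the two methods Rails key providers expose:
# #encryption_key and #decryption_keys(encrypted_message).
class HypotheticalColumnKeyProvider
  # fetch_primary_key: any callable returning the primary key (a Vault lookup
  # in the setup described above; a lambda in this sketch).
  def initialize(table:, attribute:, fetch_primary_key:, first_year:,
                 clock: -> { Time.now.utc })
    @salt = "#{table}_#{attribute}"
    @fetch_primary_key = fetch_primary_key
    @first_year = first_year
    @clock = clock
  end

  # New values are encrypted with the key derived for the current year.
  def encryption_key
    key_for(@clock.call.year)
  end

  # Assumption: values may have been written in any year since first_year,
  # so every one of those keys is offered for decryption, newest first.
  def decryption_keys(_encrypted_message = nil)
    @clock.call.year.downto(@first_year).map { |year| key_for(year) }
  end

  private

  # Fetches the primary key on every call for simplicity; a real provider
  # would likely cache it.
  def key_for(year)
    secret = OpenSSL::KDF.hkdf(@fetch_primary_key.call,
                               salt: @salt, info: year.to_s,
                               length: 32, hash: "SHA256")
    StandInKey.new(secret)
  end
end

# Usage with a fake secret store and a fixed clock.
primary_key = OpenSSL::Random.random_bytes(32)
provider = HypotheticalColumnKeyProvider.new(
  table: :totp_app_registrations,
  attribute: :encrypted_otp_secret,
  fetch_primary_key: -> { primary_key },
  first_year: 2023,
  clock: -> { Time.utc(2024, 6, 1) }
)

puts provider.decryption_keys.size                             # => 2
puts provider.encryption_key == provider.decryption_keys.first # => true
```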
Making new behavior the default
One of our team’s key principles is that solutions we develop should be intuitive and not require implementation knowledge on the part of the product developer. ActiveRecord::Encryption includes functionality to customize the Encryptor used to encrypt data for a given column. This functionality would allow developers to opt in to the strategies described above, but to make them the default for our monolith, we needed to override the encrypts model helper to automatically select an appropriate GitHub-specific key provider for the developer.
```ruby
def self.encrypts(*attributes, key_provider: nil, previous: nil, **options)
  # snip: ensure only one attribute is passed
  # ...

  # pull out the sole attribute
  attribute = attributes.sole

  # snip: ensure if a key provider is passed, that it is a GitHubKeyProvider
  # ...

  # If no key provider is set, instantiate one
  kp = key_provider || GitHub::Encryption::GitHubKeyProvider.new(table: table_name.to_sym, attribute: attribute)

  # snip: logic to ensure previous encryption formats and plaintext are supported for smooth transition (see part 2)
  # github_previous = ...

  # call to rails encryption
  super(attribute, key_provider: kp, previous: github_previous, **options)
end
```
Currently, we only provide this API to developers working on our internal github.com codebase. As we work with the library, we are experimenting with upstreaming this strategy to ActiveRecord::Encryption by replacing the per-class encryption scheme with a per-column encryption scheme.
Turn off compression by default
Compressing values prior to encryption can reveal some information about the content of the value. For example, a value with more repeated bytes, such as “abcabcabc”, will compress better than a string of the same length, such as “abcdefghi”. Ciphertext already generally exposes the length of its plaintext; with compression, that length also reveals information about the entropy (randomness) of the underlying plaintext.
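The length leak is easy to reproduce. The sketch below (an illustration only, not ActiveRecord::Encryption's actual payload format) encrypts two 300-byte values with AES-256-GCM through Ruby's OpenSSL bindings, once as-is and once after Zlib compression. GCM adds no padding, so each ciphertext is exactly as long as its input: without compression the two values are indistinguishable by length, while with compression the repetitive value produces a much shorter ciphertext.

```ruby
require "openssl"
require "zlib"

# AES-256-GCM encryption; GCM adds no padding, so the ciphertext is exactly
# as long as its input (the authentication tag, not retrieved here, is separate).
def encrypt(plaintext, key)
  cipher = OpenSSL::Cipher.new("aes-256-gcm").encrypt
  cipher.key = key
  cipher.random_iv # sets and returns a random 12-byte nonce
  cipher.update(plaintext) + cipher.final
end

key = OpenSSL::Random.random_bytes(32)
low_entropy  = "abc" * 100               # 300 bytes, highly repetitive
high_entropy = Random.new(42).bytes(300) # 300 bytes, pseudo-random

# Without compression, the ciphertext lengths match.
plain_sizes = [low_entropy, high_entropy].map { |v| encrypt(v, key).bytesize }

# Compressing first makes the repetitive value's ciphertext much shorter.
compressed_sizes = [low_entropy, high_entropy].map do |v|
  encrypt(Zlib::Deflate.deflate(v), key).bytesize
end

puts plain_sizes.inspect # => [300, 300]
puts compressed_sizes.inspect
```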
ActiveRecord::Encryption compresses data by default for storage efficiency, but since the values we encrypt are relatively small, we did not feel this tradeoff was worth it for our use case. This is why we replaced the default of compressing values before encryption with a flag that makes compression optional.
Migrating to a new encryption standard: the hard parts
This post illustrates some of the design decisions and tradeoffs we encountered when choosing ActiveRecord::Encryption, but it’s not quite enough information to guide developers of existing applications in starting to encrypt columns. In the next post in this series, we’ll show you how we handled the hard parts: upgrading existing columns in your application from plaintext or possibly another encryption standard.