LIKE injection
Looking through our exception tracker the other day, I ran across a notice from our slow-query logger that caught my eye. I saw a SELECT … WHERE … LIKE query…
Looking through our exception tracker the other day, I ran across a notice from our slow-query logger that caught my eye. I saw a SELECT … WHERE … LIKE
query with lots of percent signs in the LIKE
clause. It was pretty obvious that this term was user-provided and my first thought was SQL injection.
[3.92 sec] SELECT ... WHERE (profiles.email LIKE '%64%68%6f%6d%65%73@%67%6d%61%69%6c.%63%6f%6d%') LIMIT 10
Looking at the code, it turned out that we were using a user-provided term directly in the LIKE
clause without any checks for metacharacters that are interpreted in this context (%
, _
, ).
def self.search(term, options = {})
limit = (options[:limit] || 30).to_i
friends = options[:friends] || []
with_orgs = options[:with_orgs].nil? ? false : options[:with_orgs]
if term.to_s.index("@")
users = User.includes(:profile)
.where("profiles.email LIKE ?", "#{term}%")
.limit(limit).to_a
else
users = user_query(term, friends: friends, limit: limit)
end
...
end
While this isn’t full-blown SQL injection, it got me thinking about the impact of this kind of injection. This kind of pathological query clearly has some performance impact because we logged a slow query. The question is how much?
I asked our database experts and was told that it depends on where the wildcard is in the query. With a %
in the middle of a query, the database can still check the index for the beginning characters of the term. With a %
at the start of the query, indices may not get used at all. This bit of insight led me to run several queries with varied %
placement against a test database.
mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github.com";
1 row in set (0.00 sec)
mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "%ris@github.com";
1 row in set (0.91 sec)
mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "chris@github%";
1 row in set (0.00 sec)
mysql> SELECT 1 FROM `profiles` WHERE `email` LIKE "%c%h%r%i%s%@%g%i%t%h%u%b%.%c%o%m%";
21 rows in set (0.93 sec)
It seems that unsanitized user-provided LIKE
clauses do have a potential performance impact, but how do we address this in a Ruby on Rails application? Searching the web, I couldn’t find any great suggestions. There are no Rails helpers for escaping LIKE
metacharacters, so we wrote some.
module LikeQuery
# Characters that have special meaning inside the `LIKE` clause of a query.
#
# `%` is a wildcard representing multiple characters.
# `_` is a wildcard representing one character.
# `` is used to escape other metacharacters.
LIKE_METACHARACTER_REGEX = /([\%_])/
# What to replace `LIKE` metacharacters with. We want to prepend a literal
# backslash to each metacharacter. Because String#gsub does its own round of
# interpolation on its second argument, we have to double escape backslashes
# in this String.
LIKE_METACHARACTER_ESCAPE = '\\1'
# Public: Escape characters that have special meaning within the `LIKE` clause
# of a SQL query.
#
# value - The String value to be escaped.
#
# Returns a String.
def like_sanitize(value)
raise ArgumentError unless value.respond_to?(:gsub)
value.gsub(LIKE_METACHARACTER_REGEX, LIKE_METACHARACTER_ESCAPE)
end
extend self
module ActiveRecordHelper
# Query for values with the specified prefix.
#
# column - The column to query.
# prefix - The value prefix to query for.
#
# Returns an ActiveRecord::Relation
def with_prefix(column, prefix)
where("#{column} LIKE ?", "#{LikeQuery.like_sanitize(prefix)}%")
end
# Query for values with the specified suffix.
#
# column - The column to query.
# suffix - The value suffix to query for.
#
# Returns an ActiveRecord::Relation
def with_suffix(column, suffix)
where("#{column} LIKE ?", "%#{LikeQuery.like_sanitize(suffix)}")
end
# Query for values with the specified substring.
#
# column - The column to query.
# substring - The value substring to query for.
#
# Returns an ActiveRecord::Relation
def with_substring(column, substring)
where("#{column} LIKE ?", "%#{LikeQuery.like_sanitize(substring)}%")
end
end
end
ActiveRecord::Base.extend LikeQuery
ActiveRecord::Base.extend LikeQuery::ActiveRecordHelper
We then went through and audited all of our LIKE
queries, fixing eleven such cases. The risk of these queries turned out to be relatively low. A user could subvert the intention of the query, though not in any meaningful way. For us, this was simply a Denial of Service (DoS) vector. It’s nothing revolutionary and it is not a new vulnerability class, but it’s something to keep an eye out for. Three second queries can be a significant performance hit and application-level DoS vulnerabilities need to be mitigated.
Update: A number of people pointed out that in Rails 4.2, ActiveRecord includes a sanitize_sql_like
helper for sanitizing LIKE clauses.
Written by
Related posts
Unlocking the power of unstructured data with RAG
Unstructured data holds valuable information about codebases, organizational best practices, and customer feedback. Here are some ways you can leverage it with RAG, or retrieval-augmented generation.
GitHub Availability Report: May 2024
In May, we experienced one incident that resulted in degraded performance across GitHub services.
How we improved push processing on GitHub
Pushing code to GitHub is one of the most fundamental interactions that developers have with GitHub every day. Read how we have significantly improved the ability of our monolith to correctly and fully process pushes from our users.