We have released a new API for people who write custom CodeQL queries that use dataflow analysis. The new API offers additional flexibility, includes improvements that prevent common pitfalls with the old API, and improves query evaluation performance by 5%. Whether you’re writing CodeQL queries for personal interest or participating in the bounty programme to help us secure the world’s code, this post will help you move from the old API to the new one.
This API change is relevant only for users who write their own custom CodeQL queries. Code scanning users who use GitHub’s standard CodeQL query suites will not need to make any changes.
With the introduction of the new dataflow API, the old API will be deprecated. The old API will continue to work until December 2024; the CodeQL CLI will start emitting deprecation warnings in December 2023.
To demonstrate how to update CodeQL queries from the old to the new API, consider this example query which uses the soon-to-be-deprecated API:
class SensitiveLoggerConfiguration extends TaintTracking::Configuration {
  SensitiveLoggerConfiguration() { this = "SensitiveLoggerConfiguration" } // 6: characteristic predicate with dummy string value (see below)

  override predicate isSource(DataFlow::Node source) { source.asExpr() instanceof CredentialExpr }

  override predicate isSink(DataFlow::Node sink) { sinkNode(sink, "log-injection") }

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    sanitizer.asExpr() instanceof LiveLiteral or
    sanitizer.getType() instanceof PrimitiveType or
    sanitizer.getType() instanceof BoxedType or
    sanitizer.getType() instanceof NumberType or
    sanitizer.getType() instanceof TypeType
  }

  override predicate isSanitizerIn(DataFlow::Node node) { this.isSource(node) }
}

import DataFlow::PathGraph

from SensitiveLoggerConfiguration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "This $@ is written to a log file.",
  source.getNode(),
  "potentially sensitive information"
To convert the query to the new API:
1. You use a module instead of a class. A CodeQL module does not extend anything; it instead implements a signature. For both data flow and taint tracking configurations this is DataFlow::ConfigSig, or DataFlow::StateConfigSig if FlowState is needed.
2. Previously, you would choose between data flow or taint tracking by extending DataFlow::Configuration or TaintTracking::Configuration. Instead, now you define your data or taint flow by instantiating either the DataFlow::Global<..> or TaintTracking::Global<..> parameterized modules with your implementation of the shared signature, and this is where the choice between data flow and taint tracking is made.
3. Predicates no longer override anything, because you are defining a module.
4. The concepts of sanitizers and barriers are now unified under isBarrier, which applies to both taint tracking and data flow configurations. You must use isBarrier instead of isSanitizer and isBarrierIn instead of isSanitizerIn.
5. Similarly, instead of the taint tracking predicate isAdditionalTaintStep you use isAdditionalFlowStep.
6. A characteristic predicate with a dummy string value is no longer needed.
7. Do not use the generic DataFlow::PathGraph. Instead, the PathGraph will be imported directly from the module you are using. For example, SensitiveLoggerFlow::PathGraph in the updated version of the example query below.
8. Similar to the above, you’ll use the PathNode type from the resulting module and not from DataFlow.
9. Since you no longer have a configuration class, you’ll use the module directly in the from and where clauses. Instead of using e.g. cfg.hasFlowPath or cfg.hasFlow from a configuration object cfg, you’ll use flowPath or flow from the module you’re working with.
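Two of the points above, isAdditionalFlowStep and the module-level flow predicate, are not exercised by the example query in this post. The following is a minimal sketch of both for Java, not taken from any shipped query. The class names Wrapper and Recorder, the module names WrapperConfig and WrapperFlow, and the choice of string literals as sources are all hypothetical and chosen only for illustration.

```ql
import java
import semmle.code.java.dataflow.TaintTracking

module WrapperConfig implements DataFlow::ConfigSig {
  // Hypothetical source: any string literal
  predicate isSource(DataFlow::Node source) { source.asExpr() instanceof StringLiteral }

  // Hypothetical sink: the first argument of `new Recorder(...)`
  predicate isSink(DataFlow::Node sink) {
    exists(ClassInstanceExpr c |
      c.getConstructedType().hasName("Recorder") and
      sink.asExpr() = c.getArgument(0)
    )
  }

  // Takes the place of isAdditionalTaintStep from the old API:
  // propagate flow from the first argument of `new Wrapper(...)` to the new object
  predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
    exists(ClassInstanceExpr c |
      c.getConstructedType().hasName("Wrapper") and
      node1.asExpr() = c.getArgument(0) and
      node2.asExpr() = c
    )
  }
}

// The choice between data flow and taint tracking is made here
module WrapperFlow = TaintTracking::Global<WrapperConfig>;

// `flow` relates plain nodes, so no PathGraph or PathNode is needed
from DataFlow::Node source, DataFlow::Node sink
where WrapperFlow::flow(source, sink)
select sink, "Value flows here from $@.", source, "a string literal"
```

Instantiating DataFlow::Global<WrapperConfig> instead would limit the analysis to value-preserving data flow plus the additional step, without any change to the configuration module itself.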
Taking all of the changes listed above into account, here’s what the updated example query looks like:
module SensitiveLoggerConfig implements DataFlow::ConfigSig { // 1: module always implements DataFlow::ConfigSig or DataFlow::StateConfigSig
  predicate isSource(DataFlow::Node source) { source.asExpr() instanceof CredentialExpr } // 3: no need to specify 'override'

  predicate isSink(DataFlow::Node sink) { sinkNode(sink, "log-injection") }

  predicate isBarrier(DataFlow::Node sanitizer) { // 4: 'isBarrier' replaces 'isSanitizer'
    sanitizer.asExpr() instanceof LiveLiteral or
    sanitizer.getType() instanceof PrimitiveType or
    sanitizer.getType() instanceof BoxedType or
    sanitizer.getType() instanceof NumberType or
    sanitizer.getType() instanceof TypeType
  }

  predicate isBarrierIn(DataFlow::Node node) { isSource(node) } // 4: isBarrierIn instead of isSanitizerIn
}

module SensitiveLoggerFlow = TaintTracking::Global<SensitiveLoggerConfig>; // 2: TaintTracking selected

import SensitiveLoggerFlow::PathGraph // 7: the PathGraph specific to the module you are using

from SensitiveLoggerFlow::PathNode source, SensitiveLoggerFlow::PathNode sink // 8 & 9: using the module directly
where SensitiveLoggerFlow::flowPath(source, sink) // 9: using the flowPath from the module
select sink.getNode(), source, sink, "This $@ is written to a log file.", source.getNode(),
  "potentially sensitive information"
While not covered in this example, you can also implement the DataFlow::StateConfigSig signature if flow-state is needed. You then instantiate DataFlow::GlobalWithState or TaintTracking::GlobalWithState with your implementation of that signature. Another change specific to flow-state is that instead of using DataFlow::FlowState, you now define a FlowState class as a member of the module. This is useful for using types other than string as the state (e.g. integers, booleans). An example of this implementation can be found here.
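As a rough illustration of the flow-state variant, the sketch below uses a boolean state to require that a value passes through a wrapping step before it reaches a sink. It is an assumption-laden example rather than the linked implementation: the class names Wrapper and Recorder and the module names WrappedValueConfig and WrappedValueFlow are hypothetical. The state-aware isBarrier is spelled out as none() so that the sketch does not rely on that predicate having a default in the signature.

```ql
import java
import semmle.code.java.dataflow.TaintTracking

module WrappedValueConfig implements DataFlow::StateConfigSig {
  // The state type is declared as a member of the module.
  // Here: false = not yet wrapped, true = wrapped.
  class FlowState = boolean;

  // Hypothetical source: string literals start in the "not wrapped" state
  predicate isSource(DataFlow::Node source, FlowState state) {
    source.asExpr() instanceof StringLiteral and
    state = false
  }

  // Hypothetical sink: first argument of `new Recorder(...)`, reached in the "wrapped" state
  predicate isSink(DataFlow::Node sink, FlowState state) {
    exists(ClassInstanceExpr c |
      c.getConstructedType().hasName("Recorder") and
      sink.asExpr() = c.getArgument(0)
    ) and
    state = true
  }

  // No state-specific barriers in this sketch
  predicate isBarrier(DataFlow::Node node, FlowState state) { none() }

  // Passing through `new Wrapper(...)` switches the state from false to true
  predicate isAdditionalFlowStep(
    DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2
  ) {
    exists(ClassInstanceExpr c |
      c.getConstructedType().hasName("Wrapper") and
      node1.asExpr() = c.getArgument(0) and
      node2.asExpr() = c
    ) and
    state1 = false and
    state2 = true
  }
}

module WrappedValueFlow = TaintTracking::GlobalWithState<WrappedValueConfig>;

from DataFlow::Node source, DataFlow::Node sink
where WrappedValueFlow::flow(source, sink)
select sink, "Wrapped value from $@ reaches this point.", source, "a string literal"
```

Using boolean as the state type is only one option; any type that can be declared or aliased as the FlowState member of the module can play the same role.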
The new dataflow API is available with CodeQL version 2.13.0. If you would like to start writing your own custom CodeQL queries, follow these instructions to set up the CodeQL CLI and the VS Code extension.