Release

New dataflow API for writing custom CodeQL queries


We have released a new API for people who write custom CodeQL queries that use dataflow analysis. The new API offers additional flexibility and improvements that prevent common pitfalls with the old API, and it improves query evaluation performance by 5%. Whether you're writing CodeQL queries for personal interest or participating in the bounty programme to help us secure the world's code, this post will help you move from the old API to the new one.

This API change is relevant only for users who write their own custom CodeQL queries. Code scanning users who use GitHub’s standard CodeQL query suites will not need to make any changes.

With the introduction of the new dataflow API, the old API will be deprecated. The CodeQL CLI will start emitting deprecation warnings in December 2023, and the old API will continue to work until December 2024.

To demonstrate how to update CodeQL queries from the old to the new API, consider this example query which uses the soon-to-be-deprecated API:

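import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.ExternalFlow // provides sinkNode
// CredentialExpr and LiveLiteral are defined elsewhere in the CodeQL Java library; their imports are omitted here

class SensitiveLoggerConfiguration extends TaintTracking::Configuration {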
  SensitiveLoggerConfiguration() { this = "SensitiveLoggerConfiguration" } // 6: characteristic predicate with dummy string value (see below)

  override predicate isSource(DataFlow::Node source) { source.asExpr() instanceof CredentialExpr }

  override predicate isSink(DataFlow::Node sink) { sinkNode(sink, "log-injection") }

  override predicate isSanitizer(DataFlow::Node sanitizer) {
    sanitizer.asExpr() instanceof LiveLiteral or
    sanitizer.getType() instanceof PrimitiveType or
    sanitizer.getType() instanceof BoxedType or
    sanitizer.getType() instanceof NumberType or
    sanitizer.getType() instanceof TypeType
  }

  override predicate isSanitizerIn(DataFlow::Node node) { this.isSource(node) }
}

import DataFlow::PathGraph

from SensitiveLoggerConfiguration cfg, DataFlow::PathNode source, DataFlow::PathNode sink
where cfg.hasFlowPath(source, sink)
select sink.getNode(), source, sink, "This $@ is written to a log file.", source.getNode(),
  "potentially sensitive information"

To convert the query to the new API:

  1. You use a module instead of a class. A CodeQL module does not extend anything. Instead, it implements a signature. For both data flow and taint tracking configurations, this is DataFlow::ConfigSig, or DataFlow::StateConfigSig if FlowState is needed.
  2. Previously, you chose between data flow and taint tracking by extending either DataFlow::Configuration or TaintTracking::Configuration. Now you make that choice when you define your data or taint flow: you instantiate either the DataFlow::Global<..> or the TaintTracking::Global<..> parameterized module with your implementation of the shared signature.
  3. Predicates no longer override anything, because you are defining a module.
  4. Sanitizers and barriers are now unified under isBarrier, which applies to both taint tracking and data flow configurations. Use isBarrier instead of isSanitizer, and isBarrierIn instead of isSanitizerIn.
  5. Similarly, use isAdditionalFlowStep instead of the taint tracking predicate isAdditionalTaintStep.
  6. A characteristic predicate with a dummy string value is no longer needed.
  7. Do not use the generic DataFlow::PathGraph. Instead, import the PathGraph directly from the module you are using: for example, SensitiveLoggerFlow::PathGraph in the updated version of the example query below.
  8. Similar to the above, you’ll use the PathNode type from the resulting module and not from DataFlow.
  9. Since you no longer have a configuration class, you’ll use the module directly in the from and where clauses. Instead of using e.g. cfg.hasFlowPath or cfg.hasFlow from a configuration object cfg, you’ll use flowPath or flow from the module you’re working with.
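The sensitive-logging example in this post does not exercise step 5, and it only uses the path form of step 9. The following is a rough sketch of both under stated assumptions. It assumes a Java database. The module names ExampleConfig and ExampleFlow, the method names "decode" and "exec", and the choice of method parameters as sources are all hypothetical and serve only to illustrate the shape of the API. The sketch is a problem (non-path) query, so it calls `flow` rather than `flowPath` and does not import a PathGraph.

```ql
/**
 * @kind problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

// Hypothetical configuration: every name below is illustrative only.
module ExampleConfig implements DataFlow::ConfigSig {
  // Sources: any method parameter.
  predicate isSource(DataFlow::Node source) { source instanceof DataFlow::ParameterNode }

  // Sinks: any argument of a call to a method named "exec" (hypothetical).
  predicate isSink(DataFlow::Node sink) {
    exists(Call c | c.getCallee().hasName("exec") and sink.asExpr() = c.getAnArgument())
  }

  // Step 5: what used to be `isAdditionalTaintStep` is now `isAdditionalFlowStep`.
  // This extra step lets flow pass from an argument of a call to a method
  // named "decode" (hypothetical) to the result of that call.
  predicate isAdditionalFlowStep(DataFlow::Node node1, DataFlow::Node node2) {
    exists(Call c |
      c.getCallee().hasName("decode") and
      node1.asExpr() = c.getAnArgument() and
      node2.asExpr() = c
    )
  }
}

// Taint tracking is chosen here, at instantiation time (step 2).
module ExampleFlow = TaintTracking::Global<ExampleConfig>;

// Step 9: a non-path query calls `flow` on the module instead of `cfg.hasFlow`.
from DataFlow::Node source, DataFlow::Node sink
where ExampleFlow::flow(source, sink)
select sink, "This argument may come from $@.", source, "a parameter"
```

Because the additional step is declared on the configuration rather than on a TaintTracking subclass, the same predicate would also apply if you instantiated DataFlow::Global<ExampleConfig> instead.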

Taking all of the above changes into account, here’s what the updated query looks like:

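import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking
import semmle.code.java.dataflow.ExternalFlow // provides sinkNode
// CredentialExpr and LiveLiteral are defined elsewhere in the CodeQL Java library; their imports are omitted here

module SensitiveLoggerConfig implements DataFlow::ConfigSig {  // 1: module always implements DataFlow::ConfigSig or DataFlow::StateConfigSig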
  predicate isSource(DataFlow::Node source) { source.asExpr() instanceof CredentialExpr } // 3: no need to specify 'override'
  predicate isSink(DataFlow::Node sink) { sinkNode(sink, "log-injection") }

  predicate isBarrier(DataFlow::Node sanitizer) {  // 4: 'isBarrier' replaces 'isSanitizer'
    sanitizer.asExpr() instanceof LiveLiteral or
    sanitizer.getType() instanceof PrimitiveType or
    sanitizer.getType() instanceof BoxedType or
    sanitizer.getType() instanceof NumberType or
    sanitizer.getType() instanceof TypeType
  }

  predicate isBarrierIn(DataFlow::Node node) { isSource(node) } // 4: isBarrierIn instead of isSanitizerIn

}

module SensitiveLoggerFlow = TaintTracking::Global<SensitiveLoggerConfig>; // 2: TaintTracking selected 

import SensitiveLoggerFlow::PathGraph  // 7: the PathGraph specific to the module you are using

from SensitiveLoggerFlow::PathNode source, SensitiveLoggerFlow::PathNode sink  // 8 & 9: using the module directly
where SensitiveLoggerFlow::flowPath(source, sink)  // 9: using the flowPath from the module 
select sink.getNode(), source, sink, "This $@ is written to a log file.", source.getNode(),
  "potentially sensitive information"

While not covered in this example, you can also implement the DataFlow::StateConfigSig signature if flow state is needed. You then instantiate DataFlow::GlobalWithState or TaintTracking::GlobalWithState with your implementation of that signature. Another change specific to flow state is that instead of using DataFlow::FlowState, you now define a FlowState class as a member of the module. This lets you use types other than string as the state (for example, integers or booleans). An example of this implementation can be found here.
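As a minimal sketch of what a flow-state configuration can look like, the query below uses a boolean state. It assumes a Java database, and it assumes that a type alias to boolean is accepted as the module's FlowState implementation. The module names DecodeConfig and DecodeFlow, the method names "decode" and "sink", and the use of parameters as sources are hypothetical. The state `false` marks a raw value and `true` marks a value that has passed through a call to "decode". Only decoded values are reported. The sketch spells out `isBarrier(node, state)` as `none()`; depending on your CodeQL version, this predicate may already have a default, in which case the line is redundant.

```ql
/**
 * @kind path-problem
 */

import java
import semmle.code.java.dataflow.DataFlow
import semmle.code.java.dataflow.TaintTracking

// Hypothetical configuration: every name below is illustrative only.
module DecodeConfig implements DataFlow::StateConfigSig {
  // The module defines its own FlowState type instead of using DataFlow::FlowState.
  // `false` = raw value, `true` = value that has passed through "decode".
  class FlowState = boolean;

  // Sources: any method parameter, starting in the raw state.
  predicate isSource(DataFlow::Node source, FlowState state) {
    source instanceof DataFlow::ParameterNode and state = false
  }

  // Sinks: arguments of calls to a method named "sink", but only for decoded values.
  predicate isSink(DataFlow::Node sink, FlowState state) {
    exists(Call c | c.getCallee().hasName("sink") and sink.asExpr() = c.getAnArgument()) and
    state = true
  }

  // No state-specific barriers in this sketch.
  predicate isBarrier(DataFlow::Node node, FlowState state) { none() }

  // Passing a raw value into a call to "decode" yields a decoded result.
  predicate isAdditionalFlowStep(
    DataFlow::Node node1, FlowState state1, DataFlow::Node node2, FlowState state2
  ) {
    exists(Call c |
      c.getCallee().hasName("decode") and
      node1.asExpr() = c.getAnArgument() and
      node2.asExpr() = c and
      state1 = false and
      state2 = true
    )
  }
}

module DecodeFlow = TaintTracking::GlobalWithState<DecodeConfig>;

import DecodeFlow::PathGraph

from DecodeFlow::PathNode source, DecodeFlow::PathNode sink
where DecodeFlow::flowPath(source, sink)
select sink.getNode(), source, sink, "This decoded value comes from $@.", source.getNode(),
  "a parameter"
```

Because the state only changes through the explicit "decode" step, a value that reaches the sink without that step stays in the `false` state and is not reported.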

This functionality is available with CodeQL version 2.13.0. If you would like to start writing your own custom CodeQL queries, follow these instructions to get started with the CodeQL CLI and the VS Code extension.
