We have started creating and storing CodeQL databases for the most popular open-source projects on GitHub.com. If you use CodeQL for security research, you can now obtain these databases easily and directly through the CodeQL extension for Visual Studio Code, which makes it much easier to write and run your own custom CodeQL queries.
Using CodeQL for security research
The CodeQL engine powers GitHub code scanning: it analyses source code and flags up potential security problems (for example, in pull requests). By default, code scanning runs a large set of open source queries that are able to identify the most important and common security problems.
CodeQL is also a powerful tool for variant analysis and other types of security research. CodeQL treats source code as data, and anyone can write custom CodeQL queries to explore a codebase and identify vulnerabilities. Like code search on steroids!
The first step of any CodeQL analysis is extracting the source code into a CodeQL database. This database contains a relational representation of the source code — including elements like the abstract syntax tree, the data flow graph, and the control flow graph. You can create CodeQL databases yourself using the CodeQL CLI, but with the feature we shipped today, it's much quicker to get started: you can download a ready-built CodeQL database from GitHub.com.
To download a CodeQL database for use in the CodeQL extension in VS Code:
- Make sure you have set up the CodeQL extension for VS Code. For more information, see Setting up CodeQL in Visual Studio Code.
- Open the CodeQL databases view in the extension.
- Hover over the sidebar, click the GitHub icon, and specify the
owner/repo
identifier of the public repository you'd like to analyze.
Once you've downloaded a CodeQL database, you're ready to start your research. Find more information in the CodeQL documentation.
FAQs
How many CodeQL databases are available?
We currently store databases for over 200,000 repositories on GitHub.com. That list is constantly growing and evolving to make sure that it includes the most interesting codebases for security research.
What languages are can you download CodeQL databases for?
We create and store databases for all of the languages that we support in CodeQL code scanning. For more information, see About code scanning with CodeQL.
Can I download CodeQL databases outside VS Code?
Yes, you can also download CodeQL databases using the GitHub REST API. For more information, see Downloading databases from GitHub.com in the CodeQL CLI documentation.
Why is there no CodeQL codebase available for my favourite open source repository?
If there is a repository that you'd like to analyze, but a CodeQL database is not available yet, then you can trigger the creation (and storing) of a database by enabling GitHub code scanning with the CodeQL engine. Alternatively, you could fork the repository and enable code scanning on the fork. For more information, see the code scanning documentation.