Highlights from Git 2.28
The open source Git project just released Git 2.28 with features and bug fixes from over 58 contributors, 13 of them new. We last caught up with you on the latest in Git back when 2.26 was released. Here’s a look at some of the most interesting features and changes introduced since then.
Introducing init.defaultBranch
When you initialize a new Git repository from scratch with git init, Git has always created a first branch named master. In Git 2.28, a new configuration option, init.defaultBranch, is being introduced to replace the hard-coded term. (For more background on this change, this statement from the Software Freedom Conservancy is an excellent place to look.)
Starting in Git 2.28, git init will instead look to the value of init.defaultBranch when creating the first branch in a new repository. If that value is unset, init.defaultBranch defaults to master. Here, it’s important to note that:
- This configuration variable can be set by the user, and overriding the default value is as easy as:
$ git config --global init.defaultBranch main
- This configuration variable only affects new repositories, and does not cause branches in existing projects to be renamed. git clone will also continue to respect the HEAD of the repository you’re cloning from, so you won’t see a change in branch names until a maintainer initiates one.
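As a rough mental model, the two rules above can be written as a pair of tiny Python functions. This is a hypothetical sketch, not Git’s implementation: the function names are made up, and a plain dictionary stands in for Git’s configuration.

```python
from typing import Mapping

# Hypothetical model: in Git 2.28, an unset init.defaultBranch means "master".
FALLBACK_BRANCH = "master"


def first_branch_for_init(config: Mapping[str, str]) -> str:
    """Name of the first branch a brand-new `git init` creates."""
    return config.get("init.defaultBranch", FALLBACK_BRANCH)


def first_branch_for_clone(remote_head_branch: str, config: Mapping[str, str]) -> str:
    """`git clone` follows the HEAD of the repository being cloned.

    The local init.defaultBranch setting is deliberately ignored here.
    """
    return remote_head_branch


print(first_branch_for_init({}))                                        # master
print(first_branch_for_init({"init.defaultBranch": "main"}))            # main
print(first_branch_for_clone("trunk", {"init.defaultBranch": "main"}))  # trunk
```

The clone function takes the config only to make the point explicit: whatever you set locally, a clone mirrors the upstream’s branch name.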
This change supports the many communities, both on GitHub and in the wider Git community, who are considering renaming the default branch name of their repository from master.
To learn more about the complementary changes GitHub is making, see github/renaming. GitLab and Bitbucket are also making similar changes.
[source]
Changed-path Bloom filters
In Git 2.27, the commit-graph file format was extended to store changed-path Bloom filters. What does all of that mean? In a sense, this new information helps Git find points in history that touched a given path much more quickly (for example, git log -- <path> or git blame). Git 2.28 takes advantage of these optimizations to deliver a handful of sizeable performance improvements.
Before we get into all of that, it’s worth a quick refresher on commit graphs, whether you’re new to the concept or already familiar with them. (If you are familiar and want a deeper dive, check out this blog post explaining all of the juicy technical details.)
In the very simplest terms, the commit-graph file stores information about commits. In essence, the commit-graph acts like a cache for commonly accessed information about commits: who their parent(s) are, what their root tree is, and things like that. It also stores computed information, like a commit’s generation number and changed-path Bloom filters (more on those in just a moment).
Why store all of this information? To answer that, it helps to have a cursory understanding of how Git stores objects. Git stores objects in one of two ways: either as a loose object (in which case the object’s contents are stored in a single file unique to that object on disk), or as a packed object (in which case the object is assembled from a compressed format in a *.pack file). No matter which way a commit is stored, Git still has to decompress and parse it before fields like “root tree” and “parents” can be accessed.
With a commit-graph file, all of that information is immediately available: for a given commit C, Git knows exactly where to look in the commit-graph file for each of the fields it stores, and can read them off directly, with no decompression or piecing together required. This alone can shave some time off your usual Git operations, but where the commit-graph really shines is in the computed data it stores.
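To make that difference concrete, here is a Python sketch. The first half builds a loose commit object the way Git stores it on disk (the zlib-compressed bytes of a "commit <size>" header, a NUL byte, then the commit text) and shows the decompress-and-parse work needed just to learn the tree and parents. The second half stores the same kind of fields in fixed-width binary records, so commit number i can be read with one offset calculation. That record layout is a hypothetical simplification for illustration, not the real commit-graph file format.

```python
import struct
import zlib
from typing import List, Tuple


# --- Loose object: decompress, then parse text, just to find tree and parents ---
def make_loose_commit(tree: str, parents: List[str], message: str) -> bytes:
    """Build a loose commit as stored on disk: zlib("commit <size>" + NUL + body)."""
    lines = [f"tree {tree}"] + [f"parent {p}" for p in parents]
    lines += [
        "author A U Thor <author@example.com> 1112911993 -0700",
        "committer A U Thor <author@example.com> 1112911993 -0700",
        "",
        message,
    ]
    body = ("\n".join(lines) + "\n").encode()
    return zlib.compress(b"commit %d\0" % len(body) + body)


def read_loose_commit(raw: bytes) -> Tuple[str, List[str]]:
    """Recover (tree, parents) the slow way."""
    header, _, body = zlib.decompress(raw).partition(b"\0")
    if not header.startswith(b"commit "):
        raise ValueError("not a commit object")
    tree, parents = "", []
    for line in body.decode().split("\n"):
        if not line:  # a blank line ends the header fields
            break
        key, _, value = line.partition(" ")
        if key == "tree":
            tree = value
        elif key == "parent":
            parents.append(value)
    return tree, parents


# --- Fixed-width table (hypothetical, simplified layout): direct offset lookup ---
NO_PARENT = 0xFFFFFFFF
RECORD = struct.Struct(">20sII")  # raw tree id, position of parent 1, position of parent 2


def read_record(table: bytes, position: int) -> Tuple[bytes, int, int]:
    """The commit at `position` starts at a computable offset: no zlib, no parsing."""
    return RECORD.unpack_from(table, position * RECORD.size)


tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
parent = "1" * 40
raw = make_loose_commit(tree, [parent], "Initial import")
print(read_loose_commit(raw) == (tree, [parent]))  # True

table = RECORD.pack(bytes(20), NO_PARENT, NO_PARENT) + RECORD.pack(b"\x01" * 20, 0, NO_PARENT)
print(read_record(table, 1)[1])  # 0: record 1's first parent is record 0
```

Referring to parents by their position in the table, rather than by object ID, is what lets a lookup chain from one record straight to the next.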
Generation numbers are a sort of reachability index that can help Git answer questions about things like reachability and topological ordering very quickly. Since generation numbers aren’t new in this release (and trying to explain them quickly would lose a lot of the benefit of a more careful exposition), I’ll refer you instead to this blog post by freshly minted Hubber Derrick Stolee on the matter.
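For a small taste of why they help, here is a minimal Python sketch using the classic definition (sometimes called topological level): a commit with no parents has generation 1, and every other commit has a generation one greater than the largest generation among its parents. Because a commit can only reach commits with strictly smaller generation numbers, a reachability search can stop exploring any branch of the walk that drops to or below the target’s generation. The dictionary-based graph and the function names are hypothetical.

```python
from typing import Dict, List

Graph = Dict[str, List[str]]  # commit -> its parents


def generation_numbers(parents: Graph) -> Dict[str, int]:
    """Roots get 1; every other commit gets 1 + the largest parent generation."""
    gen: Dict[str, int] = {}

    def visit(commit: str) -> int:
        # Recursive for brevity; fine for small toy graphs.
        if commit not in gen:
            gen[commit] = 1 + max((visit(p) for p in parents[commit]), default=0)
        return gen[commit]

    for commit in parents:
        visit(commit)
    return gen


def can_reach(parents: Graph, gen: Dict[str, int], start: str, target: str) -> bool:
    """Walk from start toward target, pruning with generation numbers.

    If A can reach B and A != B, then gen[A] > gen[B]. So any commit other than
    target whose generation is <= gen[target] cannot lead to target.
    """
    stack, seen = [start], set()
    while stack:
        commit = stack.pop()
        if commit == target:
            return True
        if commit in seen or gen[commit] <= gen[target]:
            continue
        seen.add(commit)
        stack.extend(parents[commit])
    return False


# A <- B <- C, A <- D, and E merges C and D.
history = {"A": [], "B": ["A"], "C": ["B"], "D": ["A"], "E": ["C", "D"]}
gen = generation_numbers(history)
print(gen)                                # {'A': 1, 'B': 2, 'C': 3, 'D': 2, 'E': 4}
print(can_reach(history, gen, "C", "D"))  # False: the walk stops at B without visiting A
```

In a real repository the savings come from never loading commits deep in history once their generation rules them out.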
What’s new in 2.28?
OK, if you’ve made it this far, you’ve got a pretty good handle on what commit graphs are, and what they’re useful for. Now, let’s get to the juicy details. In Git 2.27, the commit-graph file learned how to store changed-path Bloom filters. What are changed-path Bloom filters, you ask? A Bloom filter is a probabilistic set: it holds a set of items, but querying it for the presence of some item x returns either “x is definitely not in this set” or “x might be in this set”, but never “x is definitely in this set”. The commit-graph stores one of these Bloom filters for each commit it contains, and populates that Bloom filter with the list of paths changed between that commit and its first parent.
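The “definitely not” versus “might be” behavior is easy to see in a toy Bloom filter. This Python sketch sets k bit positions per item in an m-bit array, deriving the positions from salted SHA-256 digests. That hashing scheme and the default sizes are arbitrary choices for illustration; Git’s changed-path filters use their own hash function and parameters, which this does not reproduce.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: k bit positions per item in an m-bit array."""

    def __init__(self, num_bits: int = 64, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int doubles as the bit array

    def _positions(self, item: str) -> Iterator[int]:
        # Derive k positions by salting SHA-256 with the hash index (illustrative choice).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item: str) -> bool:
        """False means "definitely not in the set"; True only means "might be"."""
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


# Paths changed between some commit and its first parent.
changed = BloomFilter()
for path in ["Makefile", "src/main.c"]:
    changed.add(path)

print(changed.might_contain("src/main.c"))   # True: added items are never reported missing
print(changed.might_contain("docs/README"))  # usually False, but a false positive is possible
```

A query can only say “definitely not” when at least one of the item’s bits is still zero, which is why a Bloom filter never produces a false negative.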
These Bloom filters are a huge boon for performance in lots of Git commands. The general pattern is this: if a Git command computes diffs (which can be expensive), then Bloom filters let Git compute far fewer of them, skipping the computation for any commit whose Bloom filter returns “definitely not” for the paths of interest.
Take git log -- /path/to/file, for example. Prior to Git 2.27, git log would have to compute a diff for every revision in its walk before determining whether or not to show it (i.e., whether or not that diff has any entries for /path/to/file). In Git 2.27 and newer, Git can skip computing many of those diffs altogether by consulting each commit C’s changed-path Bloom filter and querying it for /path/to/file. Again: if the query returns “definitely not”, then Git knows the diff cannot mention /path/to/file, so computing it would be wasted work.
Because computing a diff between two commits is often one of the more expensive steps in a history walk like this, reducing the number of diffs computed can greatly improve performance.
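Putting those pieces together, the walk described above looks roughly like the following Python sketch. Everything in it is a hypothetical, simplified model rather than Git’s code: history is linear (first parents only), a commit’s tree is a dictionary of path to blob ID, and each commit’s filter is modeled as a plain set of paths that may include false positives. A counter records how many “expensive” diffs actually ran.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class Commit:
    name: str
    tree: Dict[str, str]                 # path -> blob id (a stand-in for a real tree)
    parent: Optional["Commit"] = None    # first parent only; linear history
    maybe_changed: Set[str] = field(default_factory=set)  # stand-in for a Bloom filter


def diff_paths(commit: Commit) -> Set[str]:
    """The "expensive" step: paths that differ from the first parent."""
    before = commit.parent.tree if commit.parent else {}
    return {p for p in set(before) | set(commit.tree) if before.get(p) != commit.tree.get(p)}


def log_path(head: Commit, path: str, use_filters: bool) -> Tuple[List[str], int]:
    """Return (commits that changed path, number of diffs actually computed)."""
    shown: List[str] = []
    diffs = 0
    commit: Optional[Commit] = head
    while commit is not None:
        if not use_filters or path in commit.maybe_changed:
            diffs += 1  # "might be" (or no filter available): the diff must be computed
            if path in diff_paths(commit):
                shown.append(commit.name)
        # otherwise the filter said "definitely not", so the diff is skipped
        commit = commit.parent
    return shown, diffs


c1 = Commit("c1", {"a.txt": "1", "b.txt": "1"})
c2 = Commit("c2", {"a.txt": "1", "b.txt": "2"}, c1)
c3 = Commit("c3", {"a.txt": "2", "b.txt": "2"}, c2)
c4 = Commit("c4", {"a.txt": "2", "b.txt": "3"}, c3)
for c in (c1, c2, c3, c4):
    c.maybe_changed = diff_paths(c)
c2.maybe_changed.add("a.txt")  # simulate a false positive: it costs a diff, not correctness

print(log_path(c4, "a.txt", use_filters=False))  # (['c3', 'c1'], 4)
print(log_path(c4, "a.txt", use_filters=True))   # (['c3', 'c1'], 3)
```

Both walks print the same commits; the filters only change how many diffs were needed to find them.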
To try this for yourself, you can run the command:
$ git commit-graph write --reachable --changed-paths
This generates a commit-graph file with changed-path Bloom filters enabled.[1] You should see performance improvements in commands like git log -- <path>, git log -L, git blame, and anything else that computes first-parent diffs against a given pathspec.
Tidbits
Now that we’ve talked about a few of the headlining changes from the past couple of releases, let’s look at a few more new features 🔎
- Have you ever been looking for the parts of history that changed some path? Maybe you just want to know about the commits that have modified some file, and those can be found easily enough by running git log -- <path>.
  Sometimes, though, you might be interested not only in which commits touched <path>, but also in which merge commits brought those commits into the main line of development. Have you ever found those merges difficult to find? You’re not alone. In most cases, Git will skip showing you those kinds of merges with git log -- <path>, since those commits don’t modify <path> by themselves.
  Now you can bring those merges back into view with Git’s new --show-pulls flag to revision-walking commands, like git log and git rev-list. For a particularly informative view, try:
  $ git log --oneline --graph --show-pulls -- <path>
  [source]
- When you run git pull in a repository where you’re tracking a remote branch, one of four things can happen: there might be no changes, changes only on the server, changes only on the client, or changes on both. As long as there aren’t changes in both directions, resolving the difference is straightforward: when there are no changes at all, or changes only on the client, there’s nothing to pull. When the server is strictly ahead of the client, the client fast-forwards to the state on the server.
  But when there are changes on both the client and the server, what happens? That depends on whether or not you have the pull.rebase configuration set. If you do, your branch is rebased on top of where you’re pulling from; otherwise, a merge is performed.
  These merges can clutter your history and be tricky to back out of without starting your pull over from scratch. Git 2.28 now warns you about this case (specifically, when pull.rebase is unset and you didn’t explicitly pass --[no-]rebase as an argument to git pull). [source]
- Git now includes a GitHub Actions workflow that you can use to run Git’s own integration tests on a variety of platforms and compilers. There’s no extra effort required on your part: if you have a fork of git/git on GitHub, each push will be run through the array of tests necessary to validate your change.
  But wait: doesn’t Git use a mailing list for development? Yes, it does, but now you can use GitGitGadget on the git/git repository. This means that you can open a pull request and have GitGitGadget send it to the mailing list on your behalf. So, if you’re more comfortable contributing to Git that way than composing emails manually, you can now contribute to Git from start to finish using GitHub. [source]
- On the other hand, if you don’t mind sending an email or two, it’s now much easier to interact with the Git mailing list when you encounter a bug by running git bugreport. Running this new command will open your $EDITOR with a pre-populated form of questions that will be useful in debugging your issue. It also includes some helpful information about your system, like your CPU architecture, what version of Git you’re running, and so on.
  When you’re done, you can send that file as the body of an email to the Git mailing list, and rest assured that you’ve opened a helpful bug report. [source]
- We’ve talked a number of times about Git’s clean and smudge filters and the corresponding process filter (which handles the work of multiple clean and smudge filters in a single long-running process). Until recently, the protocol for these filters was relatively straightforward: Git supplies one end of the content, and the filter produces the other.
  In Git 2.27, more information is supplied over the protocol, like metadata about the branch being checked out in the case of git checkout, or the remote that was contacted in the case of a git fetch. This new information could be used by tools like Git LFS to figure out which remote to contact for extra data. [source]
- Last but not least, git status learned some new tricks, too. You might recall from a recent blog post that we talked about how sparse checkouts can shrink the size of your monorepo. Now, git status can remind you when you are in a sparse checkout by telling you what percentage of files you have checked out.
  For fans of git-prompt.sh, the prompt will now display SPARSE if you are in a sparse checkout, too. [source]
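The --show-pulls tidbit above can be pinned down a little more precisely. The git log documentation describes the flag as also showing merge commits that are not TREESAME to their first parent but are TREESAME to a later parent: merges where <path> differs from the main line but matches the branch that was merged in. The Python sketch below models only that one rule for a single path. Representing each commit’s version of <path> as a blob-ID string is a simplification, the function names are hypothetical, and the rest of Git’s history simplification is not modeled.

```python
from typing import List, Optional

BlobId = Optional[str]  # a commit's version of <path>; None if the path is absent


def treesame(a: BlobId, b: BlobId) -> bool:
    """For a single path, two commits are TREESAME if they hold the same content."""
    return a == b


def surfaced_by_show_pulls(merge: BlobId, parents: List[BlobId]) -> bool:
    """Model of the documented rule for the extra merges --show-pulls displays."""
    if len(parents) < 2:
        return False  # only merge commits are affected
    first, *later = parents
    return not treesame(merge, first) and any(treesame(merge, p) for p in later)


# A topic branch changed <path> (v1 -> v2), and this merge brought it into main.
print(surfaced_by_show_pulls("v2", ["v1", "v2"]))  # True
# <path> was changed on main itself; the merge result matches main's version.
print(surfaced_by_show_pulls("v2", ["v2", "v1"]))  # False
```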
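The git pull tidbit above boils down to comparing how many commits each side has that the other lacks. Here is a hypothetical Python model of that decision, not Git’s implementation: ahead counts local commits the upstream doesn’t have, behind counts upstream commits missing locally, and pull_rebase is None when pull.rebase is unset. Other settings (such as pull.ff) and the 2.28 warning itself are not modeled. On a real branch, git rev-list --left-right --count HEAD...@{upstream} prints those two counts.

```python
from typing import Optional


def pull_outcome(ahead: int, behind: int,
                 pull_rebase: Optional[bool] = None,
                 rebase_flag: Optional[bool] = None) -> str:
    """Simplified model of how `git pull` reconciles a tracking branch.

    ahead:       local commits the upstream doesn't have
    behind:      upstream commits the local branch doesn't have
    pull_rebase: the pull.rebase setting, or None when unset
    rebase_flag: True/False for --rebase/--no-rebase, None if neither was passed
    """
    if behind == 0:
        return "up to date"    # no changes, or changes only on the client
    if ahead == 0:
        return "fast-forward"  # the server is strictly ahead
    # Changes on both sides: an explicit flag takes precedence over the config.
    rebase = rebase_flag if rebase_flag is not None else bool(pull_rebase)
    return "rebase" if rebase else "merge"


print(pull_outcome(ahead=2, behind=3))                    # merge
print(pull_outcome(ahead=2, behind=3, pull_rebase=True))  # rebase
```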
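The process filter tidbit above rides on Git’s pkt-line framing, which the gitattributes documentation uses to describe the long-running filter protocol. Each packet starts with four hexadecimal digits giving the packet’s total length (including those four digits), and the special packet 0000, a flush packet, ends a list. This Python sketch encodes and decodes a smudge request header. The command= and pathname= keys follow the documented protocol. The ref= key stands in for the new metadata described above and is an assumption to check against the gitattributes documentation for your Git version.

```python
from typing import Dict

FLUSH = b"0000"   # flush packet: ends a list of packets
MAX_DATA = 65516  # largest payload in one pkt-line (65520 bytes total)


def pkt_line(payload: bytes) -> bytes:
    """Prefix payload with its total length (payload + 4) as four hex digits."""
    if len(payload) > MAX_DATA:
        raise ValueError("payload too large for one pkt-line")
    return b"%04x" % (len(payload) + 4) + payload


def encode_request(fields: Dict[str, str]) -> bytes:
    """One text packet per key=value line, terminated by a flush packet."""
    return b"".join(pkt_line(f"{k}={v}\n".encode()) for k, v in fields.items()) + FLUSH


def decode_request(stream: bytes) -> Dict[str, str]:
    """Read key=value packets until the flush packet."""
    fields: Dict[str, str] = {}
    pos = 0
    while True:
        length = int(stream[pos:pos + 4], 16)
        if length == 0:  # flush packet
            return fields
        key, _, value = stream[pos + 4:pos + length].decode().rstrip("\n").partition("=")
        fields[key] = value
        pos += length


request = encode_request({
    "command": "smudge",
    "pathname": "path/testfile.dat",
    "ref": "refs/heads/main",  # assumed key for the new metadata; verify for your Git
})
print(request[:19])                    # b'0013command=smudge\n'
print(decode_request(request)["ref"])  # refs/heads/main
```

Because every key travels as its own key=value packet, a filter that doesn’t recognize a newer key can simply ignore it.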
The rest of the iceberg
That’s just a sample of changes from the latest couple of releases. For more, check out the release notes for 2.27 and 2.28, or any previous version in the Git repository.
[1]: Note that since Bloom filters are not persisted automatically (that is, you have to pass --changed-paths explicitly on each subsequent write), it is a good idea to disable configuration that automatically generates commit-graph files, like fetch.writeCommitGraph and gc.writeCommitGraph.