Ignoring bulk change commits with git blame

17‑10‑2019 Arnout Boks 6 min.

A long-standing objection to making bulk changes to code using automated tools (e.g. to conform to a given code style) is that it clutters the output of git blame. With git 2.23, this no longer has to be the case! In this post I will start by explaining the value of git blame and how bulk style-change commits can be problematic. If you already understand this problem and just want a solution, you can skip directly to the new features git 2.23 has to offer.

Putting changes into context

A characteristic feature of legacy code is that it's often not clear why it operates the way that it does. Some of the original developers may have left or have been reassigned to another project, documentation is virtually nonexistent, and the few remaining developers do not remember all the details anymore. For example, one day you might stumble upon the following piece of code:

<?php
function describeBottles(int $amount = 42): string {
    return 'There are ' . $amount . ' bottles of cider on the wall.';
}

Despite being an artificial example, this code already raises some questions. Why is the default amount of bottles being described 42? And why do we describe bottles of cider? Bottles of beer would be a more customary alternative, right? Still, these choices were probably made for a good reason; it's just that we don't know that reason.

It would be good if the reasoning behind these choices was documented using comments. However, as happens with legacy code, this is not the case. How can we still find out the motivation behind the current state of the code? A version control system such as git (you use version control, right?) may be helpful here. If you write good commit messages that focus on the why rather than the how, you might be able to distill the context from there. We only need to find which commit made a given change.

The git blame command (or git praise if you prefer a more positive mindset) can be helpful here. It shows, for each line in a file, which commit made the last change to that line, along with its timestamp and author:

$ git blame describeBottles.php
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 1) <?php
b589bf1e (John Smith 2019-07-03 14:42:46 +0200 2) function describeBottles(int $amount = 42): string {
2c386e07 (A.N. Other 2019-09-18 16:58:24 +0200 3)     return 'There are ' . $amount . ' bottles of cider on the wall.';
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 4) }

From this output we can see that line 3 was last changed by 'A.N. Other' in commit 2c386e07. If we look up the details for that commit we may find out why this function describes bottles of cider rather than beer:

$ git show 2c386e07
commit 2c386e07b72041af1e0c2f827ac31357829429dd
Author: A.N. Other <a.n.other@example.com>
Date:   Wed Sep 18 16:58:24 2019 +0200

    Change drink
    
    Extensive user testing has shown that our customers like
    cider better than beer.

    Jira: BOT-123

diff --git a/describeBottles.php b/describeBottles.php
index ef2b0fd..9336895 100644
--- a/describeBottles.php
+++ b/describeBottles.php
@@ -1,5 +1,5 @@
 <?php
 function describeBottles(int $amount = 42): string {
-    return 'There are ' . $amount . ' bottles of beer on the wall.';
+    return 'There are ' . $amount . ' bottles of cider on the wall.';
 } 

Bingo! We have found the exact commit in which we swapped beer for cider, and more importantly: we know why. We even have a link to a Jira ticket where we can find more information. Perhaps it contains the full user testing results, providing us with even more context. This makes git blame an absolute lifesaver in legacy projects.
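If you find yourself doing this blame-then-show routine a lot, the two steps can be glued together. The following is only a minimal sketch: why_line is a hypothetical helper name (it is not part of git), and it assumes it is run inside a git repository. It relies on the -L option of git blame to restrict the output to a single line, and on --porcelain to get the full commit hash in a machine-readable form.

```shell
# Hypothetical helper (not part of git): show the message of the commit
# that last changed a given line of a file.
# Usage: why_line <file> <line-number>
why_line() {
  file=$1
  line=$2
  # With --porcelain, the first output line starts with the full commit hash.
  commit=$(git blame --porcelain -L "$line,$line" -- "$file" | head -n 1 | cut -d ' ' -f 1)
  # Stop if blame produced nothing (e.g. unknown file or line out of range).
  [ -n "$commit" ] || return 1
  # --no-patch prints only the commit header and message, not the diff.
  git show --no-patch "$commit"
}
```

For the example above, calling why_line describeBottles.php 3 would be expected to print the 'Change drink' commit message without the diff.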

The problem: bulk changes

The team behind the describeBottles function has always used their own coding standards, with opening braces on the same line and 'CRLF' line endings. One day they decide to adopt the PSR-2 coding style guide that has become popular in the PHP community. Luckily there are tools like PHP-CS-Fixer and phpcbf to automatically convert the whole codebase to the new standard. There are similar tools for almost all other programming languages.

Now the team has one huge commit with style changes in their repository. It touches every line without altering the meaning or intent of the code. If we now use git blame to find the background for a line of code, the output is:

$ git blame describeBottles.php
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 1) <?php
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 2) function describeBottles(int $amount = 42): string
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 3) {
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 4)     return 'There are ' . $amount . ' bottles of cider on the wall.';
df0ee6b0 (Regina Phalange 2019-09-26 16:51:58 +0200 5) }

As we can guess, the last commit that touched line 4 does not give us any useful context anymore:

$ git show df0ee6b0
commit df0ee6b006ee0f90cccc18b71ced290f6cae18d9 (HEAD -> master)
Author: Regina Phalange <r.phalange@example.com>
Date:   Thu Sep 26 16:51:58 2019 +0200

    Fix line endings

diff --git a/describeBottles.php b/describeBottles.php
index 17f0657..d9c9f99 100644
--- a/describeBottles.php
+++ b/describeBottles.php
@@ -1,5 +1,5 @@
-<?php
-function describeBottles(int $amount = 42): string
-{
-    return 'There are ' . $amount . ' bottles of cider on the wall.';
-}
+<?php
+function describeBottles(int $amount = 42): string
+{
+    return 'There are ' . $amount . ' bottles of cider on the wall.';
+}

Because these bulk changes render git blame useless, many teams refrain from applying automated style changes of this magnitude. That means they have to live with either a coding standard that they would rather not have, or with a codebase that does not follow their standards.

Git 2.23 to the rescue!

To limit the impact of such 'unimportant' bulk commits, git 2.23 adds a new option to git blame. Using --ignore-rev, one can specify a commit to be ignored by git blame. Lines changed by the ignored commit will be attributed to the previous commit touching that line instead. This means that even after our bulk style change, we can get back a meaningful context for the 'real' changes to our function:

$ git blame --ignore-rev df0ee6b0 describeBottles.php
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 1) <?php
b589bf1e (John Smith 2019-07-03 14:42:46 +0200 2) function describeBottles(int $amount = 42): string
b589bf1e (John Smith 2019-07-03 14:42:46 +0200 3) {
2c386e07 (A.N. Other 2019-09-18 16:58:24 +0200 4)     return 'There are ' . $amount . ' bottles of cider on the wall.';
^8206b47 (Jane Doe   2019-04-21 09:41:20 +0200 5) }

Note how even line 3, which was added by the ignored commit, is attributed to commit b589bf1e, which originally added the brace on the line above.
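To try this out without touching a real project, the effect can be reproduced in a throwaway repository. The sketch below is an illustration under assumptions, not the repository from this post: the file name demo.txt, the commit messages and the user identity are all made up. It makes one 'real' commit, then a style-only commit that reindents a line, and finally runs git blame with and without --ignore-rev so the two outputs can be compared.

```shell
# Throwaway repository in a temporary directory (all names are made up).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: the 'real' change we want git blame to keep pointing at.
printf 'function f() {\n    return 1;\n}\n' > demo.txt
git add demo.txt
git commit -q -m "Add f"
real=$(git rev-parse HEAD)

# Commit 2: a style-only change (spaces replaced by a tab on line 2).
printf 'function f() {\n\treturn 1;\n}\n' > demo.txt
git commit -q -am "Reindent with tabs"
bulk=$(git rev-parse HEAD)

echo "Plain blame:"
git blame -s demo.txt

echo "Blame ignoring the style commit:"
git blame -s --ignore-rev "$bulk" demo.txt
```

In the first output, line 2 should be attributed to the reindent commit; in the second, it should fall back to the commit that added the function.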

When multiple bulk commits were added over time, it takes quite some effort to add a --ignore-rev for each of them in order to get a 'clean' output for git blame. Luckily, git also provides a way to make this easier on us. In your repository, create a file to hold commit hashes of commits to be ignored by git blame. Naming this file .git-blame-ignore-revs seems to be a common convention.

$ cat .git-blame-ignore-revs 
# Conversion to PSR-2 code style
237de8a6367a88649a3f161112492d0d70d83707

# Fix line endings
df0ee6b006ee0f90cccc18b71ced290f6cae18d9

The file should contain the full (40 char) commit hashes. Lines starting with a # are considered comments and can be used to explain what makes the given commit(s) unimportant. Now we can call git blame with the --ignore-revs-file option to ignore all these commits at once:

$ git blame --ignore-revs-file .git-blame-ignore-revs describeBottles.php
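Since the file needs full hashes while we mostly work with abbreviated ones, it can help to let git do the expansion. The following is a small sketch of that idea: ignore_rev is a hypothetical helper name, and it assumes it is run from the root of the repository, where .git-blame-ignore-revs lives. It uses git rev-parse to resolve any revision (an abbreviated hash, a tag, HEAD) to the full commit hash before appending it.

```shell
# Hypothetical helper (not part of git): append a commit to
# .git-blame-ignore-revs using its full hash.
# Usage: ignore_rev <revision> <reason>
ignore_rev() {
  rev=$1
  reason=$2
  # --verify fails if the revision cannot be resolved;
  # the ^{commit} suffix makes sure it resolves to a commit.
  full=$(git rev-parse --verify "$rev^{commit}") || return 1
  # A blank line, a comment explaining the entry, then the full hash.
  printf '\n# %s\n%s\n' "$reason" "$full" >> .git-blame-ignore-revs
}
```

With this in place, something like ignore_rev df0ee6b0 "Fix line endings" would add an entry in the same shape as the ones shown above.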

The .git-blame-ignore-revs file can be versioned inside the repository, so that all developers can use (and maintain) the same list of ignored commits. To avoid typing the extra option with every command, we can set the blame.ignoreRevsFile configuration variable:

$ git config blame.ignoreRevsFile .git-blame-ignore-revs

This causes git to automatically ignore the commits specified in that file for every call to git blame. If you stick to the .git-blame-ignore-revs naming convention you can even set this configuration variable globally, so that it applies to all your repositories, each with their own .git-blame-ignore-revs file. Be aware, however, that git currently gives an error when this setting is configured globally but a repository has no .git-blame-ignore-revs file yet. I hope that this is considered a bug and will be fixed in an upcoming version.
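One way to sidestep that error is to skip the global setting and enable the option per repository, only where the file actually exists. The snippet below is a minimal sketch of that approach: enable_blame_ignore is a hypothetical helper name, and it assumes it is run from the root of a repository.

```shell
# Hypothetical helper (not part of git): enable blame.ignoreRevsFile for
# the current repository only, and only if the ignore file is present.
enable_blame_ignore() {
  if [ -f .git-blame-ignore-revs ]; then
    # --local writes to this repository's .git/config, not the global config.
    git config --local blame.ignoreRevsFile .git-blame-ignore-revs
  else
    echo "no .git-blame-ignore-revs here; leaving git config untouched" >&2
  fi
}
```

The trade-off is that this has to be run once per clone, whereas the global setting applies everywhere but fails in repositories that lack the file.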

Another limitation to be aware of is that, at the time of writing, platforms like GitHub and GitLab do not yet support files with commits to ignore for the 'blame' button in their user interface. It would be awesome if they added such a feature soon.

One last thing: be aware that you need at least version 2.23 of git to use these new features. On the git downloads page you can find out how to obtain the latest git for your platform. But even if you cannot upgrade yet for some reason, you can already start building a .git-blame-ignore-revs file with commits you would like to hide from git blame. That way you can hit the ground running when it's time to upgrade.
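If you are unsure which version you are running, a quick check along the following lines can tell you. This is a sketch under assumptions: version_at_least is a hypothetical helper name, and it relies on sort -V (version sort), which is available in GNU coreutils and recent BSD/macOS versions of sort but may be missing on older systems.

```shell
# Hypothetical helper: succeed if version $1 is at least version $2.
version_at_least() {
  # sort -V orders version strings; if the required minimum sorts first
  # (or both are equal), the installed version is new enough.
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# 'git --version' prints something like "git version 2.23.0";
# the third field is the version number.
installed=$(git --version | awk '{print $3}')

if version_at_least "$installed" 2.23; then
  echo "git $installed supports --ignore-rev and --ignore-revs-file"
else
  echo "git $installed is too old; version 2.23 or newer is needed"
fi
```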

Summary

Git 2.23 contains an absolute game changer that is not even mentioned in the release highlights. Fear of polluting the git blame output no longer has to be a blocker for applying style changes in bulk: these commits can now be ignored. You can even share a list of ignored commits with your entire team. So go ahead and switch over to that new coding standard; git won't hold you back anymore.

