Git Subtree Survival Tips
Sometimes you want to share files across multiple Git repositories. In our case we wanted to share Jenkinsfiles and Gradle scripts, which were maintained in a different repository. Our initial copy-paste methodology became burdensome after the first dozen projects. This was expected, of course. So we went looking for alternatives and (after rejecting Git submodules) started to experiment with Git subtrees. In this article I’ll share some tips to enhance your experience.
Git Subtree
Note: This is not intended as a comprehensive introduction. For that I recommend this article on the Atlassian blog and the man page.
What I like most about Git subtrees is that the “end user” (i.e. the person cloning the repository) doesn’t have to know about subtrees or do anything different. In that sense it’s more “idiot proof” than submodules.
The correct way can be summarized as follows:
# add a remote
git remote add some-remote --no-tags git@host:user/repo.git
# add squashed subtree
git subtree add some-remote some-tag-or-branch --prefix=shared --squash
# update subtree
git subtree pull some-remote some-tag-or-branch --prefix=shared --squash
But there are quite a few gotchas, which I’ll describe next.
Adding a subtree in an empty repository
You need to have at least one commit. I figure that’s because the subtree command creates a merge commit, but I haven’t asked. You can use this one-liner for free (in an initialized Git repository; otherwise run git init first):
touch README.md && git add README.md && git commit -m "Add README"
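If you script your repository setup, you can guard against this instead of remembering it. The sketch below is a hypothetical helper (ensure_initial_commit is my own name, not a Git command) that checks whether HEAD exists and creates an empty commit if it doesn't. I'm assuming an empty commit is enough to satisfy git subtree add; if that turns out not to hold for your Git version, fall back to the README one-liner above.

```shell
#!/bin/bash
# Hypothetical helper: make sure the current repository has at least one commit.
# Assumption: an empty commit is sufficient as a base for `git subtree add`.
ensure_initial_commit() {
    # `rev-parse --verify --quiet HEAD` exits non-zero when no commit exists yet
    if git rev-parse --verify --quiet HEAD >/dev/null; then
        echo "HEAD exists, nothing to do"
    else
        git commit --allow-empty --quiet -m "Initial commit"
        echo "created initial commit"
    fi
}
```

Call ensure_initial_commit right after git init and before adding the subtree.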
Cloning a branch instead of a tag
Let’s continue with a more devious gotcha. If you clone a branch such as master or develop, it’s very hard to know which version of the files you’re using at any given time. You can do a git subtree pull some-remote master --prefix=shared --squash once in a while, and sometimes it will say “Already up-to-date”. And sometimes it won’t.
git subtree pull some-remote master --prefix=shared --squash
git fetch some-remote master
warning: no common commits
remote: Counting objects: 727, done.
remote: Compressing objects: 100% (706/706), done.
remote: Total 727 (delta 398), reused 0 (delta 0)
Receiving objects: 100% (727/727), 149.99 KiB | 0 bytes/s, done.
Resolving deltas: 100% (398/398), done.
From host:user/repo
* branch master -> FETCH_HEAD
* [new branch] master -> some-remote/master
Added dir 'shared'
Have a look at your commit log to see what happened: a squashed commit and a merge commit have been added to your tree. The squashed commit message includes the upstream hash, but it’s not terribly informative. Having these sprinkled across your history is not pretty.
git log --graph --oneline
* b7ecd2f Merge commit '61d6a7d61de73a29e15ed08acad0a8eb3364c042' as 'shared'
|\
| * 61d6a7d Squashed 'shared/' content from commit 64908a1
* d518e01 Initial commit
One solution is to simply tag releases and use those instead of branches. For example, if you want to use version 1.2.3 you would do this, and add a nice commit message for good measure:
git subtree add some-remote 1.2.3 --prefix=shared --squash \
-m "Merge version 1.2.3 of some-remote"
That way your commit log reveals that you’re on version 1.2.3 (excluding your own local changes).
git log --graph --oneline
* b7ecd2f Merge version 1.2.3 of some-remote
|\
| * 61d6a7d Squashed 'shared/' content from commit 64908a1
* d518e01 Some other commit
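If you inherit a repository where nobody wrote a nice commit message, you can still recover which upstream commit the subtree is based on. Each squashed commit carries two extra lines in its body, git-subtree-dir: and git-subtree-split:, and the second one holds the full upstream hash. The function below is a rough sketch that reads them back; subtree_upstream_commit is a name I made up, and it assumes the subtree was added with --squash.

```shell
#!/bin/bash
# Hypothetical helper: print the upstream commit the squashed subtree is based on.
# Assumes the subtree was added or pulled with --squash, so a commit containing
# "git-subtree-dir: <prefix>" and "git-subtree-split: <sha>" exists in the log.
subtree_upstream_commit() {
    local prefix=${1:-shared}
    # Take the most recent commit whose message names this prefix,
    # then extract the hash from the git-subtree-split line.
    git log -1 --format=%B --grep="^git-subtree-dir: ${prefix}/*\$" \
        | sed -n 's/^git-subtree-split: //p'
}
```

Running subtree_upstream_commit shared prints a hash that you can look up in the output of git ls-remote --tags some-remote to see whether it matches a release tag.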
Forgetting to --squash
Squashing means that you get one commit instead of over 9000. This commit will get a new hash, of course, and it will only exist in your log. Ergo, not squashing means you get all commits in your log. What happens is that the entire commit history of the remote branch is fused to your tree. Perhaps some people like to see the commit history of the subtree, but we found it to be very confusing.
# Clone repository as subtree in /shared
git subtree add some-remote some-tag-or-branch --prefix=shared
# Check the log and place palm firmly against face
git log --oneline --graph
... endless list of commits drowning out your own work
What you should have done was simply this:
git subtree add some-remote some-tag-or-branch --prefix=shared --squash
How to fix it if you messed up
If you have no trailing commits, you can simply undo the last commit using git reset --hard HEAD~ and try adding the subtree again. If you’re several commits beyond that, however, you need to identify the SHA-1 hash of the merge commit you want to remove and run git rebase --onto SHA~ SHA (replace SHA with the correct SHA-1 hash). Finally, add the subtree again (with --squash this time).
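If the rebase --onto incantation looks like magic, it may help to try it in a throwaway repository first. The sketch below is only a demonstration: it builds a temporary repository with three ordinary commits (not a real subtree merge) and drops the middle one. The function name and commit messages are made up for the example.

```shell
#!/bin/bash
# Demonstration only: drop a commit from the middle of the history
# using `git rebase --onto SHA~ SHA` in a temporary repository.
drop_commit_demo() {
    local repo
    repo=$(mktemp -d)
    (
        cd "$repo" || exit 1
        git init --quiet
        git config user.name demo
        git config user.email demo@example.com
        for name in first unwanted third; do
            echo "$name" > "$name.txt"
            git add "$name.txt"
            git commit --quiet -m "$name"
        done
        sha=$(git rev-parse HEAD~1)   # the "unwanted" commit
        # Replay everything after $sha onto its parent, leaving $sha out
        git rebase --quiet --onto "${sha}~" "$sha" >/dev/null 2>&1
        git log --format=%s           # prints: third, then first
    )
    rm -rf "$repo"
}

drop_commit_demo
```

In the real situation SHA is the merge commit created by git subtree add, and the commits that follow it may conflict during the replay, so expect to resolve a few things by hand.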
Fetching all remotes without specifying --no-tags
So you added a remote and ran git fetch --all, only to realize that your pristine repository is now filled with tags. And since everybody uses SemVer tags, you can’t tell the difference between your 1.0.1 tag and the one from the remote. Even worse, because you didn’t notice it quickly enough you pushed all tags to your origin, and the other developers are now shouting at you.
Passing the --no-tags parameter to git remote add tells git fetch --all not to fetch this remote’s tags (unless you specify --tags). Another way to prevent this is to remove the remote as soon as you’re done with it, and add it again when you need it.
git remote remove some-remote
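If the remote already exists and you would rather keep it around, you can switch off tag fetching after the fact. As far as I know, git remote add --no-tags simply stores a tagOpt setting for the remote, so setting it by hand should have the same effect. The helper name below is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: stop `git fetch` from importing tags from an existing remote.
# Assumption: this is equivalent to having passed --no-tags to `git remote add`.
disable_remote_tags() {
    local remote=${1:?usage: disable_remote_tags <remote>}
    git config "remote.${remote}.tagOpt" --no-tags
}
```

After disable_remote_tags some-remote, tags that were already fetched stay in your repository; this only affects future fetches.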
How to fix it if you messed up
Warning: This is destructive stuff. Consider backing up your repo.
You can list tags of a specific remote using git ls-remote --tags <very-taggy-remote>:
git ls-remote --tags very-taggy-remote
44396b17a4b95d1b9ce7edea39659401bbfd4ac5 refs/tags/1.0.0
ebc085eead0294f0e17cf356e1afb8c00039bf46 refs/tags/1.0.1
...
This tells you which tags you may want to remove from your origin. Next, delete each offending tag, both locally and on your origin, using the ls-remote output as a cheat-list. Good luck!
git tag -d 1.0.0 && git push origin :refs/tags/1.0.0
git tag -d 1.0.1 && git push origin :refs/tags/1.0.1
...
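With more than a handful of tags, typing these by hand gets old quickly. The sketch below turns the ls-remote output into the same delete commands, but only prints them so you can review the list first. Both function names are my own invention. Keep in mind that a tag name shared with one of your own releases will show up too, so prune the list before running anything.

```shell
#!/bin/bash
# Hypothetical helpers for cleaning up imported tags.

# Read `git ls-remote --tags` output on stdin and print bare tag names.
# Peeled entries for annotated tags (ending in ^{}) are folded into one name.
remote_tag_names() {
    sed -n 's|^[0-9a-f]*[[:space:]]*refs/tags/||p' | sed 's|\^{}$||' | sort -u
}

# Read tag names on stdin and PRINT (not run) the delete commands.
print_tag_cleanup() {
    local origin=${1:-origin} tag
    while read -r tag; do
        printf 'git tag -d %s && git push %s :refs/tags/%s\n' "$tag" "$origin" "$tag"
    done
}

# Usage (review the output, then paste the lines you really want):
#   git ls-remote --tags very-taggy-remote | remote_tag_names | print_tag_cleanup origin
```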
Pushing your subtree changes to the remote
Say you’re on a feature branch, furiously hacking on the shared code in an attempt to get your application to build. Then for some strange reason you decide to commit and git subtree push --prefix=shared some-remote develop.
While this is possible, I think this practice should be actively discouraged. With threats of bodily harm if necessary.
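A gentler deterrent than bodily harm is to make pushing to the shared remote fail by default. One trick I've seen is to give the remote a bogus push URL while leaving the fetch URL intact. This is only a sketch: the helper name and the no_push placeholder are arbitrary, and it won't stop anyone who pushes to the repository URL directly.

```shell
#!/bin/bash
# Hypothetical helper: make pushes to a named remote fail by pointing its
# push URL at a placeholder that is not a real repository.
# Fetching is unaffected because the fetch URL stays as it was.
block_subtree_push() {
    local remote=${1:?usage: block_subtree_push <remote>}
    git remote set-url --push "$remote" no_push
}
```

After block_subtree_push some-remote, a git subtree push to some-remote should fail because no_push is not a repository, while git subtree pull keeps working.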
How to fix it if you messed up
This is usually a matter of rewriting history. If you don’t know how to do that you probably shouldn’t be doing it :-)
Conclusion
Git subtree is a wonderful alternative to Git submodules, but it’s fairly easy to mess things up. You can make life a lot easier for yourself and other committers if you add a bash script like the following to your repository, and just avoid the git subtree command altogether:
update.sh
Usage: ./update.sh [<branch or tag (defaults to 'master')>]
#!/bin/bash
REF=${1-master}             # branch or tag; defaults to 'master' if parameter 1 is not present
REMOTE=some-remote          # just a name to identify the remote
REPO=git@host:user/repo.git # replace this with your repository URL
FOLDER=shared               # where to mount the subtree
# add the remote without importing its tags
git remote add "$REMOTE" --no-tags "$REPO"
if [[ -d "$FOLDER" ]]; then # update the existing subtree
    git subtree pull "$REMOTE" "$REF" --prefix="$FOLDER" --squash -m "Merging '$REF' into '$FOLDER'"
else # add the subtree
    git subtree add "$REMOTE" "$REF" --prefix="$FOLDER" --squash -m "Merging '$REF' into '$FOLDER'"
fi
# remove the remote again so a later fetch cannot pull in its tags
git remote remove "$REMOTE"