The problem
We use the Trac TimingAndEstimationPlugin to record a lot of our time, and git has lately been our preferred source control system. As we’ve been using more topic branches in git, this poses a bit of a headache: patches can show up multiple times.
We’ve traditionally had the post-receive hook select only patches that update the master branch. This mostly works, but sometimes ticket comments and hours don’t appear for days or weeks, until the topic branch gets merged back. What I really want is for a patch to be posted onwards only when it is new to this repo. That way we can develop in a topic branch, push that topic up to the central repo so it is backed up and the hours and comments are recorded for the project manager to see, then merge it into the mainline at some later point and have all of this work as expected.
The only thing this doesn’t support is rebase… but we all know that’s dangerous already, right? [1]
git rev-list to the rescue
During post-receive we want to find commits that are now reachable from one of the refs, but were not previously reachable.
It turns out git rev-list can give us exactly that, but crafting the call to it is a bit tricky. When you call git rev-list CommitA ^CommitB it gives you back the set difference, i.e. commits reachable from A that are not reachable from B. So we just need to ask git what’s reachable now that wasn’t reachable before this receive.
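To make the set difference concrete, here is a small Python model of it. This is only a sketch of the semantics on a made-up history, not how git implements it: the parents mapping, the commit names A to E, and the helpers reachable and rev_list_difference are all hypothetical.

```python
def reachable(parents, tip):
    """All commits reachable from tip by following parent links (tip included)."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents.get(commit, ()))
    return seen


def rev_list_difference(parents, include, exclude):
    """Model of `git rev-list include ^exclude` as a plain set difference."""
    return reachable(parents, include) - reachable(parents, exclude)


# Hypothetical history: master is at C, topic adds D and E on top of B.
#   A <- B <- C   (master)
#         \
#          D <- E (topic)
parents = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["D"]}

print(sorted(rev_list_difference(parents, "E", "C")))  # ['D', 'E']
```

Asking for E but not C leaves only D and E, the commits unique to the topic branch, which is the kind of answer the hook is after.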
Algorithm
The post-receive hook runs after the database has been updated, but it is given a log of what changed: the old value, the new value and the name of each updated ref.
- Find the set, OriginalRefs, of refs present in the database now.
- For each ref that was updated: remove it from OriginalRefs, exclude its old value and include its new value.
- Exclude all refs left in OriginalRefs.
There’s a bit of trickiness in watching for new or deleted branch names, but that’s pretty much it. We feed the resulting includes and excludes to rev-list over stdin and read the list of commits on stdout.
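The steps above could be sketched in Python roughly as follows. This is not our actual hook, just a minimal sketch under a few assumptions: the function names (is_null, rev_list_input, new_commits) are made up for illustration, the updates are assumed to be already parsed into (old, new, refname) tuples from the hook's stdin, and new or deleted refs are detected by the all-zero id git reports for them.

```python
import subprocess


def is_null(sha):
    """True for the all-zero id git reports when a ref is created or deleted."""
    return set(sha) == {"0"}


def rev_list_input(current_refs, updates):
    """Build the revision lines for `git rev-list --stdin`.

    current_refs: dict of ref name -> object id as the repo stands after the push.
    updates: (old, new, refname) tuples parsed from the hook's stdin.
    """
    untouched = dict(current_refs)          # plays the role of OriginalRefs
    include, exclude = [], []
    for old, new, ref in updates:
        untouched.pop(ref, None)            # this ref changed, so drop it
        if not is_null(new):                # deleted ref: nothing new to include
            include.append(new)
        if not is_null(old):                # new ref: no old value to exclude
            exclude.append("^" + old)
    # Everything still pointed at by an untouched ref was already reachable.
    exclude.extend("^" + sha for sha in untouched.values())
    return include + exclude


def new_commits(updates, repo=None):
    """Return ids of commits that this push made reachable for the first time.

    repo=None runs git in the current directory, as a hook normally would.
    """
    listing = subprocess.run(
        ["git", "for-each-ref", "--format=%(objectname) %(refname)"],
        cwd=repo, capture_output=True, text=True, check=True,
    ).stdout
    current = {}
    for line in listing.splitlines():
        sha, name = line.split(" ", 1)
        current[name] = sha
    revs = rev_list_input(current, updates)
    if not revs or revs[0].startswith("^"):
        return []                           # only deletions: nothing to report
    result = subprocess.run(
        ["git", "rev-list", "--stdin"],
        cwd=repo, input="\n".join(revs) + "\n",
        capture_output=True, text=True, check=True,
    )
    return result.stdout.split()
```

In a hook, updates would come from splitting each line of stdin into its three fields. Keeping rev_list_input free of any git calls makes the new-branch and deleted-branch cases easy to check on their own.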
We implemented it in Python and run it regularly. Speed is pretty good even for repositories that have several hundred named refs in them (mostly publish tags): perhaps a couple of seconds of overhead [2], but compared to the ssh push itself the process doesn’t feel any slower.
[1] The day after we deployed this, we ran into a case where someone had the rebase-on-pull flag set, which caused problems.
[2] This also includes the time to load a Trac environment and make a database call in there for every new commit we’ve found; this could be improved, but it is good enough. I suspect the time spent in git rev-list itself is even less.