This blog has moved to Medium


Posts tagged ‘Source Control’

When is git rebase better than git merge?

In Ken’s “git for gits” presentation today, I asked when “git rebase” is better than “git merge”.
I want to summarize what I think the best answer is in this post (hint: it’s not “the version tree looks simpler after a rebase”).
If I recall correctly, most of the content of this post appears in some form in the Git Book, but there’s no harm in retelling it in my own words.

Suppose you’re working on a private feature branch, and master is receiving some commits as well.
If you merge master into your feature branch, you have to resolve all the conflicts at once. If there are a lot of them, this can be painful.

When you rebase, you are effectively reapplying your work, commit by commit, on top of the work done on master. At any point in this process, if there is a conflict, you can resolve only that conflict, compile and test your work, and once you’re satisfied proceed to the next conflict. This can be indispensable when doing large merges (I’ve had merges that took a few hours to resolve and test … I still remember telling everyone “please don’t commit to trunk for the next few hours, I have a huge merge”).

The downside is of course that you’re altering history, and if someone else cloned your repository this might mess up their work – so only do this on private branches.
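To make the commit-by-commit idea concrete, here is a minimal shell sketch. It assumes a throwaway repository, a placeholder author identity and a made-up file.txt; it is an illustration, not the procedure from Ken’s talk or the Git Book. Master and a private feature branch both change the same line, and the loop resolves the conflict in each replayed commit separately, which is exactly where you would compile and test before running git rebase --continue.

```shell
#!/bin/sh
# Sketch: rebase a private feature branch onto master, one conflict at a time.
# Everything happens in a throwaway temp repo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_EDITOR=true                    # keep existing commit messages, never open an editor

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
echo base > file.txt
git add file.txt && git commit -q -m "base"

# Private feature branch with two commits
git checkout -q -b feature
echo feature-1 > file.txt && git commit -q -am "feature 1"
echo feature-2 >> file.txt && git commit -q -am "feature 2"

# Meanwhile, master edits the same line
git checkout -q master
echo master > file.txt && git commit -q -am "master change"

# Replay feature's commits on top of master; git stops at each conflicting commit.
git checkout -q feature
git rebase master >/dev/null 2>&1 || true
while [ -d .git/rebase-merge ] || [ -d .git/rebase-apply ]; do
  # Resolve only the conflict in the commit being replayed. Here we simply keep
  # the feature side; in real life you would edit the file, compile and test.
  git checkout --theirs -- file.txt
  git add file.txt
  git rebase --continue >/dev/null 2>&1 || true
done

git log --oneline --graph   # linear: master's commit, then the two replayed feature commits
```

Note that during a rebase “theirs” refers to the commit being replayed (your feature work), the reverse of what it means in a plain merge.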

A summary of my history with source control

  • Folder-based, copy-paste VCS. Where was that stable copy again? Ah damn, we deleted it.
  • SourceSafe. Will eat up your files™
  • ClearCase. How do I merge that branch? The ClearCase admin will be here in two weeks, he’ll help you. Hopefully.
  • TFS. Only works within Visual Studio? That’s ok, we’ll just use A SEPARATE VCS to store our java files.
  • SVN. Ah, you mean I don’t need to mark a file I’m editing, only when I actually commit it? Nice. But merges are still hell, and svn update takes forever.
  • git. You mean I can switch to another branch in about 20 milliseconds, and merge it back in a second? Sweet.

The End?

My walk through the Git book

I’ve been experimenting with git for about the last year, but most of the work I did with it so far was in the “single developer, hack some stuff, push to github” mode of operation, which is very superficial. Now that I’ll be working with it full time (git is one of the “semi wildly adopted” SCMs at Google), I thought it was time to take a closer look at the wisdom accumulated by other folks, so I finally cracked open the Git book and did a pass over it.

The book is great and usually reads very fluidly. It begins by showcasing the simple use cases you’ll encounter with git, and is filled with short code snippets you can try (even on a train with no WiFi; this is a distributed source control system after all). Some of the examples weren’t crystal clear out of the box and assumed some prior knowledge (much of the book was pulled together from different sources, so I imagine it was relatively easy to accidentally assume knowledge that readers don’t necessarily have at that point).

Here is a summary of questions I had while reading the book, followed by some cool stuff I found at the end. I recommend at least some knowledge of git for the rest of this article, best accompanied by a reading of the Git book itself. As usual, if you find a mistake, please let me know. For related reading, I also recommend the “Git for beginners” question on Stack Overflow.

What happens on double git add?

git add is used not just to add new files, but also to ‘add’ changes in existing files.

When I do:

echo v1 > foo
git add foo
echo v2 > foo
git add foo
git commit -m bar

Are both versions of foo added to the commit log, or just the latest?

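The answer is that only the latest version is committed. git add copies the file’s current contents into the index (the staging area), so the second git add simply replaces the first one there, and git commit records whatever the index holds.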
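A quick way to check this yourself is the snippet above run in a throwaway repository (the temp directory and placeholder identity are the only additions), followed by a look at what the commit actually contains:

```shell
#!/bin/sh
# Sketch: "git add" the same file twice before a single commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
echo v1 > foo
git add foo                 # index entry for foo now holds v1
echo v2 > foo
git add foo                 # index entry for foo is overwritten with v2
git commit -q -m bar

git show HEAD:foo           # prints: v2
git rev-list --count HEAD   # prints: 1 (one commit; v1 never made it into history)
```

The first git add does write a v1 blob into the object database, but nothing ends up referencing it, so it is just an unreachable object that garbage collection will eventually remove.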

After I git merge without conflicts, is a git commit needed?

Coming from svn, I expected that after merging changes into my local branch I would have to commit them. A quick experiment showed that in git this is not the case at all: if the merge succeeds without manual intervention (including concurrent edits to different places in the same file), git merge creates the merge commit itself, so no separate commit is needed. If there are conflicts that you resolve manually (by fixing the files and then running git add on them), you have to run git commit yourself to complete the merge.
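Both cases can be reproduced in a few lines. This is a minimal sketch assuming a throwaway repository with made-up branch names (topic, topic2) and files; --no-edit only stops git from opening an editor for the merge message.

```shell
#!/bin/sh
# Sketch: a clean merge commits itself; a conflicted merge needs "git commit".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
echo a > a.txt; echo b > b.txt
git add . && git commit -q -m base

# Case 1: the branches touch different files, so the merge is automatic.
git checkout -q -b topic
echo topic > b.txt && git commit -q -am "topic edits b.txt"
git checkout -q master
echo master > a.txt && git commit -q -am "master edits a.txt"
git merge -q --no-edit topic          # the merge commit is created right here
clean_merge=$(git rev-parse HEAD)

# Case 2: both sides edit the same line, so git stops and asks for help.
git checkout -q -b topic2
echo topic2 > a.txt && git commit -q -am "topic2 edits a.txt"
git checkout -q master
echo "master again" > a.txt && git commit -q -am "master edits a.txt again"
if git merge -q --no-edit topic2; then
  echo "expected a conflict" >&2; exit 1
fi
echo resolved > a.txt                 # fix the conflict by hand...
git add a.txt                         # ...mark it as resolved...
git commit -q --no-edit               # ...and only now is the merge commit recorded
conflict_merge=$(git rev-parse HEAD)
```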

How does gitk work? Sometimes I see branches, sometimes I don’t … it’s very confusing

This one has been puzzling me for quite a long time. I found that I couldn’t trust gitk, the graphical tool for visualizing commits, branches and merges, because it kept giving me inconsistent results, and for the life of me I couldn’t understand why.

Now I did a few experiments and some digging, and found that by default gitk only shows you the current branch, i.e. the commits reachable from it (its ancestors in the version graph). If you create a branch, switch back to master, and run gitk, you will not see this branch. What confused me is that upon refreshing, gitk rescans the current branch and adds any new nodes to its display, while retaining anything already shown. So if you run gitk, switch to a new branch, and refresh gitk, the new branch and its relation to the previous one will now be displayed.

Of course, like all things Linux, gitk can be made to behave the way you want. Just follow the gitk command with the names of the branches you want shown, or simply add --all to see all the branches in your repository.
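gitk accepts the same revision arguments as git log and git rev-list, so the reachability rule can be seen without a GUI. A minimal sketch, assuming a throwaway repository and a made-up branch called experiment: plain git log behaves like running gitk with no arguments while on master, and git log --all like gitk --all.

```shell
#!/bin/sh
# Sketch: what a history viewer shows depends on the revisions you give it.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
echo 1 > f && git add f && git commit -q -m "on master"
git checkout -q -b experiment
echo 2 > f && git commit -q -am "on experiment"
git checkout -q master                    # back on master

default_view=$(git log --format=%s)       # like plain "gitk": only commits reachable from HEAD
all_view=$(git log --all --format=%s)     # like "gitk --all": every branch

echo "$default_view"                      # prints: on master
echo "$all_view"                          # both subjects
```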

How can you see the ‘branch structure’ of a repository?

In svn, there is a well-defined directed graph between branches. When a branch is created off its parent, this parent-child relation is created and maintained, and the tools readily show you this branch graph.

I could have guessed this, but sources on Stack Overflow confirmed that there is no direct equivalent in git. Instead of branches having parent-child relations, the parent-child relation is between commits: every commit records its parent(s), a merge commit has more than one, and a branch is just a movable name pointing at a single commit. The model is more complex, but more powerful, and it seems to be the core reason why merges in git are supposed to be easier than in svn.
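In practice this means the “where did this branch come from” question is computed from the commit graph rather than looked up. A minimal sketch, assuming a throwaway repository with made-up branches: the branch is just a name for one commit, and git merge-base finds the fork point.

```shell
#!/bin/sh
# Sketch: git stores parent links between commits, not between branches.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
echo 1 > f && git add f && git commit -q -m "root"
fork=$(git rev-parse HEAD)
git checkout -q -b feature
echo 2 > f && git commit -q -am "feature work"
git checkout -q master
echo 3 > g && git add g && git commit -q -m "master work"

git rev-parse feature             # a branch is only a name for one commit
git merge-base master feature     # the fork point is derived: same hash as $fork
git log --graph --oneline --all   # draw the commit DAG that actually exists
```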

What does ‘fast forward’ really mean?

Using git, I often saw messages with the words “fast forward”, but never really understood what they meant. This bit is explained rather nicely in the Git book: a fast forward happens when you merge branch b1 into b2, resolve any conflicts, and then merge the result back into b1. b2 now points to a commit that is a descendant of the heads of both b1 and b2, meaning all the “merge work” was already done there. So when b2 is merged back into b1, there is nothing left to combine and no new commit is created: git simply moves the b1 branch (a pointer into the revision DAG) forward to the head of b2. In effect, the merge’s result becomes the head of b1 in a clean and simple manner.

This is radically different from svn. I still have horror flashbacks sometimes about trying to merge a branch back to trunk: I always first merged trunk into the branch, had to work my ass off to resolve all the conflicts and make the build green, and then sometimes had to do double the work when merging back to trunk. With git, you’re assured that the conflict resolution work you do on your branch is preserved and reused, which makes merging back to master (the git equivalent of trunk) a piece of cake.
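The b1/b2 scenario can be replayed directly. This is a minimal sketch assuming a throwaway repository; --ff-only is used in the last step only to make git refuse anything other than a fast forward, so the example proves the point.

```shell
#!/bin/sh
# Sketch: merge b1 into b2 (the real merge work), then merge b2 back into b1.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/b1       # first branch is called b1
echo base > base.txt && git add . && git commit -q -m base
git checkout -q -b b2
echo b2 > b2.txt && git add . && git commit -q -m "work on b2"
git checkout -q b1
echo b1 > b1.txt && git add . && git commit -q -m "work on b1"

# 1. Merge b1 into b2: any conflict resolution would happen here.
git checkout -q b2
git merge -q --no-edit b1

# 2. Merge b2 back into b1: b2's head already descends from b1's head, so no
#    new commit is made; the b1 pointer just moves forward ("Fast-forward").
git checkout -q b1
git merge --ff-only b2
```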

git pull, fetch, and what’s in between

It is said that “git pull” is equivalent to “git fetch”, followed by “git merge”.
The ability to fetch all the content of any remote repository immediately, without being forced to merge it right now, is great: you’re free to do the actual merge work and conflict resolution separately, and you only need connectivity to the remote repository for the fetch phase. When I tried this using two local folders, git merge complained, and I couldn’t work out what arguments I should pass to “git merge” in this case.

This turned out to be a simple technical issue. To merge the changes manually after fetching from an arbitrary remote, simply run git merge FETCH_HEAD (sometimes you just have to know the magic words). Normally you would fetch from origin (the remote repository you cloned from) or another named remote, which updates remote-tracking branches such as origin/master, so you would just pass that name to “git merge”.
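Here is the two-local-folders case as a minimal sketch; the folder names upstream and mine are made up. Fetching by path, with no remote name involved, leaves the fetched commit only in FETCH_HEAD, which is why git merge needs to be told about it explicitly.

```shell
#!/bin/sh
# Sketch: "git pull" split into "git fetch" + "git merge FETCH_HEAD".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
git init -q "$tmp/upstream"
cd "$tmp/upstream"
git symbolic-ref HEAD refs/heads/master   # don't depend on init.defaultBranch
echo one > notes.txt && git add notes.txt && git commit -q -m "one"

git clone -q "$tmp/upstream" "$tmp/mine"

# Upstream moves on...
echo two >> notes.txt && git commit -q -am "two"

# ...and we fetch it by path: this is the only step that needs the other repo.
cd "$tmp/mine"
git fetch -q "$tmp/upstream" master
git log --oneline -1 FETCH_HEAD           # the fetched commit, not merged yet
git merge -q FETCH_HEAD                   # can happen later, even offline
```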

How does pushing actually work?

Let’s say I set up a local “common” repo (it has to be bare for reasons explained in the Git book):

mkdir bare
cd bare
git init --bare
cd ..
git clone bare alice
cd alice
touch a && git add a && git commit -m "Added a"
git push # This fails

Why does the push fail?

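It turns out that the problem was that I tried to push to an empty repository. With no arguments, git push (under the “matching” default of the time) pushes only branches that already exist on both sides, and an empty repository has none. Once I run “git push origin master”, which creates master on the remote, subsequent “git push” commands with no arguments succeed.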
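The same setup with an explicit first push, as a minimal sketch in a temp directory. The -u flag is an addition to the original commands: it records origin/master as the upstream of the local master, so later plain “git push” calls have an unambiguous target regardless of which push.default your git uses.

```shell
#!/bin/sh
# Sketch: bare "common" repo, clone alice, first push names remote and branch.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare bare
git --git-dir=bare symbolic-ref HEAD refs/heads/master
git clone -q bare alice 2>/dev/null       # git warns that the repository is empty
cd alice
git symbolic-ref HEAD refs/heads/master   # name the first branch explicitly
touch a && git add a && git commit -q -m "Added a"

# First push: say where to push and what, and remember it as the upstream.
git push -q -u origin master

# From now on a plain push works.
echo more > a && git commit -q -am "More"
git push -q
```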

And now, for some cool stuff:

git bisect ftw

Suppose you just found a critical bug, and have no idea when it was introduced. You write a simple (manual/automated) test for it and reproduce it, but you’re not sure what is causing it. git bisect to the rescue!

git bisect lets you do a binary search over your repository’s history to find the exact commit that introduced the bug. While this is possible with other VCSs, it is so natural in git that it’s beautiful. You simply run “git bisect start”, then mark versions with “git bisect good” (this version works) or “git bisect bad” (it doesn’t), and git keeps checking out a version from the middle of the remaining range until you find the exact commit where things went bad.
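If your test is automated, git bisect run does the good/bad marking for you: it runs a command at each step and treats exit code 0 as good and 1 as bad. A minimal sketch over a made-up history of ten commits in which commit 6 adds the string BUG to a hypothetical app.txt; the grep stands in for your real test.

```shell
#!/bin/sh
# Sketch: automated bisection with "git bisect run".
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
for i in 1 2 3 4 5 6 7 8 9 10; do
  echo "line $i" >> app.txt
  if [ "$i" -eq 6 ]; then echo BUG >> app.txt; fi   # the "regression"
  git add app.txt && git commit -q -m "commit $i"
done
good=$(git rev-list --max-parents=0 HEAD)           # the first commit, known good

git bisect start HEAD "$good" >/dev/null            # bad = HEAD, good = first commit
# Test command: exit 0 (good) if BUG is absent, 1 (bad) if present.
git bisect run sh -c '! grep -q BUG app.txt' >/dev/null
first_bad=$(git log -1 --format=%s refs/bisect/bad)  # bisect leaves the culprit here
git bisect reset >/dev/null 2>&1                     # return to the original branch

echo "$first_bad"                                    # prints: commit 6
```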

Configure your defaults for fun and profit

Here are some tweaks I found in the book that you might want to do (if you have any other tweaks you’d like to recommend, please comment!)

oneline log messages

If, like me, you find the “one liner” log messages easier to read, you can make it the default with

git config --global format.pretty oneline

Life is colorful

Make git status and other messages much easier to read with

git config --global color.ui true
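To try both tweaks without touching your real ~/.gitconfig, you can set them per repository (dropping --global) in a scratch repo and look at the effect. A minimal sketch, assuming a throwaway repository:

```shell
#!/bin/sh
# Sketch: the two tweaks applied locally to a scratch repo.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com       # placeholder identity
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git config format.pretty oneline   # same key as above, repository scope only
git config color.ui true           # color when writing to a terminal

echo hi > f && git add f && git commit -q -m first
echo again >> f && git commit -q -am second

git log                            # one line per commit: "<hash> <subject>"
git config --get color.ui          # prints: true
```

Since color.ui true only colors output going to a terminal, piping git log into other tools still gives plain text.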

git-svn made easy

For the last year, my main usage of git was for my own personal projects: rather basic stuff, consisting of simple commit/push/pull operations. Recently, I wanted to edit some code on the OSQA project, which is unfortunately hosted on SVN. I am not a committer (yet), so if I wanted my work to be under source control, I had no clear option except using git-svn.

It took me some time to get started, and I found that there are still some gotchas that can surprise you if you’re new to git or git-svn. Luckily, I stumbled across this lovely series of screencasts. Thomas walks you through the basics and showcases some more advanced use cases as well. I highly recommend it! (I subscribed to his blog as well.)

Dealing with Version Branches

At Delver, like many other places, we use version branches to maintain releases. We have one trunk where everything gets integrated, and when we want to stabilize a release version, we create a version branch and do a ‘code freeze’ on this branch. On the version branch, only bug fixes are committed, and no new features are developed.

This process helps us stabilize versions within a matter of days and proceed quickly from trunk to QA to Production.

A problem we experienced with this process was that it was hard to make sure all bugfixes were properly merged to trunk. The common practice is to merge all changes from the version branch to trunk at the end of the release cycle (when the version is “frozen”).

There are at least two problems with this approach:

  • First, it is usually one person who is left with the ugly task of merging all the bugfixes that weren’t previously merged, usually in files he didn’t touch and knows nothing about.
  • Second, if there are bugfixes on the version branch after the version is frozen, what guarantees they will reach trunk? The sad answer is “nothing” (we had several regression bugs because people forgot to merge their bugfixes).

Here is our NewAndImprovedProcess™ (as implemented by our very own Sergey Goncharov):

We set up a TeamCity build that monitors all version branches. It runs nightly and examines the svn:mergeinfo property on all modified files. If it detects files that were committed to the version branch but weren’t merged to trunk, it fails and sends an email to the responsible developer. A little convenience feature we sprinkled in: if you have a changeset that you really don’t want to merge to trunk, you can write “[NO MERGE]” in the commit message and the build will ignore this commit (you can also do a record-only merge, which is the proper SVN way of doing it, although adding the commit note is faster in a quick-and-dirty way).
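The heart of the nightly check is one rule: every revision on the version branch must either be merged to trunk or carry the [NO MERGE] marker. Below is a hypothetical shell sketch of just that rule; check_unmerged and its “REVISION|MESSAGE” input format are made up for illustration, and this is not Sergey’s actual build. It assumes the list of not-yet-merged revisions is produced elsewhere, for example with svn mergeinfo --show-revs eligible BRANCH_URL TRUNK_URL plus each revision’s log message; that wiring is not shown or tested here.

```shell
#!/bin/sh
# Hypothetical sketch of the "[NO MERGE]" rule only; not the real TeamCity build.
# Input (assumed format): one "REVISION|COMMIT MESSAGE" line per revision that
# exists on the version branch but is not yet merged to trunk.
# Prints each offending revision and returns 1 if any remain, so a CI step
# running it would fail.
check_unmerged() {
  found=0
  while IFS='|' read -r rev msg; do
    case "$msg" in
      *'[NO MERGE]'*) continue ;;   # author explicitly opted out
    esac
    printf 'not merged to trunk: %s\n' "$rev"
    found=1
  done
  return "$found"
}

# Example with made-up revisions: only r101 is reported.
printf 'r101|Fix NPE in parser\nr102|Debug logging for QA [NO MERGE]\n' | check_unmerged \
  || echo "build would fail"
```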