This blog has moved to Medium



Posts tagged ‘git’

Three types of git reverts

A brief summary of a (brief) talk I gave in the recent JJTV Tools Night:

In git, there are (at least) three types of reverts:

1. Casual Revert

You committed a bug, made a few commits on top of that, and then discovered the bug. To fix it, you simply run

git revert <Bug's SHA1>

This creates a new commit that applies the reverse of the faulty commit's changes. If later commits touched the same lines, git stops and asks you to resolve the conflicts before it creates the revert commit.

After resolving any conflicts (git add the fixed files, then git revert --continue), you push out the new commit – problem solved.
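To see the casual revert end to end, here is a small sketch in a throwaway repository. The file app.txt, the commit messages and the demo identity are made up for illustration; --no-edit just accepts git's default revert message instead of opening an editor.

```shell
set -e
cd "$(mktemp -d)"                                   # throwaway sandbox directory
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master             # call the first branch master, as in the post

printf 'a\nb\nc\n' > app.txt
git add app.txt && git commit -q -m "Initial version"
printf 'a\nBUG\nc\n' > app.txt
git commit -q -am "Introduce bug"
bug_sha=$(git rev-parse HEAD)
echo "unrelated" > other.txt
git add other.txt && git commit -q -m "Unrelated work on top"

# New commit that applies the inverse of the buggy commit; existing history is untouched.
git revert --no-edit "$bug_sha"
git log --oneline
```

Because the revert is just another commit, it can be pushed like any other change without disturbing anyone who already pulled the bug.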

2. “Secret Info” Revert

http://stackoverflow.com/questions/1270514/undoing-a-git-push

You accidentally committed and pushed something to the shared repository that you never want anyone to see. This might be you cursing the boss or, more likely, passwords to production servers that should never be in source control (because it’s viewable by more people than you’re comfortable with).

The fix this time is to rewrite history. In the simple case where your accidental commit is the last one, you simply run

git reset --hard <SHA1 before the faulty commit>

Your local master is now behind origin’s: if you git pull, you’ll get the faulty commit right back from origin. Instead, run

git push -f origin master:master

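This forces origin’s master to be overwritten with your local master. It does not clean up other people’s clones, though: anyone who already pulled the faulty commit still has it, and a plain git pull would merge it right back in. They should git fetch and then git reset --hard origin/master (after saving any local work). And since the secret was exposed for a while, treat it as compromised and change it.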
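Here is the last-commit case as a sketch, with a local bare repository standing in for origin. The names origin.git, work and secrets.txt, the fake password and the demo identity are all hypothetical.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q --bare origin.git                       # stand-in for the shared repository
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master
git clone -q origin.git work && cd work             # git may warn that the repository is empty
git symbolic-ref HEAD refs/heads/master

echo "app code" > app.txt
git add app.txt && git commit -q -m "Good commit"
good_sha=$(git rev-parse HEAD)
echo "db_password=hunter2" > secrets.txt            # hypothetical secret
git add secrets.txt && git commit -q -m "Oops: committed secrets"
git push -q origin master                           # ...and pushed it

# Rewrite local history: move master back to the commit before the faulty one.
git reset -q --hard "$good_sha"
# Force origin's master to match. The dropped commit may still exist in other
# clones, and in origin.git's object store until it is garbage-collected.
git push -q -f origin master:master
```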

The case where the faulty commit is not the branch’s HEAD but rather stuck in the middle can be solved using git rebase, and is left as an exercise to the reader.
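For the curious, one way to do that exercise, sketched under the assumption that nobody else has built on the faulty commit yet: git rebase --onto replays only the commits that came after the faulty one onto its parent, which drops it from the branch. The file names and the fake secret are hypothetical; afterwards you would force-push as in the simple case.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

echo one > one.txt && git add one.txt && git commit -q -m "One"
echo "db_password=hunter2" > secrets.txt            # hypothetical secret
git add secrets.txt && git commit -q -m "Oops: committed secrets"
bad_sha=$(git rev-parse HEAD)
echo three > three.txt && git add three.txt && git commit -q -m "Three"

# Replay the commits in bad_sha..master onto bad_sha's parent, skipping bad_sha itself.
git rebase -q --onto "$bad_sha^" "$bad_sha" master
git log --oneline
```

An interactive rebase (git rebase -i) that deletes the faulty commit's line achieves the same thing when you prefer an editor.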

3. Merge Revert

When you try to git revert a merge commit, you’ll get an error: a merge commit has two (or more) parents, so git has no single changeset/diff to reverse.

When reverting a merge, you have to use the -m switch to choose which parent is the “mainline” to keep; the changes brought in from the other parent(s) are the ones that get reverted.

Note that sometimes the easiest way to undo a faulty merge, especially if it hasn’t been pushed, is to simply git reset --hard to the correct point and reattempt the merge. When you use git revert on a merge commit, the merge is still part of history: any future git merge of the same branch will only bring in changes made after the original merge, and will not reattempt it (to bring the reverted changes back, you have to revert the revert).
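A sketch of reverting a merge with -m, assuming a hypothetical feature branch merged into master. Parent 1 of a merge commit is the branch you were on when you merged (master here), so -m 1 keeps master’s side and undoes what feature brought in. The last command illustrates the caveat: git considers feature already merged.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

echo base > base.txt && git add base.txt && git commit -q -m "Base"
git checkout -q -b feature
echo feature > feature.txt && git add feature.txt && git commit -q -m "Feature work"
git checkout -q master
echo main > main.txt && git add main.txt && git commit -q -m "Master work"
git merge -q --no-edit feature                      # creates a merge commit with two parents
merge_sha=$(git rev-parse HEAD)

# A plain "git revert $merge_sha" fails here: git needs to know the mainline.
git revert --no-edit -m 1 "$merge_sha"              # undo feature's changes, keep master's

# The merge is still in history, so merging feature again brings nothing back.
git merge --no-edit feature
```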

How to do a production hotfix

Situation
It’s Thursday/Friday evening. The daily version (the master branch) was deemed too risky to install, so you decided to wait until Sunday/Monday to deploy it to production.

Then a new critical bug is found in production.
We do not want to install the fix on top of all the other changes, because of the risk factor.

What do we do?
Develop the fix on top of the version running in production, on our local machine, git push it, and deploy only the fix, without all the other changes.

How can I do this?

My example uses a Play Framework service, but that’s immaterial.

  1. gitk --all – review the situation
  2. Suppose the latest version deployed in prod is 1.2.3, and master has some commits after that.
  3. Check out this version:

    git checkout 1.2.3

  4. Create a new branch for this hotfix.

    git checkout -b 1.2.3_hotfix1

  5. Fix the bug locally, and commit.
  6. Test it locally.
  7. git push -u origin 1.2.3_hotfix1 (a plain git push won’t push a brand-new branch that has no upstream yet)
  8. On the production machine:
    1. git fetch (not pull!)
    2. sudo service play stop
    3. git checkout 1.2.3_hotfix1
    4. sudo service play start
  9. Test on production
  10. Merge the fix back to master:
    1. git checkout master
    2. git merge 1.2.3_hotfix1
    3. git push
  11. Clean up the local branch:

    git branch -d 1.2.3_hotfix1

    (Note: the branch will still be saved on origin, you’re not losing any information by deleting it locally)
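The whole flow can be rehearsed locally, as a sketch: a bare repository stands in for origin, and two clones play the developer machine (dev) and the production machine (prod). The Play service restarts are left as comments, and the file names, commit messages and demo identity are invented for the demo.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q --bare origin.git
git --git-dir=origin.git symbolic-ref HEAD refs/heads/master

# Developer: release 1.2.3, then master moves on with riskier work.
git clone -q origin.git dev && cd dev
git symbolic-ref HEAD refs/heads/master
echo "v1.2.3" > app.txt && git add app.txt && git commit -q -m "Release 1.2.3"
git tag 1.2.3
echo "risky" > feature.txt && git add feature.txt && git commit -q -m "Risky feature"
git push -q origin master 1.2.3
cd ..

# Production runs the 1.2.3 tag.
git clone -q origin.git prod
(cd prod && git checkout -q 1.2.3)

# Steps 3-7: branch off the deployed version, fix, push.
cd dev
git checkout -q 1.2.3
git checkout -q -b 1.2.3_hotfix1
echo "v1.2.3 + fix" > app.txt && git commit -q -am "Fix critical bug"
git push -q -u origin 1.2.3_hotfix1

# Step 8, on production: fetch (not pull) and switch to the hotfix branch.
cd ../prod
git fetch -q
# sudo service play stop
git checkout -q 1.2.3_hotfix1      # creates a local branch tracking origin/1.2.3_hotfix1
# sudo service play start

# Steps 10-11: merge the fix back to master and clean up the local branch.
cd ../dev
git checkout -q master
git merge -q --no-edit 1.2.3_hotfix1
git push -q origin master
git branch -d 1.2.3_hotfix1
```

Production ends up with the fix but without feature.txt, while master gets both.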

When is git rebase better than git merge?

In Ken’s “git for gits” presentation today, I asked when “git rebase” is better than “git merge”.
I want to summarize what I think the best answer is in this post (hint: it’s not “the version tree looks simpler after a rebase”).
If I recall correctly, most of the content of this post appears in some form in the Git Book, but there’s no harm in retelling it in my own words.

Suppose you’re working on a private feature branch, and master is receiving some commits as well.
If you try to merge master into your feature branch, you have to resolve all the conflicts at once. If there are a lot of them, this can be painful.

When you rebase, you are effectively reapplying your work, commit by commit, on top of the work done on master. Whenever a commit conflicts, git stops, and you can resolve just that conflict, compile and test your work, and once you’re satisfied, continue to the next one. This can be indispensable for large merges (I’ve had merges that took a few hours to resolve and test … I still remember telling everyone “please don’t commit to trunk for the next few hours, I have a huge merge”).

The downside is of course that you’re altering history, and if someone else cloned your repository this might mess up their work – so only do this on private branches.
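A sketch of the commit-by-commit behaviour, assuming a hypothetical private feature branch whose second commit conflicts with master. The rebase stops only at that commit; the resolution here is scripted, but in real life that is where you would fix, compile and test. GIT_EDITOR=true just accepts the existing commit message.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo base > file.txt && git add file.txt && git commit -q -m "Base"

git checkout -q -b feature
echo one > one.txt && git add one.txt && git commit -q -m "Feature 1: new file"
echo "feature edit" > file.txt && git commit -q -am "Feature 2: edit file.txt"

git checkout -q master
echo "master edit" > file.txt && git commit -q -am "Master: edit file.txt"

git checkout -q feature
if ! git rebase master; then
  # Feature 1 was replayed cleanly; git stopped at Feature 2, the only conflict.
  echo "feature edit, on top of master edit" > file.txt
  git add file.txt
  GIT_EDITOR=true git rebase --continue
fi
git log --oneline master..feature     # both feature commits, now on top of master
```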

My walk through the Git book

I’ve been experimenting with git for about the last year, but most of my work with it so far was in the “single developer, hack some stuff, push to github” mode of operation, which is very superficial. Now that I’ll be working with it full time (git is one of the “semi-widely adopted” SCMs at Google), I thought it was time to take a closer look at the wisdom accumulated by other folks, so I finally cracked open the Git book and did a pass over it.

The book is great and usually reads very fluidly. It begins by showcasing the simple use cases you’ll encounter with git, and is filled with short code snippets you can try (even on a train with no WiFi – this is a distributed source control system after all). Some of the examples weren’t crystal clear straight out of the box, and assumed some prior knowledge (much of the book was pulled together from different sources, so I imagine it was easy to accidentally assume something its readers don’t necessarily know at that point).

Here is a summary of questions I had while reading the book, followed by some cool stuff at the end. I recommend at least some knowledge of git for the rest of this article, ideally accompanied by a reading of the Git book itself. As usual, if you find a mistake, please let me know. Another recommended read is the “Git for beginners” question on Stack Overflow.

What happens on double git add?

git add is used not just to add new files, but also to ‘add’ changes in existing files.

When I do:

echo v1 > foo
git add foo
echo v2 > foo
git add foo
git commit -m bar

Are both versions of foo added to the commit log, or just the latest?

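Just the latest version is committed. git add stages a snapshot of the file’s content at the moment you run it, and the second git add replaces the first snapshot in the index (the staging area).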
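The experiment, as a quick sketch you can paste into a shell. It runs in a temporary directory, and the identity variables are a made-up demo identity so that git commit works even without user.name configured.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo

echo v1 > foo
git add foo                 # stages a snapshot of foo containing v1
echo v2 > foo
git add foo                 # replaces the staged snapshot with v2
git commit -q -m bar
git show HEAD:foo           # prints v2
git rev-list --count HEAD   # prints 1: a single commit
```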

After I git merge without conflicts, is a git commit needed?

Coming from svn, I expected that after merging changes into my local branch I would have to commit them. A quick experiment showed that in git this is not the case at all: if the merge is resolved without manual intervention (including concurrent edits to different places of the same file), git creates the merge commit for you. If there are conflicts that you resolve manually (by fixing the files and then git adding them), then a git commit is required to conclude the merge.
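A sketch of that experiment, assuming a hypothetical topic branch that edits the first line of a file while master edits the last one. --no-edit only stops git from opening an editor for the merge message.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

printf 'line1\nline2\nline3\nline4\nline5\n' > f.txt
git add f.txt && git commit -q -m "Base"
git checkout -q -b topic
printf 'LINE1\nline2\nline3\nline4\nline5\n' > f.txt && git commit -q -am "Edit top of file"
git checkout -q master
printf 'line1\nline2\nline3\nline4\nLINE5\n' > f.txt && git commit -q -am "Edit bottom of file"

# Different places in the same file: git resolves this and commits the merge itself.
git merge -q --no-edit topic
cat f.txt                   # contains both LINE1 and LINE5
git status --short          # prints nothing: no commit is pending
```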

How does gitk work? Sometimes I see branches, sometimes I don’t … it’s very confusing

This one has been puzzling me for quite a long time. I found that I couldn’t trust gitk, the graphical tool for visualizing commits, branches and merges, because it kept giving me inconsistent results, and for the life of me I couldn’t understand why.

After a few experiments and some digging, I found that by default gitk only shows you the current branch (HEAD) and the commits it descends from, i.e. its ancestors in the version graph. If you create a branch, commit to it, switch back to master, and run gitk, you will not see this branch. What confused me is that upon refreshing, gitk rescans the current branch and adds any new nodes to its display, while retaining anything already shown – meaning if you run gitk, switch to a new branch, and refresh gitk, the new branch and its relation to the previous one will now be displayed.

Of course, like all things Linux, gitk can be made to behave the way you want. Just follow the gitk command with the names of the branches you want shown, or simply add --all to see all the branches in your repository.
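gitk is graphical, so as a stand-in this sketch uses git log, which takes the same kind of revision arguments and also defaults to HEAD when given none. The experiment branch name is hypothetical.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master
echo base > base.txt && git add base.txt && git commit -q -m "Base"
git checkout -q -b experiment
echo x > x.txt && git add x.txt && git commit -q -m "Experiment"
git checkout -q master

git log --oneline                       # only "Base": experiment is not reachable from HEAD
git log --oneline --all                 # every branch, like "gitk --all"
git log --oneline master experiment     # named branches, like "gitk master experiment"
```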

How can you see the ‘branch structure’ of a repository?

In svn, there is a well-defined directed graph between branches. When a branch is created off its parent, this parent-child relation is recorded and maintained, and the tools readily show you this branch graph.

I could have guessed this, but sources on Stack Overflow confirmed that there is no direct equivalent in git. Instead of branches having parent-child relations, there is a parent-child relation between commits: every commit records its parent commit(s), a merge commit has two or more, and a branch is just a movable pointer to one commit. The model is more complex, but more powerful, and it seems to be one of the core reasons why merges in git are supposed to be easier than in svn.
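A small sketch of that model, using hypothetical master and feature branches: a branch name resolves to a single commit, the commit object lists parent commits (not a parent branch), and the closest thing to asking where a branch came from is git merge-base, which finds the best common ancestor commit.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

echo base > base.txt && git add base.txt && git commit -q -m "Base"
base_sha=$(git rev-parse HEAD)
git checkout -q -b feature
echo f > f.txt && git add f.txt && git commit -q -m "Feature"
git checkout -q master
echo m > m.txt && git add m.txt && git commit -q -m "Master"

git rev-parse feature             # a branch is just a name for one commit
git cat-file -p feature           # its "parent" line names a commit, not a branch
git merge-base master feature     # the fork point: same SHA1 as base_sha
```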

What does ‘fast forward’ really mean?

Using git, I often saw messages with the words “fast forward”, but never really understood what they meant. This bit is explained rather nicely in the Git book. Suppose you merged branch b1 into b2, resolved any conflicts, and then merge b2 back into b1. b2’s head is already a descendant of the heads of both b1 and b2, meaning all the “merge work” was already done there. So when b2 is merged back into b1, nothing needs to be merged or copied: the b1 branch (just a pointer into the revision DAG) is simply “fast forwarded” to b2’s head. In general, a fast forward happens whenever the branch you merge into is an ancestor of the branch you merge from. In effect, the merge’s result becomes the head of b1 in a clean and simple manner.
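The b1/b2 scenario as a sketch (branch names follow the text; files, messages and the demo identity are made up). --ff-only makes git refuse anything other than a fast forward, so the last merge succeeding shows that b1 only had to move its pointer.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/b1      # start on b1 instead of master
echo base > base.txt && git add base.txt && git commit -q -m "Base"
git checkout -q -b b2
echo two > b2.txt && git add b2.txt && git commit -q -m "Work on b2"
git checkout -q b1
echo one > b1.txt && git add b1.txt && git commit -q -m "Work on b1"

git checkout -q b2
git merge -q --no-edit b1      # the real merge work happens on b2
git checkout -q b1
git merge --ff-only b2         # b1's pointer just moves to b2's head
```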

This is radically different from svn – I still sometimes have horror flashbacks about trying to merge a branch back to trunk. I always first merged trunk into the branch, had to work my ass off to resolve all the conflicts and make the build green, and then sometimes had to do double the work when merging back to trunk. With git, the conflict resolution work you do on your branch is preserved, which makes merging back to master (the git equivalent of trunk) a piece of cake.

git pull, fetch, and what’s in between

It is said that “git pull” is equivalent to “git fetch”, followed by “git merge”.
The ability to immediately fetch all the content of any remote repository without being forced to merge it right away is great: you’re free to do the actual merge work and conflict resolution separately, and you only need connectivity to the remote repository for the fetch phase. But when I tried this using two local folders, git merge complained, and I couldn’t figure out what arguments to pass to git merge in this case.

This turned out to be a simple technical issue. To merge the changes after fetching from an arbitrary repository, simply run git merge FETCH_HEAD (sometimes you just have to know the magic words). Normally, you would fetch from a named remote such as origin (the repository you cloned from), which keeps remote-tracking branches, so you would just pass the branch’s name to git merge, e.g. git merge origin/master.
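Here is the two-folders case as a sketch, with hypothetical folders upstream and mine. Whatever git fetch retrieves is recorded in FETCH_HEAD, which git merge accepts like any other commit name.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q upstream && cd upstream
git symbolic-ref HEAD refs/heads/master
echo one > a.txt && git add a.txt && git commit -q -m "One"
cd ..
git clone -q upstream mine

cd upstream
echo two > b.txt && git add b.txt && git commit -q -m "Two"    # new work upstream

cd ../mine
git fetch -q ../upstream master     # fetch only; the result is recorded in FETCH_HEAD
git log --oneline -1 FETCH_HEAD     # inspect before merging
git merge -q FETCH_HEAD             # the "magic words"; a fast forward here
```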

How does pushing actually work?

Let’s say I set up a local “common” repo (it has to be bare, because by default git refuses to push into the currently checked-out branch of a non-bare repository; the Git book explains why)

mkdir bare
cd bare
git init --bare
cd ..
git clone bare alice
cd alice
touch a && git add a && git commit -m "Added a"
git push # This fails


Why does the push fail?

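It turns out that the problem was pushing to an empty repository. With the push.default=matching setting that was git’s default at the time, a plain git push only pushes branches that already exist on both sides, and an empty repository has none. If I do git push origin master once, the branch is created on the remote, and subsequent git push calls with no arguments succeed.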
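A sketch of the fix, reusing the post’s bare/alice layout (the demo identity is made up). The -u flag additionally records origin/master as the upstream, so the argument-less git push also works under push.default=simple, git’s default since 2.0.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q --bare bare
git --git-dir=bare symbolic-ref HEAD refs/heads/master
git clone -q bare alice              # git may warn that you cloned an empty repository
cd alice
git symbolic-ref HEAD refs/heads/master
touch a && git add a && git commit -q -m "Added a"
git push -q -u origin master         # name remote and branch: creates master in bare
touch b && git add b && git commit -q -m "Added b"
git push -q                          # from now on, no arguments needed
```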

And now, for some cool stuff:

git bisect ftw

Suppose you just found a critical bug, and have no idea when it was introduced. You write a simple (manual/automated) test for it, and reproduce it, but you’re not sure what is causing it. git bisect to the rescue!

git bisect allows you to do a binary search on your repository’s history to find the exact commit that introduced the bug. While this is possible with other VCSs, it is so natural in git that it’s beautiful. You run “git bisect start”, mark the current (broken) version with “git bisect bad” and a version you know works with “git bisect good <commit>”. git then checks out a commit halfway between them; you test it and mark it good or bad, and git keeps halving the range until it finds the exact commit where things turned bad. If your test is automated, “git bisect run <command>” does the marking for you.
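An automated sketch with git bisect run, assuming a made-up history of eight commits where commit 5 introduces the string BUG into app.txt. The test command must exit 0 for a good commit and 1 (or another code from 1 to 127, except 125) for a bad one; after the run, refs/bisect/bad points at the first bad commit.

```shell
set -e
cd "$(mktemp -d)"
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com
git init -q repo && cd repo
git symbolic-ref HEAD refs/heads/master

for i in 1 2 3 4 5 6 7 8; do
  echo "version $i" > app.txt
  if [ "$i" -ge 5 ]; then echo "BUG" >> app.txt; fi    # commit 5 introduces the bug
  git add app.txt
  git commit -q -m "commit $i"
done

git bisect start HEAD HEAD~7                     # bad revision first, then a known good one
git bisect run sh -c '! grep -q BUG app.txt'     # good (exit 0) when BUG is absent
first_bad=$(git log -1 --format=%s refs/bisect/bad)
git bisect reset                                 # return to where you started
echo "first bad commit: $first_bad"              # prints: first bad commit: commit 5
```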

Configure your defaults for fun and profit

Here are some tweaks I found in the book that you might want to do (if you have any other tweaks you’d like to recommend, please comment!)

oneline log messages

If, like me, you find the “one liner” log messages easier to read, you can make it the default with

git config --global format.pretty oneline

Life is colorful

Make git status and other messages much easier to read with

git config --global color.ui true
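To try both settings without touching your real ~/.gitconfig, here is a sketch that points HOME at a temporary directory (an assumption of the demo, not something you need day to day) and shows what git log prints once format.pretty is oneline.

```shell
set -e
export HOME="$(mktemp -d)"       # throwaway home: --global now writes to $HOME/.gitconfig
unset XDG_CONFIG_HOME
export GIT_CONFIG_NOSYSTEM=1     # ignore the system-wide config for this demo
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com      # throwaway identity
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

git config --global format.pretty oneline
git config --global color.ui true    # colors when writing to a terminal

cd "$HOME" && git init -q repo && cd repo
echo hi > a.txt && git add a.txt && git commit -q -m "Add a"
git --no-pager log               # one line per commit: full SHA1, then the subject
```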

git-svn made easy

For the last year, my main usage of git was for my own personal projects – rather basic stuff, consisting of simple commit/push/pull operations. Recently, I wanted to edit some code on the OSQA project, which is unfortunately hosted on SVN. I am not a committer (yet), so if I wanted my work to be in source control, I had no clear option except git-svn.

It took me some time to get started, and there are still some gotchas that can surprise you if you’re new to git or git-svn. Luckily, I stumbled across this lovely series of screencasts. Thomas walks you through the basics, and showcases some more advanced use cases as well. I highly recommend it! (I subscribed to his blog as well)