Archive for December 2008

My girlfriend can clone DNA

Cool title, huh?

Aya just started working at a lab that does genetic engineering. She’s been there barely a week, and she’s already cloned DNA. Here is a HUGE simplification of how it works (as much as my programmer’s brain managed to understand, with tons of errors I’m sure):

  1. Get a sample of DNA, this will be your template
  2. Prepare a solution with:
    1. Nuclease-free water – you want to make sure the water doesn’t contain nucleases (enzymes that chop up DNA) or any other DNA, which might be cloned instead of the template
    2. Buffer solution – helps create optimal conditions for the enzyme (more on this later)
  3. Add two specific primers, each complementary to a known segment of the template. Each primer is usually about 24 bases long.
  4. Add lots of dNTPs – these are the DNA monomers, the single building blocks of DNA
  5. Add a heat-stable enzyme such as Taq polymerase – this is the engine behind the entire reaction. It will latch on to the primers and run along the templates, adding dNTPs and building the DNA molecule.
  6. Put it all in a PCR machine
  7. Repeat about 30 iterations:
    1. Heat to 94-99 °C to make the DNA strands disconnect
    2. Cool down to 50-65 °C, so the primers can attach to the DNA strands
    3. Heat to 75-80 °C, which is the optimum temperature for Taq polymerase
    4. The Taq polymerase finds the primers and then runs along the disjoint strands, collecting nucleotides from the solution and building the complementary strands

This process theoretically clones a segment of a single DNA molecule into about 2^30 (roughly a billion) identical molecules, because each of the 30 iterations doubles the count. Because the Taq polymerase sometimes falls off while building the strands, after the cloning she measures the length of the DNA molecules using agarose gel electrophoresis. On that, another time perhaps.
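For the programmers in the audience, here is the doubling arithmetic as a tiny sketch. PcrMath and both method names are made up for illustration, and the efficiency parameter is my own assumption (a simple model where only a fraction of the molecules gets copied in each cycle), not something from the lab protocol.

```csharp
using System;

public static class PcrMath
{
    // Ideal case: every cycle doubles every molecule, so a single
    // template becomes 2^cycles copies.
    public static long IdealCopies(int cycles)
    {
        if (cycles < 0 || cycles > 62) throw new ArgumentOutOfRangeException("cycles");
        return 1L << cycles;
    }

    // Assumed refinement: only a fraction "efficiency" (0..1) of the
    // molecules is copied per cycle, giving (1 + efficiency)^cycles.
    public static double ExpectedCopies(int cycles, double efficiency)
    {
        if (cycles < 0) throw new ArgumentOutOfRangeException("cycles");
        if (efficiency < 0.0 || efficiency > 1.0) throw new ArgumentOutOfRangeException("efficiency");
        return Math.Pow(1.0 + efficiency, cycles);
    }
}
```

With 30 cycles the ideal case gives 1,073,741,824 copies, which is where the "about a billion" figure comes from. Anything less than perfect efficiency lowers that number considerably.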

What to do about nondeterministic tests?

We’ve all had these. Annoying tests that work perfectly 85% of the time, but fail mysteriously when the moon is half full and someone is using decaf. We (myself most certainly included) usually tend to ignore the failure, rerun the test and pray the problem will just go away.

This is simply not the way! Ignoring is an acceptable solution for tests that pass 99% of the time, but as soon as a test starts failing more often than that, you’d better do one of these:

1. Analyze the test, understand the source of the indeterminacy, and make it deterministic! This is not always practical because it’s usually the most time-consuming solution.
2. If the test is really fast, you can consider rerunning it automatically on failure / introducing more sleep() to eliminate fuzziness. Not a real solution, but it will get the test green more often.
3. If all else fails, just comment out the problematic part of the test or even the entire test. This is not what you really want, but it’s better than just leaving the test failing sporadically.
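To make option 2 concrete, here is a minimal sketch of an automatic rerun. FlakyTestHelper and RunWithRetries are hypothetical names I invented for this example; they are not part of any test framework, and your framework may well offer its own retry mechanism.

```csharp
using System;

public static class FlakyTestHelper
{
    // Runs testBody up to maxAttempts times and returns the number of the
    // attempt that finally passed. If the last attempt fails as well, its
    // exception is rethrown so the test still goes red.
    public static int RunWithRetries(Action testBody, int maxAttempts)
    {
        if (testBody == null) throw new ArgumentNullException("testBody");
        if (maxAttempts < 1) throw new ArgumentOutOfRangeException("maxAttempts");

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                testBody();
                return attempt;
            }
            catch (Exception ex)
            {
                // Out of attempts: let the failure surface.
                if (attempt >= maxAttempts) throw;
                Console.WriteLine("Attempt {0} failed ({1}), retrying", attempt, ex.Message);
            }
        }
    }
}
```

Logging every failed attempt matters: the retries hide the flakiness from the CI, so the log is the only place left where you can see how often the test really misbehaves.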

If you do nothing, you’ll quickly experience CI degradation – your test suite will become meaningless. People will no longer care about it, not even enough to fix tests that are easy to fix – because “The CI is broken anyway”. An all-green CI is a wonderful productivity tool; it is reachable and worth the investment.

Not completely smooth upgrade to Visual Studio 2008 + TFS

Major Reversal – I take this post back. I’m experiencing some difficulties with VS2008, especially when debugging. The IDE gets stuck sometimes, and the debugger jumps into the code of a heavy ToString() of one of our objects and when I try to resume it dies. I definitely didn’t experience this with 2005. When is SP2 due?

I know that when I’m thinking of upgrading a heavy software product, I need positive reviews from friends to assure me the risk is low.

We’ve just moved from the 2005 versions of Team Foundation Server and Visual Studio to 2008, and the move went rather well. First, Oren and Tomer upgraded TFS one evening. As far as I remember we had zero issues, and we immediately felt the impact of faster (and some say less painful) merges.

Then we had everyone install Visual Studio 2008, and Oren migrated all the solutions and projects (over all our branches) to 2008 (the big issue was that the file format of 2008 is not backward compatible).

Some of our team members have Resharper 4 which supports C# 3 syntax, while others still have Resharper 3. This was a potential danger, so we decided to disallow C# 3 features for now. I wrote a small utility to disable C# 3 syntax in Resharper 4 (run once).

It’s hard to enumerate the benefits of VS2008. What I see immediately is that it’s a lot more stable and has a slightly friendlier UI (little things like being able to open folders directly from Source Control Explorer). When we do get to C# 3 and .NET 3.5, we’ll be able to write nicer code and use some additions to the BCL (like the new HashSet<T> class to replace our existing C5.HashSet – this one actually implements System.Collections.Generic.ICollection<T>!).

To summarize – move to TFS/VS 2008 when you get the chance (if you haven’t done so already).

P.S. – A huge benefit of VS2008 is that it usually knows to compile only the projects your main assembly depends on. This is a sweet time saver.

The shorter the better

I wrote in a previous post about a few refactorings meant to eliminate code duplication. I was reminded recently this principle has a name – DRY, which stands for Don’t Repeat Yourself, and should be applied everywhere.

Eliminating duplication, while a noble task, is not the only refactoring one should practice and apply. Breaking up large pieces of code into increasingly smaller pieces is also important. It makes your code more readable to yourself and to other developers, and also makes merges much easier to accomplish – nothing is more terrible than looking at a huge merge of a huge method or class and having to guess what combination of the versions is the correct one.

Here are a couple of important tips/techniques to make your code more manageable:

Tip I – Group statements into methods

It doesn’t matter if it’s code you wrote or stumbled on. Whenever you get the chance, select a few statements in a long method, use Resharper’s Extract Method refactoring, invent some name to describe what this group of statements does, and voila – you’ve shortened the original method. It’s easier to understand and maintain, and now the new method can be called and tested on its own. This technique can turn huge 200-line monster methods into responsible, comprehensible 15-liners.

Ideally you’d want to use this refactoring on a bunch of statements that actually have a logical, cohesive meaning together. Even if that’s not the case, you’ll usually be better off with the extracted method. One notable exception – when the set of statements you’d extract leads to a method with multiple out parameters. Then you’d still have to declare these variables in the calling code, and your gain greatly diminishes.
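Here is a small before/after sketch of what Extract Method does. OrderSummary and its methods are invented for this illustration; a real candidate would be much longer, but the shape is the same.

```csharp
using System;
using System.Collections.Generic;

public static class OrderSummary
{
    // Before: validation, summing and formatting all live in one method.
    public static string DescribeBefore(List<int> prices)
    {
        if (prices == null) throw new ArgumentNullException("prices");

        int total = 0;
        foreach (int price in prices)
        {
            total += price;
        }

        return string.Format("{0} items, total {1}", prices.Count, total);
    }

    // After: the loop was selected and extracted into SumPrices.
    public static string DescribeAfter(List<int> prices)
    {
        if (prices == null) throw new ArgumentNullException("prices");

        int total = SumPrices(prices);
        return string.Format("{0} items, total {1}", prices.Count, total);
    }

    // The extracted method has a name that says what the statements do,
    // and it can now be called and tested on its own.
    public static int SumPrices(List<int> prices)
    {
        int total = 0;
        foreach (int price in prices)
        {
            total += price;
        }
        return total;
    }
}
```

The extracted method returns a single value, so the caller stays clean. If the selected statements had produced two or three results, the extraction would need out parameters, which is exactly the exception mentioned above.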

Tip II – Group methods into classes

You’ve all seen the huge classes with dozens of methods. Hell, Tip I above is all about creating even more methods! Well, once you feel a class has grown too much, you should try to spot groups of methods that have related functionality. Perhaps most methods in the group are used only by a single method, but nowhere else. In this case, if you move all these methods away to a new class and make them private members of that class, their absence will certainly not be missed in the original class – it only uses the one method anyway (be sure to keep it public, of course). Like in the previous tip, you want to try and find groups of methods that have a common logical function in order to use this refactoring. However, even if you can’t find a common function to these methods, they will still be better off in another class.

While Resharper does have a Move Method refactoring, in some cases it’s better to just cut and paste, especially when you move a bunch of methods together, with the data members they use and all.
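A sketch of where this refactoring ends up, with made-up names (NameParser, UserImporter): the helper methods that only one method ever called now live in their own class as private members, and the original class is left with the single call it needed all along.

```csharp
using System;

// The new class: one public entry point, with the helpers hidden behind it.
public class NameParser
{
    public string Parse(string raw)
    {
        if (raw == null) throw new ArgumentNullException("raw");
        return Capitalize(CollapseSpaces(raw.Trim()));
    }

    // Replaces runs of spaces with a single space.
    private static string CollapseSpaces(string text)
    {
        while (text.Contains("  "))
        {
            text = text.Replace("  ", " ");
        }
        return text;
    }

    // Upper-cases the first character only.
    private static string Capitalize(string text)
    {
        if (text.Length == 0) return text;
        return char.ToUpperInvariant(text[0]) + text.Substring(1);
    }
}

// The original class no longer carries the helpers around.
public class UserImporter
{
    private readonly NameParser parser = new NameParser();

    public string ImportName(string rawInput)
    {
        return parser.Parse(rawInput);
    }
}
```

UserImporter got shorter without losing anything it used, and NameParser can be tested through Parse without dragging the importer along.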

Tip III – Reduce nesting level

This will not shrink your methods by a lot, but it will make them more manageable. Nesting level implies context, which you have to maintain in your head when you are processing code. When is this code called? Only if 5 different if statements happen to succeed. It’s painful on the eye.

A useful refactoring for this is Resharper’s “Invert If”. It takes a convoluted piece of code such as

public void ProcessUserInput(string name)
{
   if (name != null)
   {
      int id = FindIdByName(name);
      if (id != 0)
      {
         if (TryToStoreName(name, id))
         {
            Console.WriteLine("Stored {0}, {1}", name, id);
         }
      }
   }
}

And beautifies it into

public void ProcessUserInput(string name)
{
   if (name == null) return;
 
   int id = FindIdByName(name);
   if (id == 0) return;
 
   if (TryToStoreName(name, id))
   {
      Console.WriteLine("Stored {0}, {1}", name, id);
   }
}

Much easier to follow (the difference is highly evident in methods with 5 nesting levels or more – these acutely need this refactoring).

Disable Resharper C# 3 syntax

I love Resharper, and I love C# 3.0, but sometimes they can’t play together.
At Delver we still haven’t purchased enough R# 4 licenses, so until we do, we won’t use C# 3 features such as lambdas. This makes working with R# 4 annoying, because every file you open is filled with suggestions and warnings that you just can’t act on, because they all require C# 3. Another dangerous thing is this – C# 3 has gotten better at deducing generic arguments, so R# 4 will tell you to remove the arguments when they are not needed, thus causing compilation errors for people with earlier versions.

The solution: disabling C# 3 for Resharper. This can be done per project – select the project, hit F4, and change Resharper’s “Language Level” setting.

Here is a small piece of code that does this for all your projects, over all your branches (we have ~70 projects x ~5 active branches).

Warning: This code assumes for simplicity that the per-project Resharper options file doesn’t contain anything interesting (it’s overwritten from scratch). In the current version, this appears to be true. Also, the solution must be closed before running the exe – otherwise the “.resharper” files will be deleted when you close it.

using System;
using System.IO;
 
namespace Unsharper
{
    class Program
    {
        static void Main(string[] args)
        {
            if (args.Length != 1)
            {
                Usage();
                return;
            }
            string arg = args[0];
            int ver;
            if (!int.TryParse(arg, out ver))
            {
                Usage();
                return;
            }
            if (ver != 2 && ver != 3)
            {
                Usage();
                return;
            }
 
            // One element per line; "CSharp20" or "CSharp30" selects the language level.
            string config = "<Configuration>\n<CSharpLanguageLevel>CSharp" + 
                ver + "0</CSharpLanguageLevel>\n</Configuration>";
 
            Console.WriteLine("Finding resharper setting files");
            foreach (string csproj in Directory.GetFiles(".", "*.csproj", SearchOption.AllDirectories))
            {
                string resharperFile = csproj + ".resharper";
                Console.WriteLine("Writing " + resharperFile);
                File.WriteAllText(resharperFile, config);
            }
        }
 
        private static void Usage()
        {
            Console.WriteLine("Usage: unsharper {2/3} - make Resharper use C# 2 or C# 3 syntax");
            Console.WriteLine("This runs over all projects in your current folder");
        }
    }
}

Delving Blogs

Check out my post in Delver’s blog. The feature has only been in production for a few days and already I see cool search results from blogs of friends. Let me know if you search and find something useful with it.

Enough with Experts Exchange already!

I thought Google SearchWiki would help me stop seeing this crappy website in my search results, but for some weeks now I haven’t seen SearchWiki anymore. Does anyone know what happened to it?

Hello Worldpress 2.7

I was waiting for such a post to upgrade. It took 6 minutes, including this post, using the WordPress Auto-Upgrade plugin. I still need to explore the new version though. And I still need to tweak the Simple Tags plugin.

Update – Things I like:

  1. I finally found where to manage spam comments. I accidentally marked a bunch of legitimate comments as spam the other week, and for the life of me, just couldn’t find how to undo it. In 2.7, you can access all comments from the dashboard.
  2. Finally, built in Ajax