Posts tagged ‘C#’

Playing with a few ReaderWriterLocks in .Net

With Delver’s future clouded by a big question mark, I’m looking for a new job. This reminded me of my previous job hunt, and the part we all “love” about it – job interviews.

Last year, two companies I was interviewing for asked me to implement a ReaderWriterLock. It took me more time than I’d hoped, but I got the implementation done. Since this was one of the hardest interview questions (for me at least), I’ve decided to implement one or two versions, and to test the performance vs the built-in reader writer locks available in .NET.

First, I wrote the test code, which is roughly composed of:

  1. Shared (singleton) counters object with two counters, X & Y.
  2. Some reader threads. When a reader reads, it reads both X & Y and makes sure they are equal.
  3. Some writer threads, that increment X & Y together (thus maintaining X == Y).

Then, I tested this setup without any locks, and as expected the reader threads got different values of X & Y.
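A minimal sketch of what such a harness might look like without any locking is shown here. The names `Counters` and `Harness`, the thread count and the one-second run time are assumptions for illustration, not the code from the zip.

```csharp
using System;
using System.Threading;

// Hypothetical shared counters object; X and Y are supposed to stay equal.
class Counters
{
    public static readonly Counters Instance = new Counters();
    public int X;
    public int Y;
}

class Harness
{
    private static volatile bool _stop;
    private static int _mismatches;

    // Reads both counters and records every time they differ.
    static void Reader()
    {
        while (!_stop)
        {
            int x = Counters.Instance.X;
            int y = Counters.Instance.Y;
            if (x != y) Interlocked.Increment(ref _mismatches);
        }
    }

    // Increments both counters with no protection at all.
    static void Writer()
    {
        while (!_stop)
        {
            Counters.Instance.X++;
            Counters.Instance.Y++;
        }
    }

    static void Main()
    {
        var threads = new Thread[4];
        for (int i = 0; i < threads.Length; i++)
        {
            // Even indexes become readers, odd indexes become writers.
            ThreadStart body = (i % 2 == 0) ? new ThreadStart(Reader) : new ThreadStart(Writer);
            threads[i] = new Thread(body);
            threads[i].Start();
        }

        Thread.Sleep(1000);
        _stop = true;
        foreach (Thread t in threads) t.Join();

        Console.WriteLine("Mismatched reads: " + _mismatches);
    }
}
```

On a multi-core machine this will usually report a non-zero count. Wrapping the reader and writer bodies in the lock under test should bring it down to zero.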

Next, I implemented the following locks:

  1. DummyReaderWriterLock – a horribly inefficient implementation that is reader/writer-oblivious. It just locks the object regardless of reads/writes.
  2. SemaphoreReaderWriterLock – already a huge improvement, this lock uses a semaphore that holds some large number (~2000) of locks. Each reader requests only a single lock from the semaphore, and a writer must obtain ALL locks before continuing. Two writers are prevented from deadlocking each other by a separate Mutex. One immediate problem with this lock is writer starvation. For a writer to obtain the lock, it must first get the mutex and then all locks in the semaphore. This means a single writer is competing against all readers for the semaphore.
  3. EventReaderWriterLock – implemented by two events and a reader count. The first reader & all writers must get signaled in order to get the lock. Once any reader got the lock, other readers are free to enter without blocking. The last reader out is responsible for signaling the event and letting other writers or readers back in.
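To make the semaphore idea concrete, here is a rough sketch of how such a lock could be put together. It is a guess at the shape based on the description above, not the code from the zip; the class name, the `MaxReaders` constant and the choice to hold the mutex for the whole write are assumptions.

```csharp
using System;
using System.Threading;

// Sketch only: a semaphore-based reader/writer lock as described in the list above.
public class SemaphoreReaderWriterLockSketch : IDisposable
{
    // Assumed permit count; it also caps the number of concurrent readers.
    private const int MaxReaders = 2000;

    private readonly Semaphore _permits = new Semaphore(MaxReaders, MaxReaders);

    // Serializes writers so two of them never hold half of the permits each.
    private readonly Mutex _writerMutex = new Mutex();

    public void LockReader()
    {
        _permits.WaitOne();
    }

    public void ReleaseReader()
    {
        _permits.Release();
    }

    public void LockWriter()
    {
        _writerMutex.WaitOne();

        // A writer needs every permit, so it competes with all readers here.
        for (int i = 0; i < MaxReaders; i++)
        {
            _permits.WaitOne();
        }
    }

    public void ReleaseWriter()
    {
        _permits.Release(MaxReaders);
        _writerMutex.ReleaseMutex();
    }

    public void Dispose()
    {
        _permits.Close();
        _writerMutex.Close();
    }
}
```

The loop in `LockWriter()` is where the starvation shows up: every permit a reader returns can be grabbed by another reader before the writer gets to it.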

I also pulled my team leader Oren into this problem, and he came up with a state based implementation – not that far from my own, with an additional “state” integer that represents whether the lock is in “writing mode”, “reading mode” or free. He also added a few TestAndCompare hacks for checking the state.

Finally, I tested the performance of these locks vs two locks that are available in .NET 3.5: ReaderWriterLock and ReaderWriterLockSlim. I tested only one scenario, but was pleasantly surprised to discover that the performance of both my EventReaderWriterLock and Oren’s StateReaderWriterLock was identical to that of ReaderWriterLockSlim, and better than ReaderWriterLock! (The performance of DummyReaderWriterLock was literally off the charts.)

Here is my implementation:

    /// <summary>
    /// A ReaderWriter lock implemented by events.
    /// </summary>
    /// <remarks>
    /// An AutoReset event that gives ownership of the lock either to one writer or to all the readers.
    /// A manual reset event that allows readers to enter, and is only reset when the last reader finishes
    /// </remarks>
    public class EventReaderWriterLock : IDisposable
    {
        /// <summary>
        /// Both readers and writer need this lock to work
        /// </summary>
        private readonly AutoResetEvent _lockAvailble = new AutoResetEvent(true);
 
        /// <summary>
        /// Further readers beyond the first one wait on this object
        /// </summary>
        private readonly ManualResetEvent _canReadEvent = new ManualResetEvent(false);
 
        /// <summary>
        /// Used to synch the reader lock/release
        /// </summary>
        private readonly object _readerLock = new object();
 
        private int _readers;
 
        public void LockReader()
        {
            lock (_readerLock)
            {
                int oldReaders = _readers++;
                if (oldReaders == 0)
                {
                    // I'm the first reader, let's fight for the lock with the writers
                    _lockAvailble.WaitOne();
 
                    // got lock, notify all other readers they can read
                    _canReadEvent.Set();
                }
                else
                {
                    // wait for the first reader to signal me
                    _canReadEvent.WaitOne();
                }
            }
        }
 
        public void ReleaseReader()
        {
            lock (_readerLock)
            {
                int oldReaders = _readers--;
                if (oldReaders != 1)
                {
                    // If I'm not the last reader, I do nothing here
                    return;
                }
 
                // I'm the last, let's forbid other readers but allow writers or a first reader.
                _canReadEvent.Reset();
                _lockAvailble.Set();
            }
        }
 
        public void LockWriter()
        {
            _lockAvailble.WaitOne();
        }
 
        public void ReleaseWriter()
        {
            _lockAvailble.Set();
        }
 
        public void Dispose()
        {
            _lockAvailble.Close();
            _canReadEvent.Close();
        }
    }

And the entire zip with the other RWLocks and test harness.

A few immediate conclusions:

  1. Writing a working reader writer lock is a very non-trivial problem for a job interview – but watching an applicant struggle with it can give you insights about his know-how around multi-threaded code.
  2. Writing an all-purpose RWLock seems like a daunting task. In my exercise I specifically avoided broad considerations such as fairness & readers upgrading to writers. Testing it for all edge cases seems almost impossible (though some formal theoretical tools exist for such correctness proofs).
  3. At least for some problems, writing your own lean solution can be better (performance wise) than relying on the de facto standard. While our solutions weren’t better than ReaderWriterLockSlim, they were significantly better than ReaderWriterLock – again, only in the context of this test harness.

Bonus: my first implementation didn’t have a lock statement in LockReader() and ReleaseReader(), and used Interlocked.Increment() and Decrement() to update the _readers variable. Still, it contained a hidden bug – can you find it, and understand why the lock is necessary?

Reading environment variables from external processes in C#

I was trying to discern which of 3 java processes is our Tomcat process, to kill rogue Tomcats in our unit tests. We have code that uses Process.GetProcessesByName(), and checks the returned StartInfo.

Apparently, the processes returned by Process.GetProcessesByName() have an empty StartInfo.

Digging around, I failed to find a ready-out-of-the-box way to do this. This StackOverflow article pointed me in the right direction of this CodeProject page.

My final result includes:

  1. A small utility that reads either all environment variables from some process, or a specific one.
  2. A small C# wrapper

Here is the usage:

private static string GetJavaWorkingDir(int pid)
{
  string processEnvReader = @"Scripts\ReadProcessEnv\bin\ReadProcessEnv.exe";
  ProcessEnvReader reader = new ProcessEnvReader(processEnvReader);
  return reader.Read(pid, "catalina_base");
}

The shorter the better

I wrote in a previous post about a few refactorings meant to eliminate code duplication. I was reminded recently this principle has a name – DRY, which stands for Don’t Repeat Yourself, and should be applied everywhere.

Eliminating duplication, while a noble task, is not the only refactoring one should practice and apply. Breaking up large pieces of code into ever smaller pieces is also important. It makes your code more readable to yourself and to other developers, and also makes merges much easier to accomplish – nothing is more terrible than looking at a huge merge of a huge method or class and having to guess what combination of the versions is the correct one.

Here are a couple of important tips/techniques to make your code more manageable:

Tip I – Group expression into methods

It doesn’t matter if it’s code you wrote or stumbled on. Whenever you get the chance, select a few statements in a long method, use Resharper’s Extract Method refactoring, invent some name to describe what this group of statements does, and voila – you’ve shortened the original method. It’s easier to understand and maintain, and now the new method can be called and tested on its own. This technique can turn huge 200-line monster methods into responsible, comprehensible 15-liners.

Ideally you’d want to use this refactoring on a bunch of statements that actually have a logical cohesive meaning together. Even if that’s not the case, usually you’ll be better off with the extracted method. One notable exception – when the set of statements you’d refactor would lead to a method with multiple out parameters. Then you’d still have to declare these variables in the calling code, and your gain greatly diminishes.

Tip II – Group methods into classes

You’ve all seen the huge classes with dozens of methods. Hell, Tip I above is all about creating even more methods! Well, once you feel a class has grown too much, you should try to spot groups of methods that have related functionality. Perhaps most methods in the group are used only by a single method, but nowhere else. In this case, if you move all these methods away to a new class and make them private members of that class, their absence will certainly not be missed in the original class – it only uses the one method anyway (be sure to keep it public, of course). Like in the previous tip, you want to try and find groups of methods that have a common logical function in order to use this refactoring. However, even if you can’t find a common function to these methods, they will still be better off in another class.

While Resharper does have a Move Method refactoring, in some cases it’s better to just cut and paste, especially when you move a bunch of methods together, with the data members they use and all.

Tip III – Reduce nesting level

This will not shrink your methods by a lot, but it will make them more manageable. Nesting level implies context, which you have to maintain in your head when you are processing code. When is this code called? Only if 5 different if statements happen to succeed. It’s painful on the eye.

A useful refactoring for this is Resharper’s “Invert If”. It takes a convoluted piece of code such as

public void ProcessUserInput(string name)
{
   if (name != null)
   {
      int id = FindIdByName(name);
      if (id != 0)
      {
         if (TryToStoreName(name, id))
         {
            Console.WriteLine("Stored {0}, {1}", name, id);
         }
      }
   }
}

And beautifies it into

public void ProcessUserInput(string name)
{
   if (name == null) return;
 
   int id = FindIdByName(name);
   if (id == 0) return;
 
   if (TryToStoreName(name, id))
   {
      Console.WriteLine("Stored {0}, {1}", name, id);
   }
}

Much easier to follow (the difference is highly evident in methods with 5 nesting levels or more – these acutely need this refactoring).

Globalizing DateTime.TryParse()

DateTime.TryParse() is the sane method in .NET to parse date strings (don’t use DateTime.Parse() – you want to handle bad user input correctly, and throwing exceptions usually isn’t the correct way, especially if all you want is to just skip the bad date field). Well, I’ve just discovered yesterday that the dates it returns are in the local computer time. If you try to parse “31/12/08 23:59:59” and you happen to be in GMT+2, the time you’ll get will be 2 hours off.

To solve, simply call ToUniversalTime() on the resulting DateTime. I use:

bool TryParseUniversal(string dateString, out DateTime date)
{
  if (!DateTime.TryParse(dateString, out date))
    return false;
 
  date = date.ToUniversalTime();
  return true;
}
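As an alternative sketch, the `DateTime.TryParse` overload that accepts `DateTimeStyles` can make the time zone assumption explicit at parse time. The helper name `TryParseAsUtc`, the use of the invariant culture and the ISO-style sample string are assumptions for illustration. The sketch treats strings with no time zone information as already being in UTC.

```csharp
using System;
using System.Globalization;

static class DateParsing
{
    // Hypothetical helper: a string without time zone info is taken to be UTC,
    // and the result always has Kind == DateTimeKind.Utc.
    public static bool TryParseAsUtc(string dateString, out DateTime date)
    {
        return DateTime.TryParse(
            dateString,
            CultureInfo.InvariantCulture,
            DateTimeStyles.AssumeUniversal | DateTimeStyles.AdjustToUniversal,
            out date);
    }

    static void Main()
    {
        DateTime parsed;
        if (TryParseAsUtc("2008-12-31 23:59:59", out parsed))
        {
            // "u" is the universal sortable format; Kind shows how the value is tagged.
            Console.WriteLine("{0:u} ({1})", parsed, parsed.Kind);
        }
        else
        {
            Console.WriteLine("Could not parse the date");
        }
    }
}
```

If the incoming strings are in the machine's local time instead, swapping in `DateTimeStyles.AssumeLocal` should give the same result as calling `ToUniversalTime()` on the parsed value, since that method treats a DateTime of unspecified kind as local.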

A Good Question

C# programmers – you should read this question and its many answers, I’m sure you’ll learn a thing or two. For example, ThreadStaticAttribute – a clean, type-safe way to get thread local storage in C#.
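As a quick illustration of `ThreadStaticAttribute`, here is a small sketch; the class and field names are made up for the example. Each thread that touches the field gets its own copy, starting from the default value.

```csharp
using System;
using System.Threading;

class ThreadStaticDemo
{
    // Each thread sees its own copy of this field.
    // Note: an initializer here would only run for the first thread that uses the class.
    [ThreadStatic]
    private static int _perThreadCounter;

    static void Work()
    {
        for (int i = 0; i < 3; i++)
        {
            _perThreadCounter++;
        }

        Console.WriteLine("Thread {0}: counter = {1}",
            Thread.CurrentThread.ManagedThreadId, _perThreadCounter);
    }

    static void Main()
    {
        var a = new Thread(Work);
        var b = new Thread(Work);
        a.Start();
        b.Start();
        a.Join();
        b.Join();

        // The main thread never incremented its own copy.
        Console.WriteLine("Main thread: counter = {0}", _perThreadCounter);
    }
}
```

Both worker threads report a counter of 3 and the main thread reports 0, because no thread ever sees another thread's increments.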

Unhandled Exceptions Crash .NET Threads

A little something I learned at DSM today. It appears if any thread in .NET crashes (lets a thrown exception fly through the top stack level), the process crashes. I refused to believe at first, but testing on .NET 2.0 showed it to be true:

(I should really switch to another blog platform, I didn’t find a decent way to write code in Blogspot).

using System;
using System.Threading;

class ThreadCrashTest
{
  static void Main()
  {
    new Thread(Foo).Start();
    for (int i = 0; i < 10; ++i)
    {
      Console.WriteLine(i);
      Thread.Sleep(100);
    }
  }
 
  private static void Foo()
  {
    Console.WriteLine("Crashing");
    throw new Exception("");
  }
}

According to Yan, the behavior on .NET 3 is to crash the AppDomain instead of the entire process.
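One common way to keep a worker thread from taking the process down is to catch exceptions at the top of the thread's entry method. The sketch below is a variation on the test above; `RunSafely` is a hypothetical wrapper name, and whether swallowing the exception is acceptable depends on the application.

```csharp
using System;
using System.Threading;

class SafeThreadDemo
{
    static void Main()
    {
        new Thread(RunSafely).Start();
        for (int i = 0; i < 3; ++i)
        {
            Console.WriteLine(i);
            Thread.Sleep(100);
        }
    }

    // Top-level guard: nothing thrown by Foo escapes the thread.
    private static void RunSafely()
    {
        try
        {
            Foo();
        }
        catch (Exception e)
        {
            Console.WriteLine("Worker failed: " + e.Message);
        }
    }

    private static void Foo()
    {
        Console.WriteLine("Crashing");
        throw new Exception("boom");
    }
}
```

With the guard in place the main loop runs to completion, and the failure is reported instead of terminating the process.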

11 Tips for Beginner C# Developers

Today I sat with a friend (let’s call him Joe), who just switched from a job in QA to programming, and passed on to him some of the little tips and tricks I learned over the years. I’m sharing it here because I thought it could be useful to other people that are new to programming. The focus of this post is C#, but analogous tools and methods exist for other languages of course.

Refactorings (A.K.A Resharper)

(This is just a short introduction. I recommend the book Refactoring: Improving the Design of Existing Code as further reading)

As I’ve written here before, I just love Resharper. It is the best refactoring tool for C# I know of (even though it’s a bit heavy sometimes – make sure to get enough RAM and a strong CPU). Joe asked me about a C# feature called partial classes. He said his class was just too big (4000 lines) and becoming unmanageable, and he wanted some way to break it to smaller pieces. He also said that at his workplace, they lock the file whenever anyone edits it, because it’s very hard to avoid merge conflicts on such huge files.

I was happy he wanted to simplify and clarify his code by splitting the file, but the way he thought of doing this was the wrong way.

Tip I – Single Responsibility and Information Hiding

You should strive to minimize the information you require at each place in your program. Joe had dozens of unit tests, which all derived from a single base class that contained the methods required for all the tests. On a second look, we saw that some of the methods were in fact only needed by a subset of the tests that were logically close.

To solve Joe’s problem, we created an intermediate class, which inherited from the common test base class, and changed those test classes to inherit from the intermediate class. We then used Resharper’s Push Down Member refactoring, which moved the methods from the test base class into the intermediate class. No functionality was changed, but we removed code from the huge 4000 lines class! By continuing this process, we can break down the huge class into separate related classes with Single Responsibilities, and no class would have access to unneeded information (like methods it doesn’t care about).

Tip II – Duplicate Code Elimination

I believe the world of software would be a better place if people were not allowed to Copy-Paste code more than a few times a day. Many coders abuse this convenient shortcut and thus create unmaintainable code. The effective counter-measure to the Copy-Paste plague is elimination of duplicate code.

I saw in Joe’s long file code that looked similar to this:

alice = Configuration.Instance.GetUsers("alice");
bob = Configuration.Instance.GetUsers("bob");
charlie = Configuration.Instance.GetUsers("charlie");
diedre = Configuration.Instance.GetUsers("diedre");

Even though this is a rather simple example of code duplication, I strongly believe even such minor infractions should be dealt with. Every line of the above code snippet knows how to obtain users from the configuration. This knowledge has to be read and maintained by developers. Instead, why not get all the “user getting” code into one place and let us simply write what we want to do, instead of how?

To solve this, I use one of these two techniques:

Extract Method, Tiger Style

  1. Choose a single instance of duplication, and locate any parameters or code that is not the same among all the instances of the duplicated code. In our example, the username and the assignment variable are the only two things that differ between the four lines of code.
  2. For every such parameter, use Introduce Variable. The end result of this phase should look something like this:
    string username = "alice";
    alice = Configuration.Instance.GetUsers(username);
    bob = Configuration.Instance.GetUsers("bob");
    charlie = Configuration.Instance.GetUsers("charlie");
    diedre = Configuration.Instance.GetUsers("diedre");

  3. Now, use Extract Method on this code (the first assignment in our example). This creates a new method that I would call GetUsers, which simply takes a string argument username and reads that user from the configuration.
  4. Perform Inline Variable on the variable you created in step 2.
  5. Now, change all the other instances to use this new method and delete the redundant code.

The end result looks like this:

alice = GetUsers("alice");
bob = GetUsers("bob");
charlie = GetUsers("charlie");
diedre = GetUsers("diedre");
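For completeness, the extracted method itself could look like the sketch below. The `Configuration` class here is only a stand-in, and the `string` return type is an assumption, since the real configuration class and user type are not shown in this post.

```csharp
using System;

// Stand-in for the real configuration class, which the post does not show.
class Configuration
{
    public static readonly Configuration Instance = new Configuration();

    public string GetUsers(string username)
    {
        return "user:" + username;
    }
}

class UserSetup
{
    // The single place that knows how users are obtained.
    private static string GetUsers(string username)
    {
        return Configuration.Instance.GetUsers(username);
    }

    static void Main()
    {
        string alice = GetUsers("alice");
        string bob = GetUsers("bob");
        Console.WriteLine(alice + ", " + bob);
    }
}
```

If the way users are loaded ever changes, only the private `GetUsers` method needs to be touched.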

Another way to achieve the same refactoring is Crane Style (I’m just enjoying using kung-fu styles here because I saw Kung-Fu Panda not too long ago :). You can use Extract Method immediately without creating the temporary variable, but then you get a method that specifically returns the user for “alice”, which is not what you wanted. Nevertheless, this method can be refactored by applying Extract Parameter on “alice”, netting us the same result.

Of course these two examples do not begin to cover the myriad of ways you can and should refactor the code. What’s important is that you always keep an eye out on how you can make your code more concise, which in turn leads to readability and maintainability.

Tip III – Code Cleanup

Resharper sprinkles colors to the right of your currently open file. Every such colored line is either a compilation error (for red lines), or a cleanup suggestion. Go over such suggestions and hear what Resharper has to say (in the example below it seems nobody is using the var xasdf, so a quick alt-Enter while standing on it will remove it).

Another thing you should do is define and run FxCop rules to perform a deeper analysis on your entire project/solution and spot potential problems.

Source Control

Tip IV – Use Source Control

I almost left this out as this goes without saying, but properly using source control can probably save you more time and money than all the other tips (or cost you if you don’t use it). The best source control tool I know of for Visual Studio is of course Team Foundation Server, as it has the best integration with the IDE. Other tools are possible, but you have to have a good reason for choosing something other than TFS (One good reason might be cross-platform development and the desire to keep all your code base in a single repository).

Automatic Unit Tests

At Delver, we use NUnit to write unit tests. In past projects I’ve used Visual Studio’s built in test tool, but I found NUnit to be slightly better, mainly due to the integration with Resharper. Resharper adds a small green button next to every test, and allows you to run your test directly from there instead of looking for the “Test View” window. Tomer told me just last week that he uses a keyboard shortcut for running tests, but for me, this is one shortcut I don’t think I’ll bother learning (the brain can only hold so much).

Another benefit of NUnit is that it runs the test suite in place in your current source folder, instead of copying everything aside to a separate folder like Visual Studio’s tool (this used to cost me gigs of space for old unit test sessions which were rarely if ever used).

Tip V – Write Autonomous Unit Tests

Back to Joe, he has a few tests that don’t work right out of the box. Before running tests, he has to manually run a separate application used by his test suite. As his project contains hundreds of tests, I would love seeing this added as an automatic procedure in his TestInit() method. Automating a manual operation, besides saving time for developers, enables you to:

Tip VI – Use Continuous Integration

Tests that nobody runs are no good. Tests that are run once when written and then forgotten are only slightly better. By the same logic, tests that run all the time are the best. Pick and use a Build Automation System. I wrote before about our chosen solution, TeamCity, and to sum it up – we’re extremely happy about it. TeamCity runs our tests on every commit, on multiple configurations and build agents, and helps us detect bugs faster.

Know thy IDE

Visual Studio is one powerful tool (not belittling Java IDEs like IntelliJ and Eclipse, which in many cases are better). Learn how to use its features to your advantage:

Tip VII – Edit And Continue

Suppose that while debugging, you found a bug. You can edit the code, save, and continue the debug session without losing the precious time you took getting to this point.

Tip VIII – Move Execution Location

See the little yellow marker that signifies the current location inside the program being debugged? This arrow can be moved! It took me quite a while to discover this (actually heard about it from Sagie), but if you take and drag this arrow, Visual Studio will rewind or “fast forward” your execution to the desired point. None of the code gets executed, you just skip to where you want to go. Excellent for going back after executing a critical method, and rerunning it as many times as you wish.

Tip IX – Use Conditional Breakpoints

Don’t waste your time waiting for some specific value to appear in the watch for an interesting variable – set your breakpoints to stop only when the desired conditions are met (right-click on the red breakpoint circle and choose “Condition”).

Tip X – Attach to Process

Got a bug that only happens on production machines? You can attach your IDE to any running process (preferably one that is compiled in debug mode), and debug away. You can also programmatically cause a debugger to attach using System.Diagnostics.Debugger.Lau

Tip XI – Use The Immediate Window

The Immediate window is a great tool for executing short code snippets solely for debugging. You can place a breakpoint inside a method you wish to debug, and then call the method directly from the immediate window (saves you from doing this through Edit And Continue)

Regex Complexity

Today I got a shocker.

I tried the not-too-complicated regular expression:

href=['"](?<link>[^?'">]*\??[^'" >]*)[^>]*(?<displayed>[^>]*)</a>

I worked in the excellent RegexBuddy, and ran the above regex on a normal size HTML page (the regex aims to find all links in a page). The regex hung, and I got the following message:

The match attempt was aborted early because the regular expression is too complex.
The regex engine you plan to use it with may not be able to handle it at all and crash.
Look up "catastrophic backtracking" in the help file to learn how to avoid this situation.

I looked up “catastrophic backtracking”, and got that regexes such as “(x+x+)+y” are evil. Sure – but my regex does not contain nested repetition operations!

I then tried this regex on a short page, and it worked. This was another surprise, as I always thought most regex implementations are compiled, and then run in O(n) (I never took the time to learn all the regex flavors, I just assumed what I learned in the university was the general rule).

It turns out that one of the algorithms to implement regex uses backtracking, so a regex might work on a short string but fail on a larger one. It appears even simple expressions such as “(a|aa)*b” take exponential time in this implementation.
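The effect can be observed with a small experiment like the sketch below. It relies on the `Regex` constructor overload that takes a match timeout, which was added to .NET after this post was written; the two-second limit and the input lengths are arbitrary choices.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class BacktrackingDemo
{
    static void Main()
    {
        // (a|aa)*b can never match a string with no 'b' in it, but a backtracking
        // engine may try every way of splitting the a's before giving up.
        var regex = new Regex("(a|aa)*b", RegexOptions.None, TimeSpan.FromSeconds(2));

        for (int n = 10; n <= 40; n += 5)
        {
            string input = new string('a', n);
            Stopwatch watch = Stopwatch.StartNew();
            try
            {
                bool matched = regex.IsMatch(input);
                Console.WriteLine("n={0}: matched={1}, {2} ms",
                    n, matched, watch.ElapsedMilliseconds);
            }
            catch (RegexMatchTimeoutException)
            {
                // The engine exceeded the time limit we configured above.
                Console.WriteLine("n={0}: gave up after {1} ms",
                    n, watch.ElapsedMilliseconds);
                break;
            }
        }
    }
}
```

On a purely backtracking engine the time should climb steeply as n grows. The exact numbers, and whether the timeout is hit at all, depend on the runtime version, since newer engines may optimize this pattern.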

I looked around a bit, but failed to find a good description of the internal implementation of .NET’s regular expression engine.

BTW, the work-around I used here was to modify the regex. It’s not exactly what I aimed for, but it’s close enough:

href=['"](?<link>[^'">]*)[^>]*>(?<displayed>[^>]*)</a>