This blog has moved to Medium

Subscribe via email


Posts tagged ‘.net’

Playing with a few ReaderWriterLocks in .Net

With Delver‘s future clouded by a big question mark, I’m looking for a new job. This reminded me of my previous job hunt, and the part we all “love” about it – job interviews.

Last year, two companies I was interviewing for asked me to implement a ReaderWriterLock. It took me more time than I’d hoped, but I got the implementation done. Since this was one of the hardest interview questions (for me at least), I’ve decided to implement one or two versions, and to test the performance vs the built-in reader writer locks available in .NET.

First, I wrote the test code, which is roughly composed of:

  1. Shared (singleton) counters object with two counters, X & Y.
  2. Some reader threads. When a reader reads, it reads both X & Y and makes sure they are equal.
  3. Some writer threads, that increment X & Y together (thus maintaining X == Y).

Then, I tested this setup without any locks, and as expected the reader threads got different values of X & Y.

Next, I implemented the following locks:

  1. DummyReaderWriterLock – a horribly inefficient implementation that is reader/writer-oblivious. It just locks the object regardless of reads/writes.
  2. SemaphoreReaderWriterLock – already a huge improvment, this locks uses a semphore that holds some large number (~2000) locks. Each reader requests only a single lock from the semaphore, and a writer must obtain ALL locks before continuing. Two writers are prevented from deadlocking by a separate Mutex. One immediate problem with this lock is writer starvation. For a writer to obtain the lock, it must first get the mutex and then all locks in the semaphore. This means a single writer is competing against all readers for the semaphore.
  3. EventReaderWriterLock – implemented by two events and a reader count. The first reader & all writers must get signaled in order to get the lock. Once any reader got the lock, other readers are free to enter without blocking. The last reader out is responsible for signaling the event and letting other writers or readers back in.

I also pulled my team leader Oren into this problem, and he came up with a state based implementation – not that far from my own, with an additional “state” integer that represents whether the lock is in “writing mode”, “reading mode” or free. He also added a few TestAndCompare hacks for checking the state.

Finally, I tested the performance of these locks vs two locks that are availble in .NET 3.5: ReaderWriterLock and ReaderWriterLockSlim. I only tested only one scenario, but was pleasantly surprised to discover the performance of both my EventReaderWriterLock and Oren’s StateReaderWriterLock were identical to that of ReaderWriterLockSlim, and better than ReaderWriterLock! (The performance of DummyReaderWriterLock were literally off the charts).

Here is my implementation:

    /// 
    /// A ReaderWriter lock implemented by events.
    ///
 
    /// An AutoReset event that gives ownership of the lock either to one writer or to all the readers.
    /// A manual reset event that allows readers to enter, and is only reset when the last reader finishes
    /// 
    /// 
    public class EventReaderWriterLock : IDisposable
    {
        /// 
        /// Both readers and writer need this lock to work
        /// 
        private readonly AutoResetEvent _lockAvailble = new AutoResetEvent(true);
 
        /// 
        /// Further readers beyond the first one wait on this object
        /// 
        private readonly ManualResetEvent _canReadEvent = new ManualResetEvent(false);
 
        /// 
        /// Used to synch the reader lock/release
        /// 
        private readonly object _readerLock = new object();
 
        private int _readers;
 
        public void LockReader()
        {
            lock (_readerLock)
            {
                int oldReaders = _readers++;
                if (oldReaders == 0)
                {
                    // I'm the first reader, let's fight for the lock with the writers
                    _lockAvailble.WaitOne();
 
                    // got lock, notify all other readers they can read
                    _canReadEvent.Set();
                }
                else
                {
                    // wait for the first reader to signal me
                    _canReadEvent.WaitOne();
                }
            }
        }
 
        public void ReleaseReader()
        {
            lock (_readerLock)
            {
                int oldReaders = _readers--;
                if (oldReaders != 1)
                {
                    // If I'm not the last reader, I do nothing here
                    return;
                }
 
                // I'm the last, let's forbid other readers but allow writers or a first reader.
                _canReadEvent.Reset();
                _lockAvailble.Set();
            }
        }
 
        public void LockWriter()
        {
            _lockAvailble.WaitOne();
        }
 
        public void ReleaseWriter()
        {
            _lockAvailble.Set();
        }
 
        public void Dispose()
        {
            _lockAvailble.Close();
            _canReadEvent.Close();
        }
    }

And the entire zip with the other RWLocks and test harness.

A few immediate conclusions:

  1. Writing a working reader writer lock is a very non-trivial problem for a job interview – but watching an applicant struggle with it can give you insights about his know-how around multi-threaded code.
  2. Writing an all-purpose RWLock seems like a daunting task. In my exercise I specifically avoided broad considerations such as fairness & readers upgrading to writers. Testing it for all end cases seems almost impossible (though some formal theoretical tools exists for such correction proofs)
  3. At least for some problems, writing your own lean solution can be better (performance wise) than relying on the de facto standard. While our solutions weren’t better than ReaderWriterLockSlim, they were significantly better than ReaderWriterLock – again, only in the context of this test harness.

Bonus: my first implementation didn’t have a lock statement in LockReader() and ReleaseReader(), and used Interlocked.Increment() and Decrement() to update the _readers variable. Still, it contained a hidden bug – can you find it, and understand why the lock is necessary?

Avoid expensive ToString() calls in log4net

premature optimization is the root of all evil.” (Donald Knuth)

And, if I might add: “ugly optimization is not so great either”.

We have some classes with rather expensive ToString() methods in our code base. When used in logs (specifically, log4net), we incured the performance penality even when the logging level (Debug/Info/Warn/…) was too low, and the log wasn’t even printed. In other words, ToString() was called before the log method, in order to build the single string that is the input parameter to Logger.Log().

Our ugly optimization was to add checks before every log line, as in:

if (logger.IsDebugEnabled)
   logger.Debug("Calling " + someObjectWithExpensiveToString);

A better solution, which I just tested to work, is to use the …Format() family of methods. This way, you pass objects and not strings, which are then only converted to strings if there is a need.

logger.DebugFormat("Calling {0}", someObjectWithExpensiveToString);

The following test assures me it actually works:

ILog logger = LogManager.GetLogger("Foo");
ToStringCounter counter = new ToStringCounter();
Assert.AreEqual(0, counter.ToStrings);
Assert.IsFalse(logger.IsDebugEnabled);
 
logger.DebugFormat("Expecting zero ToString() calls: {0}", counter);
Assert.AreEqual(0, counter.ToStrings);

Where ToStringCounter is simply a class that counts the number of calls to its ToString() method.

Globalizing DateTime.TryParse()

DateTime.TryParse() is the sane method in .NET to parse date strings (don’t use Date.Parse() – you want to handle bad user input correctly, and throwing exceptions usually isn’t the correct way, especially if all you want is to just skip the bad date field). Well, I’ve just discovered yesterday that the dates it returns are in the local computer time. If you try to parse “31/12/08 23:59:59” and you happen to be in GMT+2, the time you’ll get will be 2 hours off.

To solve, simply run ToUniversal() on the resulting DateTime. I use:

bool TryParseUniversal(string dateString, out DateTime date)
{
  if (!DateTime.TryParse(dateString, out date))
    return false;
 
  date = date.ToUniversal();
  return true;
}

Regex Complexity

Today I got a shocker.

I tried the not-too-complicated regular expression:

href=[‘"](?<link>[^?’">]*\??[^’" >]*)[^>]*(?<displayed>[^>]*)</a>

I worked in the excellent RegexBuddy, and ran the above regex on a normal size HTML page (the regex aims to find all links in a page). The regex hung, and I got the following message:

The match attempt was aborted early because the regular expression is too complex.
The regex engine you plan to use it with may not be able to handle it at all and crash.
Look up "catastrophic backtracking" in the help file to learn how to avoid this situation.

I looked up “catastrophic backtracking”, and got that regexes such as “(x+x+)+y” are evil. Sure – but my regex does not contain nested repetition operations!

I then tried this regex on a short page, and it worked. This was another surprise, as I always thought most regex implementations are compiled, and then run in O(n) (I never took the time to learn all the regex flavors, I just assumed what I learned in the university was the general rule).

It turns out that one of the algorithms to implement regex uses backtracking, so a regex might work on a short string but fail on a larger one. It appears even simple expressions such as “(a|aa)*b” take exponential time in this implementation.

I looked around a bit, but failed to find a good description of the internal implementation of .NET’s regular expression engine.

BTW, the work-around I used here is modify the regex. It’s not exactly what I aimed for, but it’s close enough:

href=[‘"](?<link>[^’">]*)[^>]*>(?<displayed>[^>]*)</a>

TimeoutStream

What do you do when you want to read all the data from a Stream object in .NET? You use StreamReader.ReadToEnd().

Stream stream = ...;
StreamReader reader = new StreamReader(stream);
string data = reader.ReadToEnd();

Apparently there is no sane way to put a timeout on the above logic. If you call it and your stream doesn’t have a timeout, you’re doomed. Even if you stream has an internal timeout, you could still be doomed – for example, say you are reading from a website that sends you the letter ‘A’ every second. The timeout on the HttpResponseStream could be used, but still (assuming it’s higher than one second), you can’t set a timeout to the entire ReadToEnd() operation.

I wrote a small proxy class TimeoutStream that wraps any stream with a total timeout since the moment of its creation. Any blocking operation performed on it will fail if the stream has been created too far in the past.

I could use Asynchronous IO to sort of guarantee completion in this timeout without relying on a timeout on the stream itself – however, it appears to be impossible to do correctly in .NET – there is no good way to kill that IO operation if it does not complete within the allotted timeout (see here)- so far now this is good enough for our needs because we can indeed set a timeout on the HttpWebRequest object itself (in addition to using TimeoutStream).

The code is available here.

Why is Open Source so hard?

Today I came with with an idea for a cool feature for RSS clients. What I want is intelligent filtering like StumbleUpon – users would click “I like it” or “I like it not” on RSS items, and the client could learn a user’s “taste” and prioritize future items. Google revealed someone just thought of this about 2 weeks ago. Still, I don’t want to pay 20$ for an RSS reader, and I don’t want a reader that specializes in this, I want this as just another feature of a standard RSS reader, so I went on to try and add this to some existing open source RSS client.

I immediately found NClassifier, an open source .NET text classifier. Looks promising.

For an RSS client I mainly found Rssbandit. A bit ugly, but I could work with that.

Then, crash. I spent several good hours just trying to get the blasted code. At first I naively thought perhaps the code would be attached to some version of the installer. Nope. I went to the CVS, installed TortoiseCVS, and failed to login with a blank password. At this point I decided to try accessing the SVN version, since I’m already familiar with SVN. I managed to get the code, but it did not build due to a LicenseException 🙁

I figured the SVN version might be different from the CVS, so I went back to my efforts to use TortoiseCVS, and succeeded … to connect. In SVN, I could just get the entire latest version. In CVS, I had to choose “packages” to download. Trying some packages, I found that TortoiseSVN does download, only it deletes the downloaded file just at the end of the download. ARG!

I really don’t understand why this entire process has to be so complicated. This entire process should have taken half an hour, including downloading the code, modifying and initial testing. Why almost every time you download open source code, it never compiles on the first build. Why the SVN and CVS versions are different.

Bottom Line
I still like my idea of intelligent, learning RSS reader. If anyone has more knowledge in the mysterious ways of Sourceforge and wants to help, call.

Dotnet Web Crawler Speedup

I’m writing a web crawler in C#, and getting it to perform well was really annoying.
I tried simply using ThreadPool.QueueUserWorkItem() to queue up my requests to multiple threads. Each thread just ran WebClient.DownloadString().

While the threads did run in parallel, it turned out WebClient had an inherent lock.
I tried messing with the ConnectionManagementSection, but that turned out read-only.
After some Google, I found that the configuration can only be changed by modifying the machine.config or user.config files! Seems pretty stupid to me.

After doing that simply didn’t work either, I found this code that helped me through. I still don’t know exactly why WebClient.DownloadString() doesn’t work, but after some tweaking I got to about 2.5 pages pre second. Still not top speed, but way better than the 0.5 pages/second I started with.