This blog has moved to Medium




Posts tagged ‘.net’

ALT.NET #3

The third Alt .Net convention will take place on 26/03 (planning on 25/03).

Read about my impressions from last time, and come register on Facebook. I will not be attending this year because of a ski trip, but you are all encouraged to come.

An open source StackOverflow clone by a n00b web dev

Edit – just so you know, since writing this I found Shapado and OSQA, which seem well on their way to becoming viable Stack Exchange alternatives. I don’t think I will continue developing this project, although it has taught me a great deal nonetheless.

I wanted to share a small learning exercise I underwent recently. I decided to learn how to build a website, and share the experience here. Lacking an original idea at the moment, I decided to create yet another StackOverflow clone – not original, but a good exercise nonetheless. The code for the project is hosted at GitHub, although I don’t have a live showcase up at the moment. Yes, I am aware that a Google search reveals an existing open source SO clone called Stacked – I thought building one from scratch might teach me more than reading someone else’s code base (I could be wrong, but this is how I wanted to roll).

Day 1

The choice of database engine was easy: the one database I had any experience with was MySQL, and since it is open source, free and easy to work with, I went with it. Next, an ORM library. After some digging, I decided to go with NHibernate + Castle ActiveRecord. ActiveRecord is great for easily mapping simple classes and relations, and NHibernate complements it for ‘advanced queries’. I don’t get any LINQ magic from these ORMs, which is a pity. I then proceeded to create a solution and some projects, added NUnit as a test framework, and set up Castle MicroKernel as an IOC framework.

I tried to find a hosted CI server for open source projects, but didn’t find anything that worked. Oh well, not essential for now.

Day 2

Before proceeding any further, I had to choose a source control provider. If this were a production project I would probably have gone with SVN hosted on Google Code. However, Ken told me about Git long ago, and I thought this was a good chance to experience it. So, on to GitHub. Opening the project was a no brainer, but finding a decent client was more challenging. I had some fun learning about Git’s private keys, and configuring TortoiseGit (I tried GitExtensions, but it doesn’t support Visual Studio 2010 beta 2 yet). Overall, TortoiseGit gets the job done, after some tweaking and .gitignore files.

Git’s distributed source control model is interesting and worth a try.

I created the User, Question & Vote tables, learned about composite keys in AR, and wrote my first NH query:

GetVoteCount – "SELECT Vote, COUNT(*) FROM " + VotesTableName + " WHERE PostId = :postId GROUP BY Vote"

I currently don’t have any caching on the vote count – I am simply storing the votes as relations between users and questions, and counting them on the fly.

Day 3

I would like to implement full features, not write the entire DAL first and then the application logic. So, it’s time to start learning about web development. At work people are using Monorail, but after reading this question I decided to try out ASP.Net MVC instead. So, I read a basic tutorial and started coding. Some things I learned:

  1. I finally got the meaning of Global.asax.cs – it’s simply the ‘main’ of the web application.
  2. By default, ASP.Net MVC creates the controllers by itself and does not support IOC. Fixed (remember to set up the Controller’s lifestyle as LifeStyle.PerWebRequest).
  3. Some Asp.Net MVC basics:
    1. Use <%= … %> to write to the output stream (that gets sent as the HTML), and <% %> to simply execute code.
    2. Use Html.ActionLink() to create links to other pages (~= Actions)
    3. Your pages are butt ugly without tweaking the CSS

Day 4

  • I quickly caught myself duplicating code, and turned to learn about Partial Views, which are reusable View pieces.
  • I realized that having my model entities derive from ActiveRecordBase is damn ugly, because it makes my entire application dependent on AR even if I was using repositories to access the data. I switched the repositories to using ActiveRecordMediator instead.
  • The magic that is ReturnUrl – an extra request parameter that controllers use to return you to your original page after you login.
  • An interesting usage for anonymous object creation syntax in C#, to pass query parameters: new { ReturnUrl = "foo" }
  • I decided to create a base class for all my controllers – UserAwareController. This was needed because every controller needs to access the current user, so I put all this logic in the UserAwareController base class.
  • Since every View needs the User to render, I created a Model base class that contains the current user. I’m not sure if this is the best way to go here, but it worked (what are your recommended best practices to store the user data?)
  • I implemented OpenID login using DotNetOpenAuth, and it was quite a breeze. No need to store user credentials, just store the user’s public OpenID and let other websites handle the authentication for you.

Day 5

After allowing users to login and post questions, the next thing I wanted to implement was voting. So far all the code was server-side, but voting requires javascript because when a user votes you don’t want to refresh the page, but rather just change the vote icon. So:

  • I learned about jQuery basics and wrote events to handle clicking the voting buttons.
  • Sent the vote information to a dedicated controller using JSON. The way JSON requests translate to controller methods is really seamless.
  • Initially I had an ‘AddVote’ method, but quickly switched to ‘UpdateVote’, which makes more sense.
  • Some CSS tweaks to make the cursor change to a pointer while on the voting buttons

Day 6

  • I finally had to cache the vote count on questions & answers. The total vote count / score of a question has to be indexed, because we’ll have pages that get the ‘hottest posts’, and so keeping the User-Question vote relation is not enough.
  • So far, all my entities were strictly mapped to database rows. Now, I had to create a new ‘rich entity’ that contained a post and the current user’s vote on this post.
  • Finding myself duplicating logic between questions & answers, I created a Post base class and refactored the entities and repositories to work on abstract posts.
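The cached score and the ‘UpdateVote’ semantics described above can be sketched in a few lines. This is only an illustration, written in Java rather than the project's C#; the class VotedPost and its methods are hypothetical names and are not taken from the project's code base. The idea is that replacing a user's previous vote adjusts the cached total by the difference, so the score never has to be recounted.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a post that stores each user's vote and keeps a cached score.
class VotedPost {
    // userId -> vote (-1 or +1); users without an entry have not voted
    private final Map<Integer, Integer> votesByUser = new HashMap<>();
    // denormalized total: the value one would store in an indexed column
    private int cachedScore = 0;

    // "UpdateVote" semantics: replace the user's previous vote (if any)
    // and adjust the cached score by the difference.
    void updateVote(int userId, int newVote) {
        if (newVote < -1 || newVote > 1)
            throw new IllegalArgumentException("vote must be -1, 0 or +1");
        int oldVote = votesByUser.getOrDefault(userId, 0);
        if (newVote == 0)
            votesByUser.remove(userId);
        else
            votesByUser.put(userId, newVote);
        cachedScore += newVote - oldVote;
    }

    int score() {
        return cachedScore;
    }

    // The current user's vote on this post, as needed by the 'rich entity'.
    int voteOf(int userId) {
        return votesByUser.getOrDefault(userId, 0);
    }
}
```

In a real database-backed version the two updates (the vote row and the cached score) would presumably have to happen in one transaction.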

This is it for now. I hope I didn’t make too many glaring mistakes in the process. As I mentioned, the code is available at GitHub – if you’re interested in helping develop it or have any questions, please let me know.

ALT.NET Israel Tools #1

Come hear about .Net tools in a “no bullshit” evening (more details here).

Playing around with PLINQ and IO-bound tasks


I recently downloaded Visual Studio 2010 beta, and took the chance to play with PLINQ. PLINQ, for those of you in the dark ages of .Net Framework 2.0, is parallel LINQ – an extension to the famous query language that makes it easy to write parallel code (essential to programming in the 21st century, in the age of the many-core).

A code sample, as usual, is the best demonstration:

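public static int CountPrimes(IEnumerable<int> input)
{
    // AsParallel() lets PLINQ partition the input across the available cores
    return input.AsParallel().Where(IsPrime).Count();
}
 
private static bool IsPrime(int n)
{
    // 0, 1 and negative numbers are not prime
    if (n < 2)
        return false;
    // trial division up to and including sqrt(n)
    for (int i = 2; i*i <= n; ++i)
        if (n % i == 0)
            return false;
    return true;
}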

This code sample, regardless of using an inefficient primality test, is fully parallel. PLINQ will utilize all your cores when running the above code, and I didn’t have to use any locks, queues, threadpools or any of the more complex tools of the trade. Just tell PLINQ “AsParallel()”, and it works.

I hit a gotcha when I tried to compare the parallel performance with the sequential one. Do you spot the problem in the following code?

public static void CountPrimesTest(IEnumerable<int> input)
{
    // parallel benchmark 
    var timer = new Stopwatch();
    timer.Start();
    CountPrimes(input.AsParallel());
    timer.Stop();
    Console.WriteLine("Counted primes in parallel took " + timer.Elapsed);
 
    // sequential benchmark
    timer = new Stopwatch();
    timer.Start();
    CountPrimes(input);
    timer.Stop();
    Console.WriteLine("Counted primes sequentially took " + timer.Elapsed);
}

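The problem: CountPrimes() already calls AsParallel() on its input, so both benchmarks run the parallel version and the comparison is meaningless. PLINQ is all fine and dandy when the task at hand is CPU bound, but works pretty miserably when your task is IO bound, like downloading a bunch of web pages. Next, I simulated some IO-bound tasks (I used Sleep() to emulate IO – basically not using a lot of CPU for every task):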

[ThreadStatic]
private static Random _random;
 
public static List<string> FindInterestingDomains(IEnumerable<string> urls)
{
    // select all the domains of the interesting URLs
    return urls.AsParallel().Where(SexFilter).
                Select(url => new Uri(url).Host).ToList();
}
 
public static bool SexFilter(string url)
{
    if (_random == null)
        _random = new Random();
 
    // simulate a download
    Thread.Sleep(1000);
    var html = "<html>" + _random.Next() + "</html>";
    return html.Contains("69");
}

Testing this with a list of 10 URLs took 5 seconds, meaning PLINQ again spun up only two threads, one for each core on my machine. This really sucks for IO bound tasks, because most of the time the threads are idle, waiting on IO. Let’s see if we can speed this up:

// Use WithDegreeOfParallelism to specify the number of threads to run
return urls.AsParallel().WithDegreeOfParallelism(10).Where(SexFilter).
              Select(url => new Uri(url).Host).ToList();

This appeared not to work at first, because WithDegreeOfParallelism is just a recommendation or upper bound. You can ask PLINQ nicely to run with ten threads, but it will only allocate two if it so chooses. This is yet another example of C# being more magical than Java – compared to Java’s rich ExecutorService, PLINQ offers less fine-grained control.

However, further testing revealed the damage is not so horrible. This is what happened when I put the above code in a while(true):

Tested 10 URLs in 00:00:05.0576333
Tested 10 URLs in 00:00:03.0018617
Tested 10 URLs in 00:00:03.0013939
Tested 10 URLs in 00:00:03.0013175
Tested 10 URLs in 00:00:04.0018983
Tested 10 URLs in 00:00:03.0024044
Tested 10 URLs in 00:00:01.0004407
Tested 10 URLs in 00:00:01.0007645
Tested 10 URLs in 00:00:01.0007280
Tested 10 URLs in 00:00:01.0003358
Tested 10 URLs in 00:00:01.0003347
Tested 10 URLs in 00:00:01.0002470

After some trial and error, PLINQ found that the optimal number of threads needed to run this task under its concurrency guidelines is ten. I imagine that if at some point in the future the optimal number of threads changes, it will adapt.
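For comparison, here is a rough sketch of the explicit control that Java's ExecutorService gives over the thread count, as mentioned above. It is not a port of the PLINQ code: the class FixedPoolDemo, the method names and the 200 ms sleep are made up for the illustration. A fixed pool of N threads really does run N of the sleeping tasks at once.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class FixedPoolDemo {
    // Stand-in for an IO-bound call: sleep instead of downloading anything.
    static boolean slowFilter(String url) {
        try {
            Thread.sleep(200);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return url.contains("example");
    }

    // Runs slowFilter over all urls on a pool of exactly `threads` worker threads
    // and returns the urls that passed, in their original order.
    static List<String> filter(List<String> urls, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (String url : urls)
                results.add(pool.submit(() -> slowFilter(url)));

            List<String> kept = new ArrayList<>();
            for (int i = 0; i < urls.size(); i++)
                if (results.get(i).get()) // blocks until task i has finished
                    kept.add(urls.get(i));
            return kept;
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

With ten URLs and ten threads all the sleeps overlap, so the whole call should take roughly one sleep interval rather than ten. The price is that the thread count is your decision, with no adaptation at run time.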

P.S.
If you found this interesting, wait till you read about DryadLINQ – it’s LINQ taken to the extreme, run over a cluster of computers.

Java is less magical than C#

I have been programming in C# for several years now, and recently made the switch to Java (at least for now). I noticed that Java, as a language, is “less magical” than C#.

What I mean by that is that in C# things are usually done for you, behind the scenes, magically, while Java is much more explicit in the toolset it provides. For example, take thread-local storage. The concept is identical in both languages – there is often a need for a copy of a member variable that’s unique to the current thread, so it can be used without any locks or fear of concurrency problems.

The implementation in C# is based on attributes. You basically take a static field, annotate it with [ThreadStatic], and that’s it:

[ThreadStatic]
private static ThreadUnsafeClass foo = null;
 
private ThreadUnsafeClass Foo
{
  get
  {
    // lazily create this thread's copy on first access
    if (foo == null)
      foo = new ThreadUnsafeClass(...);
 
    // no other thread will have access to this copy of foo
    // note - foo is still static, so it will be shared between instances of this class.
    return foo;
  }
}

How does it work? Magic. Sure, one can find the implementation if he digs deep enough, but the first time I encountered it I just had to try it to make sure it actually works, because it seemed too mysterious.

Let’s take a look at Java’s equivalent, ThreadLocal. This is how it works (amusingly enough, from a documentation bug report):

public class SerialNum {
     // The next serial number to be assigned
     private static int nextSerialNum = 0;
 
     private static ThreadLocal<Integer> serialNum = new ThreadLocal<Integer>() {
         protected synchronized Integer initialValue() {
             return new Integer(nextSerialNum++);
         }
     };
 
     public static int get() {
         return serialNum.get();
     }
 }

No magic is involved here – get() gets the value from a map, stored on the calling Thread object (source code here, but the real beauty is that it’s available from inside your IDE without any special effort to install it).
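The per-thread behaviour can be observed with a small experiment. The sketch below uses hypothetical names (PerThreadCounter, countOnNewThread) and ThreadLocal.withInitial, which was added in Java 8, later than the API shown above. It increments a thread-local counter on the calling thread and on a fresh thread, and each thread sees only its own count.

```java
class PerThreadCounter {
    // Each thread lazily gets its own int[1] holder, so no locking is needed to update it.
    private static final ThreadLocal<int[]> counter =
            ThreadLocal.withInitial(() -> new int[1]);

    // Increments and returns the calling thread's private counter.
    static int increment() {
        return ++counter.get()[0];
    }

    // Calls increment() `times` times on a brand new thread
    // and returns the last value that thread observed.
    static int countOnNewThread(int times) {
        int[] last = new int[1];
        Thread t = new Thread(() -> {
            for (int i = 0; i < times; i++)
                last[0] = increment();
        });
        t.start();
        try {
            t.join(); // join() makes the write to last[0] visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new RuntimeException(e);
        }
        return last[0];
    }
}
```

A new thread always starts counting from 1, regardless of how many times other threads have called increment().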

Let’s look at another example – closures.

In C#, you can write this useful piece of code:

var list = new List<int>();
...
// find an element larger than 10
list.Find(x => x > 10);

You can also make this mistake:

var printers = new List<Action>();
...
foreach (var item in list)
{
  printers.Add(() => Console.WriteLine(item));
}
Parallel.ForEach(printers, p => p());

An innocent reader might think this prints all the items in list, but it actually prints the last item list.Count times. This is how closures work: the item referred to in the closure is not a new copy of item, it’s the same variable that’s being modified by the loop. (This holds for the C# compilers up to C# 4; starting with C# 5, foreach declares a fresh loop variable on every iteration.) A workaround is to add a new temporary variable like this:

foreach (var item in list)
{
  int tempItem = item;
  printers.Add(() => Console.WriteLine(tempItem));
}

And in Java? Instead of closures, one uses anonymous classes. In fact, this is similar to how closures are implemented under the hood in C#, where the compiler generates a class to hold the captured variables. Here is the same example, in Java:

for (Integer item : list)
{
  final int tempItem = item;
  printers.add(new Action() {
    public void doAction()
    {
      // can't reference item here because it's not final.
      // this would have been a compilation error:
      // System.out.println(item);
      System.out.println(tempItem);
    }
  });
}
...

Notice it’s impossible to make the mistake and capture the loop variable instead of a copy of it, because Java requires it to be final. So … less powerful perhaps than C#, but more predictable. As a side note, ReSharper catches the ill-advised capturing of local variables and warns about it.
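For readers on Java 8 or later, the same guarantee carries over to lambdas: a lambda may only capture variables that are final or effectively final. The sketch below (CaptureDemo and runAll are hypothetical names) relies on the fact that the variable of an enhanced for loop is a fresh variable on each iteration, so it can be captured directly and every Runnable sees its own item.

```java
import java.util.ArrayList;
import java.util.List;

class CaptureDemo {
    // Builds one Runnable per item; when run, each appends the item it captured to `out`.
    static List<Integer> runAll(List<Integer> items) {
        List<Integer> out = new ArrayList<>();
        List<Runnable> printers = new ArrayList<>();
        for (Integer item : items) {
            // `item` is never reassigned inside the loop body, so it is effectively final
            // and the lambda is allowed to capture it without a temporary copy.
            printers.add(() -> out.add(item));
        }
        for (Runnable p : printers)
            p.run();
        return out;
    }
}
```

The index variable of a classic for (int i = 0; ...; i++) loop is modified by the loop, so capturing it in a lambda is still rejected by the compiler and a copy is still needed.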

I myself rather prefer the magic of C#, because it does save a lot of the trouble. Lambdas, properties, auto-typing variables… all these are so convenient it’s addictive. But I have to give Java a bit of credit, as the explicit way of doing stuff sometimes teaches you things that you just wouldn’t have learned cruising away in C# land.

Israeli Developers Community Conference 2009

Check out the idcc, register (free), vote on the topics, and attend.

Q.E.D.

P.S.

Actually, registration costs 100 NIS.

Alt.net 2nd conference

I just attended my first alt.net conference (some would call it unconference). The story is about a group of 40 people that came to talk about … whatever they decided to talk about. The conference is self-organizing, with no predetermined lectures or lecturers, and with one healthy rule – if you don’t feel you are learning or contributing at the discussion you are currently having, you have to get up and find another discussion.

Here are some of the talks I attended (here is a semi-readable list of all the talks):

Aspect Oriented Programming

Usages other than logging, AOP frameworks.

Links: Cthru, Post#, Wicca.

Mocking/Stubbing

We reiterated the basic paradigm, with an emphasis on TypeMock. They are considering adding a UI tool to Visual Studio to help create mocks – meant for people just starting with mocking. The intended usage is:

  1. Write a test, without any mocking
  2. The test will usually fail because some deep class is not configured correctly.
  3. You will see the chain of calls that caused the exception, and be able to automatically generate a mock for any class in the chain.
  4. Rinse & repeat until your test passes

High Scale & Distributed Caches

The discussion focused around what I’d call medium scale – 2-10 nodes that used shared caches like memcached & Azure.

Multithreading

There was a comparison of Microsoft CCR and Parallel Extensions. It seems people still think of parallelization as simply utilizing all your cores, when it’s much more than that. Some applications benefit from multithreading even on a single core machine (think web crawler).

One interesting link – PowerThreading library (see this video for a demonstration of Asynchronous Programming Model using PowerThreading).

On the evils of yield

I absolutely love yield. Don’t we all? It simplifies writing enumerations to the point of absurdity. Simplicity, however, is a double-edged sword – I spent the better part of this day debugging a most evil bug, that resulted from over-yielding.

At Delver (now Sears) we have a file-based repository containing millions of items. We try to make things as efficient as possible, and sometimes we overdo it. Our sin for today is using IEnumerable a bit too much. This repository was designed to be:

  1. Scalable (within our constraints) – should be able to hold several million tasks
  2. Fast
  3. Relatively convenient to use – the user should be able to iterate on it using foreach, for one.

To accomplish 1 and 2, we avoided allocating large in-memory structures because they wouldn’t be able to hold the amount of items we’re talking about. To provide a convenient interface, we used IEnumerable.

Here is a mock-up of the code (for simplicity, it doesn’t use the disk but an in-memory serialized dictionary):

public class PeopleRepository
{
    // assuming int IDs and a string payload for the serialized form
    private readonly Dictionary<int, string> _serializedPeople = new Dictionary<int, string>();
 
    public void Save(Person person)
    {
        // this method, as innocent as it looks, makes it more difficult to discover the bug. See ahead.
        Save(new[]{person});
    }
 
    public void Save(IEnumerable<Person> people)
    {
        var serializedPeople = from p in people select new {p.ID, p.Serialized};
        foreach (var p in serializedPeople)
            _serializedPeople[p.ID] = p.Serialized;
    }
 
    public IEnumerable<Person> Read(Predicate<Person> predicate)
    {
        foreach (int id in _serializedPeople.Keys)
        {
            var person = new Person(_serializedPeople[id]);
            if (predicate(person))
                yield return person;
        }
    }
}

The bug I tracked was that updates to the repository were not taking place, but instead were simply ignored. The first thing I tried was writing a simple test:

// Setup
var repository = new PeopleRepository();
repository.Save(new Person(1, " John")); // oops, I put an extra space here
 
// Find & fix John
var john = repository.Read(p => p.ID == 1).First();
john.Name = john.Name.Trim();
 
// Save poor John back to the repository
repository.Save(john);
 
// Make sure john is saved properly
john = repository.Read(p => p.ID == 1).First();
if (john.Name != "John")
    throw new Exception();

Sadly, this test passed with flying colors. More debugging revealed that the problem happened because both our Read and Save methods work with IEnumerables. The calling code read the items with Read() and made the required updates … but … when writing the items back to the repository, Save() iterated on them again.

Let me repeat – we read some items, iterated on them and modified some, saved and thus reiterated. Bam!


The second iteration didn’t iterate on the modified items – because the internal implementation of Read used a yield statement, there was no actual collection returned. So the second iteration just caused the repository to re-read all the items from disk, and ignore the modified items.

Conclusion: whenever you see methods that return IEnumerables, be suspicious. Odds are it should return a List or Collection. And whatever you do, watch out for feeding that IEnumerable back to the same repository.

Here is a final test that almost reproduces the problem. It crashes with a “collection was modified” exception, while my actual test & code just silently failed (because the repository I mocked up here doesn’t save to the disk, but rather keeps everything in-memory).

// Setup
var repository = new PeopleRepository();
repository.Save(new Person(1, " John")); // oops, I saved a space in front of John
 
// Let's read and fix all people starting with a space
var people = repository.Read(p => p.Name.StartsWith(" "));
foreach (var person in people)
    person.Name = person.Name.Trim();
 
// store the modified people back
repository.Save(people);
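The silent failure described above can be reproduced without a repository at all. The following is a minimal sketch in Java (LazyReadDemo and all its members are hypothetical and only mimic the behaviour of a yield-based Read()). A lazy Iterable re-creates its objects on every pass, so modifications made during the first pass are gone by the second. Copying the sequence into a list first makes both passes see the same objects.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class LazyReadDemo {
    static class Person {
        String name;
        Person(String name) { this.name = name; }
    }

    // Mimics a yield-based Read(): every call to iterator() starts over
    // and builds new Person objects from the stored data.
    static Iterable<Person> lazyRead(List<String> storedNames) {
        return () -> {
            Iterator<String> names = storedNames.iterator();
            return new Iterator<Person>() {
                public boolean hasNext() { return names.hasNext(); }
                public Person next() { return new Person(names.next()); }
            };
        };
    }

    // First pass trims every name; second pass collects the names it sees.
    static List<String> trimThenCollect(Iterable<Person> people) {
        for (Person p : people)
            p.name = p.name.trim();
        List<String> result = new ArrayList<>();
        for (Person p : people) // a lazy source hands out fresh objects here
            result.add(p.name);
        return result;
    }

    // Copies the lazy sequence into a list so later passes reuse the same objects.
    static List<Person> materialize(Iterable<Person> people) {
        List<Person> list = new ArrayList<>();
        for (Person p : people)
            list.add(p);
        return list;
    }
}
```

Calling trimThenCollect(lazyRead(names)) returns the untrimmed names, while trimThenCollect(materialize(lazyRead(names))) returns the trimmed ones. In C# the equivalent fix would be calling ToList() on the result of Read() before modifying and saving it.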