This blog has moved to Medium

Subscribe via email


Posts tagged ‘Programming’

Alt.net 2nd conference

I just attended my first alt.net conference (some would call it unconference). The story is about a group of 40 people that came to talk about … whatever they decided to talk about. The conference is self-organizing, with no predetermined lectures or lecturers, and with one healthy rule – if you don’t feel you are learning or contributing at the discussion you are currently having, you have to get up and find another discussion.

Here are some of the talks I attended (here is a semi-readable list of all the talks):

Aspect Oriented Programming

Usages other than logging, AOP frameworks.

Links: Cthru, Post#, Wicca.

Mocking/Stubbing

Reiterate the basic paradigm, emphasize on TypeMock. They are considering a UI tool adding to Visual Studio to help create mocks – meant for people just starting with mocking. The intended usage is:

  1. Write a test, without any mocking
  2. The test will usually fail because some deep class is not configured correctly.
  3. You will see the chain of calls that caused the exception, and be able to automatically generate a mock for any class in the chain.
  4. Rinse & repeat until your test passes

High Scale & Distributed Caches

The discussion focused around what I’d call medium scale – 2-10 nodes that used shared caches like memcached & Azure.

Multithreading

There was a comparison of Microsoft CCR and Parallel Extensions. It seems people still think of parallelization as simply utilizing all your cores, when it’s much more than that. Some applications benefit from multithreading even on a single core machine (think web crawler).

One interesting link – PowerThreading library (see this video for a demonstration of Asynchronous Programming Model using PowerThreading).

A survey of open APIs

I did some research in the past week on a few “Open APIs”, and wanted to share my findings here. This is just a summary of other, more comprehensive sources. Also, if you have any comments or corrections I’d love to hear them. I chose to present my findings as a list of concepts:

The Open Stack

This is an emerging stack of open protocols that will help build socially-connected websites. I will explore the key elements (I take XRDS-Simple to be rather low level and uninteresting).

OpenID

  • A single sign-on protocol (help to user not to create yet another set of user/pass)
  • Essential Workflow (here in more details):
    1. You want to logon to StackOverflow, which is an OpenID Consumer
    2. Instead of opening yet another account you are given an alternative (almost no site relies solely on OpenID).
    3. You either enter a URL (way less user friendly) or select from a fixed subset of Providers
    4. You are redirected to that URL, enter your credentials there (only if you are not logged in), and are asked to allow StackOverflow access to your OpenID identity.
    5. Depending on your OpenID provider, you can set for how long this access is granted
    6. Then, you are redirected back to StackOverflow, with a token (encoded in the URL), that is used to grant you access.
  • OpenID is mostly still just a login method today (doesn’t convey extra information beyond a logon grant) – although I did see some evidence to the contrary when I just opened an OpenID account at VeriSign – it seems websites can request more information from an OpenID provider – such as email, nickname and full name.
  • Microsoft, Yahoo, Google are now OpenID providers (in addition to more veteran providers). This is significant because it doesn’t force users to go to yet another place to open an OpenID account – they can just use an existing webmail account.
  • Facebook just joined OpenID foundation board (eBay is there already). Looks like it will become an OpenID provider, maybe also client.
  • Users are still not comfortable with it / unable to figure it out.
  • However, here is an interesting post about using email addresses as OpenID. When this happens, it might help bring in the users.
  • Here is a recipe to enable OpenID on a website (Consumer). Sample Provider implementation for .NET is here.
  • Of course, for your WordPress blog the process is much easier – just install OpenID plugin.
    • Although, I heeded Tomer’s advice and instead used OpenID delegation – this means I use my blog’s URL as my OpenID, but it is just a redirect to a more serious OpenID provider. OpenID is/will be the keys to your world – better guard them safely.
  • Right now, the value of OpenID to a new website is still limited:
    1. Will not eliminate the need for us to implement user logon
    2. Not much value in being an OpenID Provider – it’s a nice to have feature, but in many cases not worth the cost (at least until you get a large user base).
    3. Is not sufficient, on its own, to get access to complete profile information about users (and use this data to help users interact with your website). But … can be complemented with more technologies.
  • Has a nice usage graph.

Google OpenSocial

  • A set of social APIs developed by Google (get list of friends, publish stories).
  • Implemneted y hi5, LinkedIn, MySpace, orkut, Yahoo!, Friendster (full list at the OpenSocial wiki)… but not Facebook (yet). All of these are OpenSocial Containers – data warehouses that answer OpenSocial queries.
  • OpenSocial is Open. The data is not stored on Google – the providers just conform to the OpenSocial API which reflects the data stored at each social network.
  • Had theoretical reach (not usage!) of 700M users Nov 08.
  • To serve OpenSocial:
  • Client implementations are available in javascript, java, .net, php, whatnot…

Google FriendConnect

  • An application that uses OpenSocial to enhance websites with social widgets (Comments, Reviews, …).
  • Is still not wildly spread (this directory is rather sparse at the moment)
  • Doesn’t seem to be targeted at big sites, rather at small sites/blogs (my impression at least).
  • No programming required to add social features to your site – but you have limited control over the effect.
  • Cool flow I tested: Login to FriendConnect on a website, and you already see in your friends list a buddy from Gmail that’s using FriendConnect at this site.

OAuth

Portable Contacts

Facebook Connect

  • Competitor of the open stack (OpenID+OpenSocial) – gives single sign-on + connects you to Facebook friends & feed
  • Uses a popup instead of redirecting to FB (less intimidating for users).
  • Has already been witnessed to boost new user signup.
  • Main Flow:
    1. User clicks Connect 
    2. Popup (in facebook.com) asks user to confirm
    3. User is shown a Presence Indicator at the target site
    4. Website can pull user’s profile & connections.
    5. Publish stories to Facebook.
    6. Send Facebook hashes of emails, and Facebook replies if they have a user with an identical hash. This can be used to show a user a count of his Facebook buddies that are using the target site (“10 of your friends are using this site, connect to Facebook now”).
  • Example of a FB connect-enabled site – TheRunAround.

RPX / JanRain

  • JanRain – An early OpenID player (in the market since 2005, one of the founders of OpenID).
  • RPXNow – abstracts away Single Sign-on, supports both Facebook Connect and major OpenID players.
  • Here is a blog post about why not to use it (Vendoer lock-in, single point of failure, too little benefit).
    • However, check out the counter arguments in the comments.
  • RPX get social profile data from Google, MySpace, Facebook.
    • This includes interesting profile fields like email/gender/birthday/photo.
  • The API also hints at getting an array of friends from relevant services.

That’s it for now, I hope you enjoyed this review. Remember that most of these APIs are very new or just being adopted, so expect changes to most of these. I expect a large API convergence to happen in the following year or two, which will simplify life for those of us building social applications.

Fewer projects makes a lean solution

Bogen just optimized our visual studio solution by cutting down the number of projects in it from about 60 to 20 something. Compile time is blazing fast now, thanks dude (though this means less time for me to check Google Reader 🙁 )

On the evils of yield

I absolutely love yield. Don’t we all? It simplifies writing enumerations to the point of absurdity. Simplicity, however, is a double-edged sword – I spent the better part of this day debugging a most evil bug, that resulted from over-yielding.

At Delver (now Sears) we have a file-based repository containing millions of items. We try to make things as efficient as possible, and sometime we overdo it. Our sin for today is using IEnumerable a bit too much. This repository was designed to be:

  1. Scalable (within our constraints) – should able to hold several million tasks
  2. Fast
  3. Relatively convenient to use – the user should be able to iterate on it using foreach, for one.

To accomplish 1 and 2, we avoided allocating large in-memory structures because they wouldn’t be able to hold the amount of items we’re talking about. To provide a convenient interface, we used IEnumerable.

Here is a mock-up of the code (for simplicity, it doesn’t use the disk but an in-memory serialized dictionary):

public class PeopleRepository
{
    private readonly Dictionary _serializedPeople = new Dictionary();
 
    public void Save(Person person)
    {
        // this method, as innocent as it looks, make it more difficult to discover the bug. See ahead.
        Save(new[]{person});
    }
 
    public void Save(IEnumerable<Person> people)
    {
        var serializedPeople = from p in people select new {p.ID, p.Serialized};
        foreach (var p in serializedPeople)
            _serializedPeople[p.ID] = p.Serialized;
    }
 
    public IEnumerable<Person> Read(Predicate<Person> predicate)
    {
        foreach (int id in _serializedPeople.Keys)
        {
            var person = new Person(_serializedPeople[id]);
            if (predicate(person))
                yield return person;
        }
    }
}

The bug I tracked was that updates to the repository were not taking place, but instead were simply ignored. The first thing I tried, was writing a simple test:

// Setup
var repository = new PeopleRepository();
repository.Save(new Person(1, " John")); // oops, I put an extra space here
 
// Find &amp; Fix John
var john = repository.Read(p => p.ID == 1).First();
john.Name = john.Name.Trim();
 
// Fix poor John back to the repository
repository.Save(john);
 
// Make sure john is saved properly
john = repository.Read(p => p.ID == 1).First();
if (john.Name != "John")
    throw new Exception();

Sadly, this test passed with flying colors. More debugging revealed the problem happened because both our Read and Save methods returned IEnumerables. It appears that Read() read the items and made the required updates … but … when writing the items back to the repository, it iterated on them.

Let me repeat – we read some items, iterated on them and modified some, saved and thus reiterated. Bam!

Bam

The second iteration didn’t iterate on the modified items – because the internal implementation of Read used a yield statement, there was no actual collection returned. So the second iteration just caused the repository to re-read all the items from disk, and ignore the modified items.

Conclusion: whenever you see methods that return IEnumerables, be suspicious. Odds are it should return a List or Collection. And whatever you do, watch out from feeding that IEnumerable back to the same repository.

Here is a final test that almost reproduces the problem. It crashes with a CollectionWasModified exception, while my actual test & code just silently failed (because the repository I mocked up here doesn’t save to the disk, but rather keep everything in-memory).

// Setup
var repository = new PeopleRepository();
repository.Save(new Person(1, " John")); // oops, I saved a space in front of John
 
// Let's read and fix all people starting with a space
var people = repository.Read(p => p.Name.StartsWith(" "));
foreach (var person in people)
    person.Name = person.Name.Trim();
 
// store the modified points back
repository.Save(people);

Efficient multiple totient calculation in C#

For a certain problem in Project Euler, I needed a function to calculate the Totient of all numbers in [2..10^7]. Going over them one at a time and factorizing them would take too long, so I used a variant of the prime sieve to speed up the calculation. The code is not overly documented, but it should be relatively easy to grok once you understand how to calculate the Totient of a single number.

The time complexity is O(n log n).

/// <summary>
/// Return an array of all totients in [0..n-1]
/// </summary>
/// <param name="n"></param>
/// <returns></returns>
public static long[] CalcTotients(long n)
{
    long[] divisors = GetDivisors(n);
    long i;
    var phi = new long[n];
    phi[1] = 1;
    for (i = 1; i < n; ++i)
        CalcTotient(i, phi, divisors);
 
    return phi;
}
 
/// <summary>
/// For every integer, the result will contain its lowest divisor.
/// </summary>
/// <param name="n"></param>
/// <returns></returns>
private static long[] GetDivisors(long n)
{
    var divisors = new long[n];
    divisors[1] = 1;
    long i;
    for (i = 2; i < n; ++i)
    {
        if (divisors[i] != 0)
            continue;
 
        for (long j = i; j < n; j += i)
            divisors[j] = i;
    }
    return divisors;
}
 
private static long CalcTotient(long i, long[] phi, long[] divisors)
{
    if (phi[i] != 0)
        return phi[i];
 
    long div = divisors[i];
    if (div == i)
    {
        phi[i] = i - 1;
        return phi[i];
    }
 
    long lower = 1;
    int exp = 0;
    while ((i > 1) && (i % div == 0))
    {
        i /= div;
        lower *= div;
        exp++;
    }
    if (i == 1)
    {
        phi[lower] = ((long)Math.Pow(div, exp - 1)) * (div-1);
        return phi[lower];
    }
    phi[i*lower] = CalcTotient(i, phi, divisors) *
                         CalcTotient(lower, phi, divisors);
    return phi[i*lower];
}

Euler problem # 206

I just solved my 78th problem from Project Euler. The solution was not complicated (just a little mathematical observation) and then brute force CPU.

I was working on Aya‘s new laptop, and I noticed the C# code took 9 seconds to run in “power saver” mode, and only 4 seconds in “high performance” mode. It turns out that Vista caps the CPU speed at 50% for power saver mode. Sweet.

(This might seem very obvious to some, however I haven’t operated a laptop for several years, and so this is new to me – if Windows XP does such tricks, it is far less explicit than Vista).

Don’t be silly, turn Debug off

The number 2 rule of performance testing is “never test in Debug mode” (rule number 1 is “don’t optimize what you don’t need“).

Well, I was reminded today this is not limited to compiling your application in Release mode. A process I’m developing showed about 10 times worse performance than our initial testing showed. I began to think about optimizations and introducing parallelism, when a I stumbled on the real answer:

log4net logging level was on “Debug”. D’oh!

Most annoying Euler problem yet

In Problem 54, you receive a set of poker hands, and are asked to count the number of times Player 1 won. No algorithmic or mathematical intelligence here, just coding. I advise skipping it in favor of more interesting problems.

Some say the Chinese word for “crisis” is composed of “danger” and opportunity (others do not). I took the time to play with LINQ and lambdas for the first time (in C# at least), and it was a great deal of fun – even at my home computer which is currently suffering from lack of Resharper.

SPOILER SPOILER SPOILER

Here’s a bit of LINQ goodness. The method I used (SPOILER SPOILER) to calculate the winner of two similar hands is a weighted sum. I summed all the best cards with a high weight, then added less important cards with lower weight (for example, 3 twos get more weight each than 2 kings in the same hand).

Once I found which value is the best in a given hand (2 in the above hand), I used this little one liner to count the number of cards equal to the maximal value.

_cards.Where(c => c.Value == bestValue).Count();

Terse is the new black.