This blog has moved to Medium


Archive for August 2007

Reminder – Email Subscription

Apparently some people didn’t see this, but you can sign up to receive email updates from this blog by clicking here (or the button on the right that says email).

Geni

Geni is a free genealogy website. It has an extremely easy-to-use interface, and lets you build and maintain a family tree in minutes.

Your tree is shared among your family, who can edit and expand it themselves. A classic Web 2.0 website – too bad I didn’t think of that 🙂

Soul Geek

As I assume a good many of my readers are geeks, here is a dating website dedicated to geeks.

A New Form of Spam

In the constant evolutionary battle between spam and spam filters, we keep seeing new tricks from spammers. Today, I got this message (it made it past my Gmail spam filter).

Update
I tried just copying the text from the email, but its malformed HTML messed up not only this post but also other posts on my blog, so I’m attaching an image instead.

Starcraft Board Game

Cool Computer Science Stuff!

First, the simpler one. A really cool algorithm that finds holes of any shape in a given image, and “patches” them from an existing bank of other images. This might be similar technology to Photosynth.

Second, a historical moment for Computer Science. It appears the first NP-Complete problem has been solved in polynomial time. They use some sort of “optical solution” and not a Turing machine, and the number of photons used is proportional to N^N. I don’t know if this will have deep theoretical implications (haven’t read the article yet), but it’s interesting (to C.S. geeks).

I Don’t Like Google Today

If you remember, when Gmail first appeared, Google promised us “we’ll never have to delete another email”. It’s now all about searching your email, and Google will take care of the storage problem for you.

This sounds reasonable, considering file attachments take up most of the space, and Google can easily detect identical files through a (really big) hash table and store only a single copy of every file.
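To make the hash table idea concrete, here is a minimal C# sketch of storing each distinct attachment only once, keyed by a hash of its content. This is an illustration only, not a description of how Gmail actually stores attachments: the AttachmentStore class and its members are made-up names, and a real system would also need to persist the data and guard against hash collisions.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Hypothetical content-addressed store: identical attachments share one copy.
public class AttachmentStore
{
    // Maps a content hash to the single stored copy of those bytes.
    private readonly Dictionary<string, byte[]> blobs = new Dictionary<string, byte[]>();

    // Stores the attachment only if its content has not been seen before,
    // and returns the key a message would keep instead of the bytes.
    public string Put(byte[] content)
    {
        string key;
        using (SHA256 sha = SHA256.Create())
        {
            key = Convert.ToBase64String(sha.ComputeHash(content));
        }
        if (!blobs.ContainsKey(key))
        {
            blobs[key] = content;
        }
        return key;
    }

    // Looks up the stored bytes for a key returned by Put.
    public byte[] Get(string key)
    {
        return blobs[key];
    }

    // Number of distinct attachments actually held in storage.
    public int StoredCopies
    {
        get { return blobs.Count; }
    }
}
```

If the same file is attached to a thousand emails, every call to Put returns the same key and StoredCopies stays at one.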

However, I’m currently at 95% of my allotted storage. It’s not my fault: I’m getting big emails and not deleting them, just as Google asked. I even tried to search for all my large emails so I could delete them in one fell swoop, but that’s impossible with the current Gmail interface.

Today, I log in and find this message. Gmail is kindly offering to sell me additional storage. Thank you very much! What’s the matter, AdSense not bringing in enough revenue anymore?

This leaves a really bad taste in my mouth, especially as Yahoo! has been offering unlimited storage for free for the last couple of months.

A bad one for you, Gmail.

Dotnet Web Crawler Speedup

I’m writing a web crawler in C#, and getting it to perform well was really annoying.
I tried simply using ThreadPool.QueueUserWorkItem() to queue up my requests to multiple threads. Each thread just ran WebClient.DownloadString().
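For reference, the setup described above looks roughly like the sketch below. It is a reconstruction under assumptions, not the original crawler code: the NaiveCrawler class and DownloadAll method are made-up names, and any failed download is simply recorded as null.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Threading;

// Hypothetical reconstruction of the naive approach: one thread-pool
// work item per URL, each doing a blocking WebClient.DownloadString().
public static class NaiveCrawler
{
    // Returns a map from URL to page text (null if the download failed).
    public static Dictionary<string, string> DownloadAll(IList<string> urls)
    {
        Dictionary<string, string> pages = new Dictionary<string, string>();
        if (urls.Count == 0)
        {
            return pages;
        }

        int pending = urls.Count;
        using (ManualResetEvent allDone = new ManualResetEvent(false))
        {
            foreach (string url in urls)
            {
                // Copy the loop variable so each work item captures its
                // own URL (needed in C# versions before 5).
                string target = url;
                ThreadPool.QueueUserWorkItem(delegate
                {
                    string html = null;
                    try
                    {
                        using (WebClient client = new WebClient())
                        {
                            html = client.DownloadString(target);
                        }
                    }
                    catch (Exception)
                    {
                        // Sketch only: treat any failure as a missing page.
                    }
                    lock (pages)
                    {
                        pages[target] = html;
                    }
                    // The last work item to finish wakes the caller.
                    if (Interlocked.Decrement(ref pending) == 0)
                    {
                        allDone.Set();
                    }
                });
            }
            allDone.WaitOne();
        }
        return pages;
    }
}
```

Each work item gets its own WebClient because a single WebClient instance does not support concurrent calls.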

While the threads did run in parallel, it turned out WebClient had an inherent lock.
I tried messing with the ConnectionManagementSection, but that turned out read-only.
After some Googling, I found that the configuration can only be changed by modifying the machine.config or user.config files! Seems pretty stupid to me.
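One thing that may be worth trying here: on the .NET Framework, client applications are limited by default to two simultaneous connections per host, and as far as I know that limit can also be raised from code through ServicePointManager.DefaultConnectionLimit, without touching any config file. Whether this limit is what was actually throttling the crawler is an assumption. The sketch below uses a made-up CrawlerSetup helper, and the limit passed in is up to the caller.

```csharp
using System.Net;

// Hypothetical helper: call once at startup, before any request is made.
public static class CrawlerSetup
{
    // Raises the per-host connection cap that applies to connections
    // created after this call. The right value depends on the crawler
    // and on how polite it needs to be to the target servers.
    public static void AllowMoreConnections(int perHostLimit)
    {
        ServicePointManager.DefaultConnectionLimit = perHostLimit;
    }
}
```

For example, calling CrawlerSetup.AllowMoreConnections(20) at the top of Main would let up to twenty requests to the same host run at once.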

When that didn’t work either, I found this code that helped me through. I still don’t know exactly why WebClient.DownloadString() doesn’t work, but after some tweaking I got to about 2.5 pages per second. Still not top speed, but way better than the 0.5 pages per second I started with.