The Zibings Starts Here

28 September, 2009

Updates!

I’m feeling particularly chatty tonight, so I figured I’d give some Zibings updates to those of you who still read this blog.

The Zibings Network
We are well into the rewrite now and are hoping to be testing a new version of the site towards the end of October.  Whether or not this happens will depend a lot on me, but we’re putting as much time into the project as we can afford.  It has become a very large part of the direction we’re looking to take Zibings in general, so the lot of us are doing different things to make the re-launch smooth and worthwhile.

N2 Framework
Ah yes, the framework we love to use so much we can barely find the time to update it!  In a way, we're really happy that we've gotten so busy using it that we haven't been able to finish the v0.2 release, but mostly we're just annoyed.  The release is still coming, but I'm learning that setting dates for these isn't going to be easy while we're the only main contributors to the code.  Even so, I keep putting some hours into it each week and hope to release Yverdon v0.2 before the end of the year.

Super Tech Help
The forums are back, and about as active as they ever were.  No real plans have been put forth JUST yet to figure out ways to draw new traffic to the boards, but a meeting is in the works for that exact thing.  I’ll be reviving my PHP tutorial, finishing it eventually, and moving on to another tutorial which I hope won’t take me 3-4 years to complete.

All in all, we’re busy-busy here and loving every minute of it, as usual.  I’ll be sure to let you know when things are getting closer to completion.

- Andy

21 June, 2009

Well That’s Interesting…

For a while now, I've been meaning to start dabbling with regular expressions in C#.  I've held off, though, mostly because I just haven't had the chance to get into anything in C# in depth on top of work and family.  At the end of this past week, Zibings finally got started on a very long-awaited project which will be done in ASP.NET, and in starting it I found myself needing to port my old (almost 11-year-old) email validation algorithm from C++ to C#.  I ported it to PHP years ago, and it works beautifully there, but porting it to C# gave me the chance to tweak the algorithm to use some of the new tools available in .NET.

I went through and first did an almost exact port of the script.  It worked, but I thought I should check its performance.  It seemed slow, so I tried a version using C#'s List type.  This seemed faster, but I thought I could do better.  I asked for a bit of help from some nice people on FreeNode's ##asp.net channel (specifically Kim^J) and was given a pretty blazingly fast regular expression version.

Even so, I found it odd that a routine written without a complex system like regular expressions couldn't outperform one, so I went back and started tweaking my List and Manual versions of the validation routine.  After a lot of work, both are now consistently faster (on average) than a compiled regular expression of comparable accuracy.  Before I go further, here's a sampling of the rather consistent results:

Benchmarks

Also, you can view the entire source code of the .cs file here.

The three 'algorithms' are each enclosed in their own class.  The only difference between the List and Manual classes is that, instead of using a List collection to store and search the acceptable characters, the Manual class simply traverses an array of them.  Otherwise, they should be identical in the logical patterns they use to verify that the email address and domain name are valid.
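To make that split concrete, here is a minimal C# sketch (not the code from the linked .cs file) in which the shared validation logic lives in a base class and only the character lookup differs: one subclass asks a `List<char>` via `Contains()`, the other walks a `char[]` and bails out on the first match. The class names, the allowed-character sets and the rules (exactly one `@`, at least two dot-separated domain labels, no empty labels, no label starting or ending with a hyphen) are assumptions for illustration, not the rules of the original algorithm.

```csharp
using System.Collections.Generic;

// Shared rules live here; subclasses only decide how a character is looked up.
abstract class EmailValidatorBase
{
    // Hypothetical allowed sets (assumption, not the original post's sets).
    protected const string LocalChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789._%+-";
    protected const string LabelChars =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789-";

    protected abstract bool IsLocalChar(char c);
    protected abstract bool IsLabelChar(char c);

    public bool Validate(string email)
    {
        if (string.IsNullOrEmpty(email)) return false;

        // Exactly one '@', with at least one character on each side.
        int at = email.IndexOf('@');
        if (at <= 0 || at != email.LastIndexOf('@') || at == email.Length - 1)
            return false;

        for (int i = 0; i < at; i++)
            if (!IsLocalChar(email[i])) return false;

        // Domain: at least two dot-separated labels, none empty,
        // none starting or ending with a hyphen.
        string[] labels = email.Substring(at + 1).Split('.');
        if (labels.Length < 2) return false;
        foreach (string label in labels)
        {
            if (label.Length == 0 || label[0] == '-' || label[label.Length - 1] == '-')
                return false;
            foreach (char c in label)
                if (!IsLabelChar(c)) return false;
        }
        return true;
    }
}

// "List" flavour: List<char>.Contains() does the lookup.
class ListValidator : EmailValidatorBase
{
    private readonly List<char> localSet = new List<char>(LocalChars);
    private readonly List<char> labelSet = new List<char>(LabelChars);

    protected override bool IsLocalChar(char c) => localSet.Contains(c);
    protected override bool IsLabelChar(char c) => labelSet.Contains(c);
}

// "Manual" flavour: walk a char[] and bail out on the first match.
class ManualValidator : EmailValidatorBase
{
    private readonly char[] localSet = LocalChars.ToCharArray();
    private readonly char[] labelSet = LabelChars.ToCharArray();

    private static bool Find(char[] set, char c)
    {
        for (int i = 0; i < set.Length; i++)
            if (set[i] == c) return true;
        return false;
    }

    protected override bool IsLocalChar(char c) => Find(localSet, c);
    protected override bool IsLabelChar(char c) => Find(labelSet, c);
}
```

Because both subclasses inherit the same `Validate` method, any timing difference between them comes only from the lookup strategy, which is the comparison the post is after.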

The regular expression is based almost entirely on one readily available at this site, so if anyone out there knows of a better or faster pattern that should be tried, I would love to hear about it; I am not a RegExpert in the least.
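For comparison, a compiled regular expression in .NET is simply a `Regex` constructed with `RegexOptions.Compiled`, which trades a slower construction for faster matching. The sketch below is hypothetical: the pattern is not the one from the site mentioned above, and the `RegexValidator` name is an assumption. It encodes a simple rule set (allowed local-part characters, then two or more dot-separated domain labels that start and end with a letter or digit).

```csharp
using System.Text.RegularExpressions;

class RegexValidator
{
    // Hypothetical pattern (assumption): allowed local-part characters, then
    // two or more dot-separated labels that start and end with a letter or digit.
    // \A and \z anchor to the absolute start and end of the input, so a
    // trailing newline is not silently accepted the way it would be with $.
    private static readonly Regex Pattern = new Regex(
        @"\A[A-Za-z0-9._%+-]+@" +
        @"(?:[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\.)+" +
        @"[A-Za-z0-9](?:[A-Za-z0-9-]*[A-Za-z0-9])?\z",
        RegexOptions.Compiled);

    public bool Validate(string email) => email != null && Pattern.IsMatch(email);
}
```

Keeping the `Regex` in a static field means the compilation cost is paid once rather than on every call, which matters when timing it against hand-written loops.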

The above image shows the results of the source code I've uploaded.  They are derived from running three email addresses through the test 500 times, taking the average, and then running through again.  All told, the test above validated 15,000 email addresses (which works out to ten passes of 500 iterations over the three addresses, though of course they were the same three addresses every time).  I have run the test with as few as 5 attempts and as many as 50,000.  Regardless of the number of attempts or how many times I tell it to run the test, the order is always the same: first and fastest is the List version, second (and mostly consistent) is the Manual version, and the Regex version ends up in last place by varying margins.
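The test procedure described above (a handful of addresses, many iterations, an average per pass, several passes) could be sketched with `System.Diagnostics.Stopwatch` roughly as follows. This is a hypothetical harness, not Kim^J's test code: the `Bench` class, its method names and the single warm-up call are assumptions, and any `bool Validate(string)` method can be passed in as a `Func<string, bool>`.

```csharp
using System;
using System.Diagnostics;

static class Bench
{
    // One pass: validate every address `iterations` times and return the
    // mean time per validate call in microseconds.
    public static double PassMicroseconds(Func<string, bool> validate, string[] addresses, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            foreach (string address in addresses)
                validate(address);
        sw.Stop();
        return sw.Elapsed.TotalMilliseconds * 1000.0 / ((double)iterations * addresses.Length);
    }

    // Several passes, averaged. One untimed warm-up call comes first so that
    // one-time JIT costs do not land in the first timed pass.
    public static double Run(Func<string, bool> validate, string[] addresses, int iterations, int passes)
    {
        validate(addresses[0]);
        double total = 0;
        for (int p = 0; p < passes; p++)
            total += PassMicroseconds(validate, addresses, iterations);
        return total / passes;
    }
}
```

With three addresses, `Bench.Run(validator.Validate, addresses, 500, 10)` makes one warm-up call plus 15,000 timed calls and returns the mean time per call. Timing whole batches rather than single calls helps keep the timer's own overhead from dominating very short measurements.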

I've always taken it as general knowledge that doing things by hand is faster from a computer's perspective.  Interestingly enough, these results suggest that, at least in C#, that's not always true.  The List approach uses a supplied method, Contains(), to search for a character within the List, rather than hand-rolling a loop that walks the array and bails out when the first match is found (as the Manual approach basically does).  Contains() is itself documented as a linear search, so the gap presumably comes from how efficiently the framework performs that search rather than from avoiding a loop altogether.

It should also be noted that just because the computer has an easier time handling the List/Manual methods doesn't mean they save anyone time overall.  Most people are not going to be trying to validate 50,000 email addresses in a few seconds, whatever they're doing, so in a sense the time I spent writing this algorithm all those years ago (and today) was wasted: it would take a very long time to make it back in saved milliseconds.  Regardless, I had a lot of fun working with Kim^J to look into the possibilities here.

If anyone finds anything that could help any of the algorithms become faster (and remain accurate), I’d be really excited to hear about your ideas.  Thanks again to Kim^J for the help with the regular expression version and with the test code.

- Andy