# Friday, December 15, 2006

I just read an article on MSDN by Joe Duffy from the CLR team at Microsoft. It goes over parallelism and how the CLR handles it. I got the pointer to the article from SQL Server Central, so I figured it would deal with SQL CLR. It actually covers parallelism in general and some of the things to look out for when trying to speed up your applications. It is a really good read on what to consider when making sure your code takes advantage of the new architecture and gets top performance.

 

If you have ever heard me talk about threading or asynchronous calls in .NET, you have probably heard my favorite story about slowing down code using parallelism. For those of you who have not heard the story, here it is:

 

In the late 90s I was working with a client, and we were replacing an accounts receivable system written in Clipper with one written using Sybase SQL Server and PowerBuilder. We had told them that, due to the architecture, it would probably be a slower product but a more stable one. They seemed to agree with us that it was a necessary trade-off, but the requirements specification still said something like “the system must be fast”. When we talked to the IT department, they told us not to worry about that requirement since we couldn’t measure “fast” and they were aware that it would be slower.

We finished the project and got IT to buy off on it. I went out to the client site and installed it on the Director of Finance’s computer. The IT people, the CFO, and some other really important people were in the room along with my bosses. Before the director started the program, he opened his top desk drawer and took out a stopwatch. I had a sinking feeling. He started the stopwatch when he started looking up an overdue account. It took about 9 seconds to retrieve the data. I was thinking it wasn’t all that bad. He refused to sign off on the project since his idea of fast was 3 seconds. We all tried to tell him that 3 seconds was not reasonable and that 9 would be fine, but he stuck by his decision. His reasoning was that the system was used by people who were calling customers to say they were late with their payments. The person on the other end of the phone would immediately be in a bad mood, and any delay or lag in the conversation would only make their mood worse, so it had to be 3 seconds since that is not an uncomfortably long time to pause.

We went back to the office dejected and started profiling the application, the network, the database, and anything else we could think of to make the application faster. We got it down to 5 seconds and went back to the director. He was impressed but still refused to sign off since it was 2 seconds too slow. We were thinking we were not going to get paid for an awful lot of work.

This was in the days before Hyper-Threading, multi-CPU machines, and dual cores, so we really had limited options for running things in parallel. One of the guys on the team got the idea to change the form startup code so that, instead of going out to the database and retrieving the data, it would just put messages into the Windows message loop that would call other methods to retrieve the data and then display it. It was the first asynchronous programming I ever did, and looking back on it now I realize it was really bad. We didn’t handle any synchronization issues on the main thread and also didn’t take into account any caching that the OS might be doing.

To make a long story short, we took our new version out to the director. He got out his stopwatch and again clicked the button. The window appeared immediately and he turned around to congratulate us. As he was talking, I watched over his shoulder as data started popping up at various locations on the screen. It mostly filled in from top left to bottom right, but that order wasn’t guaranteed. The director was busy talking to us and didn’t notice, so he signed off on the project on the spot.

Later we ran some tests with the stopwatch. Our efforts at asynchronous programming had “sped up” the application from the original 9 seconds and the optimized 5 seconds to a very fast 12 seconds. We figured out that the overhead of all the messaging, opening multiple database connections, and painting on the screen had caused the slowdown. Because each piece of work ran on a separate thread, each one had to open its own connection to the database, and the connections weren’t being pooled, so that was the biggest slowdown.
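
To illustrate that last point, here is a minimal sketch in Python (not the PowerBuilder we actually used). `FakeConnection`, `load_unpooled`, `load_pooled`, and `count_opens` are names made up for the example, and the connection does no real work; it only counts how many times one gets opened. The assumption is that opening a connection is the expensive part, so fewer opens means less overhead.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class FakeConnection:
    """Hypothetical stand-in for a database connection that is costly to open."""

    _lock = threading.Lock()
    opened = 0  # how many connections have been opened so far

    def __init__(self):
        # Guard the counter because connections are created on worker threads.
        with FakeConnection._lock:
            FakeConnection.opened += 1

    def fetch(self, field):
        return f"value of {field}"


def load_unpooled(fields):
    """Every task opens its own connection, roughly what our form did."""
    def task(field):
        return FakeConnection().fetch(field)

    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(task, fields))


def load_pooled(fields, pool_size=2):
    """Tasks borrow a connection from a small fixed pool and return it."""
    pool = queue.Queue()
    for _ in range(pool_size):
        pool.put(FakeConnection())

    def task(field):
        conn = pool.get()  # blocks until a connection is free
        try:
            return conn.fetch(field)
        finally:
            pool.put(conn)  # hand the connection back for the next task

    with ThreadPoolExecutor(max_workers=4) as executor:
        return list(executor.map(task, fields))


def count_opens(loader, fields):
    """Run a loader and report how many connections it opened."""
    FakeConnection.opened = 0
    loader(fields)
    return FakeConnection.opened
```

With six fields on the form, `count_opens(load_unpooled, fields)` reports six opens while `count_opens(load_pooled, fields)` reports two, and both loaders return the same data. How much time that saves depends entirely on what a real connection costs to open, which is something you have to measure.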

 

The morals that I learned from that experience and many others (and that I seem to have to relearn on a daily basis) are:

  1. Correctness is much more important than speed. If you get the wrong data blazingly fast, it will make the customers angrier than having to wait a few extra seconds.
  2. Don’t optimize until you have completed the code and can measure it. You may suspect that a portion of the code will be slow, but until you have some solid numbers you don’t know for sure, and you could be making changes to code that hardly ever runs or, worse, making things go from 5 seconds to 12.
  3. Running code in parallel is difficult. Whether you use an asynchronous pattern or try to write it yourself using primitives, it is hard to understand and harder to get right the first time.
  4. Debugging parallel code is even more difficult. By its very nature, the errors are transient and difficult to find. I prefer to use the thread pool or the background worker component whenever possible because some really smart people at Microsoft have figured out how to do it correctly, and many more thousands of hours have been spent debugging that code than I want to spend debugging my own algorithms. If your particular workload allows it, you might also be able to use the features of Enterprise Services (COM+) to get a degree of parallelism without having to do a lot of extra coding work.
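
As a rough sketch of morals 1, 2, and 4 together, here is how I might check a parallel version against a sequential one before trusting it. This is Python rather than .NET, with the standard library's `ThreadPoolExecutor` standing in for the .NET thread pool. `lookup`, `run_sequential`, `run_pooled`, and `timed` are hypothetical names, and the `time.sleep` call is only an assumed placeholder for a real database round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def lookup(account_id):
    """Hypothetical unit of work standing in for one data retrieval."""
    time.sleep(0.01)  # assumed placeholder for I/O latency
    return account_id * 2


def run_sequential(ids):
    """Baseline: one lookup after another on the calling thread."""
    return [lookup(i) for i in ids]


def run_pooled(ids, workers=4):
    """Same lookups handed to a library-managed thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, whatever order they finish in.
        return list(pool.map(lookup, ids))


def timed(fn, *args):
    """Return (result, elapsed seconds) so the change can be measured."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    ids = list(range(8))
    baseline, t_seq = timed(run_sequential, ids)
    candidate, t_par = timed(run_pooled, ids)
    # Correctness first: the parallel version must return the same data.
    assert candidate == baseline
    print(f"sequential: {t_seq:.3f}s  pooled: {t_par:.3f}s")
```

The numbers it prints will vary from machine to machine, which is the whole point: compare the results first, then look at the stopwatch, and only keep the parallel version if both checks come out in its favor.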