I'm currently evaluating MKL 7.2 for Windows on a 3.2 GHz Xeon as part of an image processing application. We are currently using a build of the ATLAS CBLAS libraries, but, in upgrading to .NET this past week, we would have to rebuild the ATLAS libraries to get the debugging symbols correct for VTune. As I had heard we had gotten good results switching to MKL on our cluster, I was hoping that switching to MKL for our Xeon builds would be acceptable. However, on our "one minute" benchmark (256x256 matrix) I clocked roughly 69 seconds for MKL and 66 seconds for ATLAS.
I'm using the mkl.h header with mkl_c.lib (or mkl_c_dll.lib; in a quick check there was no discernible difference) and developing in Visual Studio .NET 2003. Our applications generally use square matrices of size 64, 128, 256, 512 and 1024.
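To compare the two libraries per matrix size rather than through the whole application, something along these lines might help. It is only a rough sketch: gemm_fn, naive_gemm, gemm_flops and time_gemm are names made up for illustration, and naive_gemm is a plain triple loop standing in so the code builds without any BLAS. The idea would be to wrap each library's cblas_dgemm in the same signature and report GFLOP/s for each of the sizes listed above.

```c
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical common signature: row-major, square, C = A * B. */
typedef void (*gemm_fn)(int n, const double *a, const double *b, double *c);

/* Stand-in multiply so the sketch builds without a BLAS. For a real
   comparison, wrap the library call in the same signature, e.g.
   cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
               n, n, n, 1.0, a, n, b, n, 0.0, c, n);              */
void naive_gemm(int n, const double *a, const double *b, double *c)
{
    int i, j, k;
    for (i = 0; i < n; ++i) {
        for (j = 0; j < n; ++j)
            c[(size_t)i * n + j] = 0.0;
        for (k = 0; k < n; ++k) {
            double aik = a[(size_t)i * n + k];
            for (j = 0; j < n; ++j)
                c[(size_t)i * n + j] += aik * b[(size_t)k * n + j];
        }
    }
}

/* One n x n multiply costs n^3 multiplications plus n^3 additions. */
double gemm_flops(int n)
{
    return 2.0 * (double)n * (double)n * (double)n;
}

/* Runs f reps times on n x n inputs and returns GFLOP/s.
   Returns -1.0 if allocation fails or the elapsed time is below the
   resolution of clock(); use enough reps to run for several seconds. */
double time_gemm(gemm_fn f, int n, int reps)
{
    size_t count = (size_t)n * (size_t)n;
    size_t idx;
    int r;
    clock_t start;
    double secs;
    double *a = (double *)malloc(count * sizeof(double));
    double *b = (double *)malloc(count * sizeof(double));
    double *c = (double *)malloc(count * sizeof(double));

    if (a == NULL || b == NULL || c == NULL) {
        free(a); free(b); free(c);
        return -1.0;
    }
    /* Arbitrary non-trivial fill values. */
    for (idx = 0; idx < count; ++idx) {
        a[idx] = (double)(idx % 7) + 1.0;
        b[idx] = (double)(idx % 5) + 1.0;
    }

    start = clock();
    for (r = 0; r < reps; ++r)
        f(n, a, b, c);
    secs = (double)(clock() - start) / CLOCKS_PER_SEC;

    free(a); free(b); free(c);
    return secs > 0.0 ? gemm_flops(n) * reps / secs / 1e9 : -1.0;
}
```

Calling time_gemm once per size (64, 128, 256, 512, 1024) for each library, with reps scaled up for the small sizes, should show whether the gap is in the BLAS calls themselves or somewhere else in the application.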
A slowdown of roughly 4.5% seems pretty rough. (For another data run I started before I left, which is very similar to the benchmark repeated many times, the 3 seconds per run would add nearly a day of computing time.)
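For reference, the arithmetic behind those figures can be sketched as below. The function names are just illustrative, and the repetition count for the longer run is inferred from "nearly a day", not something I measured.

```c
/* Relative slowdown of candidate versus baseline, in percent. */
double slowdown_percent(double baseline_s, double candidate_s)
{
    return 100.0 * (candidate_s - baseline_s) / baseline_s;
}

/* Number of repetitions at which a fixed per-run penalty adds up to total_s. */
double runs_for_penalty(double penalty_s, double total_s)
{
    return total_s / penalty_s;
}
```

With the benchmark timings, slowdown_percent(66.0, 69.0) comes to about 4.5, and runs_for_penalty(3.0, 86400.0) gives 28800, so a 3 second penalty per run reaches a full day at 28,800 repetitions.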
At this point, I am seeking advice on how to get the most out of the BLAS portion of the MKL, and wondering what sorts of performance numbers I should expect to see on the various BLAS levels. Also, are there any issues relating to the switch from ATLAS I should be aware of?