I'm using a Pentium D (with dual core and hyperthreading activated). I'm testing/evaluating the MKL 8.0 performance with a C++ application (which is using opencv and ipp 5.0 beta) on Windows XP.
Previously I was using clapack compiled in release mode using visual studio 2005 beta. I replaced that library with the mkl using the ia32 libraries. This conversion was simply a matter of changing the libraries passed to the linker.
Note: I tried using the em64t versions but I was getting an error during some C++ initialization code in the microsoft libraries (i.e. creating the application context before calling my application's main). This I need to investigate further since the Pentium D supports EM64T. Perhaps I need to change some compiler option.
What I found running this is that clapack is performing faster than the mkl library. With mkl my application takes approximately 8000 seconds, while with clapack it takes 6500 seconds, a difference of 1500 seconds (25 minutes, or about 23% slower). This seems to be a significant performance penalty for an optimized library.
The only answer that I can find to explain this difference is that the clapack is using imprecise floating point while mkl is using precise floating point.
I have reviewed the mkl documentation and it says that I don't have to define any environment variables (for windows), and that the number of threads will be determined automatically.
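Even if the thread count is chosen automatically, it may be worth confirming what the process actually sees. The sketch below assumes that mkl's threading goes through the OpenMP runtime (libguide40) and that it honors the standard `OMP_NUM_THREADS` variable; that is an assumption, not something the documentation quoted above confirms. `parse_thread_count` and `report_thread_settings` are hypothetical helper names.

```cpp
#include <cstdio>
#include <cstdlib>
#include <thread>

// Parses a thread-count string such as the value of OMP_NUM_THREADS.
// Returns 0 when the value is missing or is not a plain positive integer.
int parse_thread_count(const char* value) {
    if (value == nullptr || *value == '\0') {
        return 0;
    }
    char* end = nullptr;
    const long n = std::strtol(value, &end, 10);
    if (*end != '\0' || n <= 0 || n > 1024) {
        return 0;
    }
    return static_cast<int>(n);
}

// Prints the requested thread count next to the number of logical
// processors reported by the C++ runtime.
void report_thread_settings() {
    const int requested = parse_thread_count(std::getenv("OMP_NUM_THREADS"));
    const unsigned logical = std::thread::hardware_concurrency();
    if (requested == 0) {
        std::printf("OMP_NUM_THREADS: not set (runtime default applies)\n");
    } else {
        std::printf("OMP_NUM_THREADS: %d\n", requested);
    }
    std::printf("logical processors: %u\n", logical);
}
```

Calling `report_thread_settings()` at startup, and then rerunning the benchmark with the variable set to 1 and to 2, would show whether threading is helping or hurting for this workload.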
So currently I'm hard pressed to justify the cost of MKL based on this, i.e. spend money to get less performance.
Can someone identify how to improve performance with MKL?
For reference I'm using the dll version of mkl, since opencv is using libguide40. I think there should be only a slight difference between using a dll and a static library.
Is it possible that mkl is selecting the wrong processor type and supported features (i.e. not using SSE3)? How would I verify this?
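As a first step, the CPU's own feature flags can be read with the CPUID instruction. This is a minimal sketch, assuming an x86 build with MSVC (`__cpuid` from `<intrin.h>`) or GCC/Clang (`__get_cpuid` from `<cpuid.h>`); `CpuFeatures`, `decode_leaf1` and `read_leaf1` are hypothetical names. It only shows what the processor reports, not which code path mkl actually selected.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
#include <intrin.h>
#define CPUID_SKETCH_AVAILABLE 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
#include <cpuid.h>
#define CPUID_SKETCH_AVAILABLE 1
#else
#define CPUID_SKETCH_AVAILABLE 0
#endif

struct CpuFeatures {
    bool sse2;
    bool sse3;
};

// Decodes the feature bits returned by CPUID leaf 1:
// EDX bit 26 signals SSE2, ECX bit 0 signals SSE3.
CpuFeatures decode_leaf1(unsigned ecx, unsigned edx) {
    CpuFeatures f;
    f.sse2 = ((edx >> 26) & 1u) != 0;
    f.sse3 = (ecx & 1u) != 0;
    return f;
}

// Reads ECX and EDX of CPUID leaf 1. Returns false when CPUID is not
// available in this build (for example on a non-x86 target).
bool read_leaf1(unsigned& ecx, unsigned& edx) {
#if CPUID_SKETCH_AVAILABLE && defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);  // regs = {EAX, EBX, ECX, EDX}
    ecx = static_cast<unsigned>(regs[2]);
    edx = static_cast<unsigned>(regs[3]);
    return true;
#elif CPUID_SKETCH_AVAILABLE
    unsigned eax = 0, ebx = 0;
    return __get_cpuid(1, &eax, &ebx, &ecx, &edx) != 0;
#else
    (void)ecx;
    (void)edx;
    return false;
#endif
}
```

If `read_leaf1` succeeds and `decode_leaf1(ecx, edx).sse3` is true, the processor does advertise SSE3, so any remaining doubt is about the library's dispatch rather than the hardware.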