MKL performance on EM64T versus SPL

serge-sandler

I am porting an application from Intel SPL to MKL 8.0 and was pleased with the performance on a Pentium 4 CPU: the ddot() dot product from MKL took around 40% of the time of nspdDotProd() from SPL.

However, running the same 32-bit executable on Windows XP x64 with an AMD64 CPU produced the opposite result: MKL was 1.5 times slower than SPL. MKL was using mkl_def.dll.

Was I doing something wrong? How can the performance degradation be explained?
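For anyone wanting to reproduce a comparison like this, here is a minimal timing sketch in C. It is not the original test code: `dot_ref`, `dot_fn` and `time_dot` are hypothetical names, and the plain loop only stands in for the library routine under test. To time MKL or SPL, one would pass a small wrapper around ddot() or nspdDotProd() with the same three-argument shape instead of `dot_ref`.

```c
#include <stddef.h>
#include <time.h>

/* Plain-C reference dot product. It is only a stand-in for the routine
   being measured (a wrapper around MKL ddot() or SPL nspdDotProd()). */
double dot_ref(const double *x, const double *y, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];
    return sum;
}

/* Common signature so that different implementations can be timed
   by the same harness. */
typedef double (*dot_fn)(const double *, const double *, size_t);

/* Calls f(x, y, n) `reps` times and returns the average CPU time per call
   in seconds. The last result is stored in *out, which also keeps the
   compiler from optimising the calls away. */
double time_dot(dot_fn f, const double *x, const double *y,
                size_t n, int reps, double *out)
{
    volatile double sink = 0.0;
    clock_t t0 = clock();
    for (int r = 0; r < reps; ++r)
        sink = f(x, y, n);
    clock_t t1 = clock();
    *out = sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / (double)reps;
}
```

clock() has coarse resolution, so `reps` should be large enough for the whole loop to run for at least a fraction of a second. It is also worth timing several vector lengths, since the ratio between two implementations can change with the size of the data.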

It seems that to take advantage of the 64-bit CPU we will have to port the whole application to the 64-bit platform. I had got the wrong impression from the paragraph on Automatic Runtime Processor Detection on the MKL home page, namely that a 32-bit application could take advantage of a 64-bit CPU when integrated with MKL.

Tim Prince

I would expect mkl_def to use only Pentium II-compatible code, if that, so it should not be surprising that SSE2 code would outperform it.
I think you're getting mixed up between 64-bittedness and SSE2 support. There is a connection, since 64-bit Windows software is unlikely to miss out on taking at least partial advantage of SSE2.
More recent MKL versions have better AMD64 support.
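The distinction between bitness and SSE2 support can be checked directly. The sketch below is an illustration only and says nothing about how MKL itself picks a code path. It reports the pointer width of the executable separately from what CPUID says about the processor. It assumes GCC or Clang on x86, where `<cpuid.h>` provides `__get_cpuid`; with other compilers or CPUs it compiles but leaves the flags at zero. `struct cpu_info`, `query_cpu` and `print_cpu_info` are made-up names.

```c
#include <stdio.h>
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>  /* GCC/Clang helper; MSVC would use __cpuid from <intrin.h> */
#endif

struct cpu_info {
    int queried;     /* 1 if CPUID leaf 1 could be read */
    int sse2;        /* CPUID leaf 1, EDX bit 26 */
    int sse3;        /* CPUID leaf 1, ECX bit 0 */
    int long_mode;   /* CPUID leaf 0x80000001, EDX bit 29: 64-bit capable CPU */
    int build_bits;  /* pointer width of this executable: 32 or 64 */
};

struct cpu_info query_cpu(void)
{
    struct cpu_info info = {0, 0, 0, 0, (int)(sizeof(void *) * 8)};
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        info.queried = 1;
        info.sse2 = (int)((edx >> 26) & 1u);
        info.sse3 = (int)(ecx & 1u);
    }
    if (__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        info.long_mode = (int)((edx >> 29) & 1u);
#endif
    return info;
}

void print_cpu_info(const struct cpu_info *c)
{
    printf("executable: %d-bit\n", c->build_bits);
    printf("CPU 64-bit capable: %d, SSE2: %d, SSE3: %d\n",
           c->long_mode, c->sse2, c->sse3);
}
```

A 32-bit build of this run on an AMD64 machine should report a 32-bit executable on a 64-bit capable CPU with SSE2 available, which is the situation in the original post: the SSE2 units are usable from 32-bit code, provided the library selects a code path that uses them.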

serge-sandler

Thank you for your reply, Tim18.

I've installed the latest MKL 8.1.

Running the same dot product test on AMD64, the 32-bit executable uses mkl_p4p.dll (SSE3 is recognised), and the 64-bit executable uses mkl_p4n (the CPU is recognised as a Xeon). The performance results are the same as for SPL. This is an improvement compared with MKL 8.0, where the 32-bit executable was 1.5 times slower. However, I'd expect MKL to outperform SPL. On an Intel CPU with SSE2, MKL significantly outperforms SPL.

Interestingly, using mkl_def on AMD64 instead of mkl_p4p.dll (32-bit case) or mkl_p4n (64-bit case) did not produce significantly different results. It seems that MKL has recognised the CPU properly, but has not utilised its potential?
