MKL and CPU auto dispatch

I have two questions:

1. Is there an easy way to *print* which CPU code path is actually dispatched? Something like
=====
fprintf(myfile, "%s\n", CPU_NAME[Get_MKL_cpuid()]);
=====
I know this is a silly question because MKL will do its best and P4 will be detected as P4 in any case, but users of my program want to *see* whether SSE/SSE2 was actually used.

2. Is AMD64 (Opteron or Athlon64) SSE2 detected by auto-dispatch? Or is it treated as "generic", with no SSE2 used?

If AMD64 SSE2 is not detected, is there a way to enforce it (through an environment variable or whatever)?

Thank you!

1. Take a look at the "Obtaining Version Information" section of the Technical User Notes (mkluse.htm) in your doc directory. There are functions that give some information on what processor has been detected.

2. Intel MKL version 7.2.1 will run code that provides good performance on AMD using a variety of techniques, which may include use of the SSE and SSE2 instruction sets and other architecture features. You should find that Intel MKL performance is comparable to that of alternative libraries on AMD.

Regards,
Todd

Version 7.2.1 has just been released. If you have support services, you should be getting a notice shortly. The web page is not always updated (e.g., it has not been updated to show 7.2.1 versus 7.2), but the eval link does contain the latest package.

-Todd

Well, there are empirical ways of finding out whether the SIMD instructions are being used. For instance, you could run dgemm on a problem size of, say, 1000x1000x1000. If you see performance in flops that exceeds the clock frequency, the SSE instructions are in use. So, for instance, if you run a dgemm timing exercise on a 3.6 GHz Intel processor and get approximately 6.5 GFLOPS, that can be accomplished only with the SSE instructions, with which two arithmetic operations are performed per clock. If x87 code were used, the performance would be less than 3.6 GFLOPS on dgemm.
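
The arithmetic behind that check can be sketched in C as follows. This is only an illustration of the rule of thumb above, not anything shipped with MKL: the helper names (dgemm_flops, gflops, looks_like_simd, report_simd_check) are made up for this example, the elapsed time is assumed to come from timing your own dgemm call, and the clock frequency is something you supply yourself.

```c
#include <assert.h>
#include <stdio.h>

/* Floating-point operations in an (m x k) times (k x n) matrix product:
   one multiply and one add for each of the m*n*k inner-loop steps. */
static double dgemm_flops(double m, double n, double k)
{
    return 2.0 * m * n * k;
}

/* Achieved rate in GFLOPS, given the flop count and elapsed seconds. */
static double gflops(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds / 1e9;
}

/* Rule of thumb from the post: a rate above one flop per clock cycle
   (rate in GFLOPS greater than clock in GHz) implies SIMD code ran. */
static int looks_like_simd(double rate_gflops, double clock_ghz)
{
    return rate_gflops > clock_ghz;
}

/* Hypothetical reporting helper: prints the verdict to the given stream. */
static void report_simd_check(FILE *out, double rate_gflops, double clock_ghz)
{
    fprintf(out, "%.2f GFLOPS at %.2f GHz: %s\n", rate_gflops, clock_ghz,
            looks_like_simd(rate_gflops, clock_ghz)
                ? "above one flop per clock, SIMD likely in use"
                : "not above one flop per clock, inconclusive");
}
```

With the numbers from the example, report_simd_check(stdout, 6.5, 3.6) reports the SIMD case. A rate below the clock frequency does not prove that SSE is unused, since a small or memory-bound problem can run slowly for other reasons, which is why the sketch calls that case inconclusive.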

On your second question, starting with MKL 7.2.1, which is now available, MKL will dispatch code using SSE2 instructions on AMD processors.

Bruce
