dgemm performance

We are seeing some strange performance with dgemm that seems to depend on the content of the source matrices, not their size. With the same matrix sizes filled with randomly generated data (uniform or normal) or with a constant value, everything appears fine and the timings are relatively consistent. The data doesn't look unusual:

Max: 0.0997145, Min: -0.3362, Avg: -3.5246e-006

Most of the values hover close to that average, but with a few spikes clustered mostly in one area. What I can't understand is why that would have any effect on the performance, no matter what the values were. It is just a matrix multiplication, right?

If your multiplications produce results small enough to underflow (magnitude below about 2.2e-308, the smallest normal double), that can take a great deal of time, particularly on Itanium. In that case, for SSE2 or Itanium code, setting "flush to zero" (abrupt underflow) should help, if you haven't already done so. On 64-bit Windows, the OS will do that, unless you have the wrong beta version. Otherwise, you would invoke it by setting the -Qftz or -ftz option when you compile the main program with Intel compilers (implied by -O3 in some cases), or by using the
_MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
facility defined for C/C++ in

So, you can see my answer would have been more to the point if you had given more information.
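
For anyone who wants to try this, here is a minimal sketch of the intrinsic approach, assuming an x86 build where double arithmetic goes through SSE2 and the compiler provides xmmintrin.h. The helper names (count_subnormals, enable_flush_to_zero, multiply) are made up for illustration and are not part of MKL or the compiler. The first helper checks whether the data already contains subnormal values, the second turns on flush-to-zero for the calling thread, and the third is only there so the effect can be observed on a single product.

```c
#include <math.h>       /* fpclassify, FP_SUBNORMAL */
#include <stddef.h>     /* size_t */
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON */

/* Hypothetical helper: count the entries of x that are subnormal
   (nonzero, with magnitude below the smallest normal double). */
size_t count_subnormals(const double *x, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (fpclassify(x[i]) == FP_SUBNORMAL) {
            ++count;
        }
    }
    return count;
}

/* Hypothetical helper: enable flush-to-zero in the MXCSR register of the
   calling thread, so SSE/SSE2 results that would be subnormal become 0. */
void enable_flush_to_zero(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
}

/* Hypothetical helper: multiply at run time; the volatile qualifiers keep
   the compiler from folding the product into a compile-time constant. */
double multiply(double a, double b)
{
    volatile double va = a;
    volatile double vb = b;
    volatile double result = va * vb;
    return result;
}
```

Calling enable_flush_to_zero() once near the start of the main program, before the dgemm calls, is the idea. The mode lives in a per-thread register, so whether it carries over to threads created by a library is something to check for your setup rather than assume.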

Thanks! That does appear to be the source of the problem. You have my eternal gratitude.
