In a simple test program I have measured the performance
of MKL 6.0 DGEMM on a dual Xeon (2.66 GHz, 533 MHz FSB) for
different matrix sizes.
When OMP_NUM_THREADS is greater than 1, the program
stalls, i.e. the threads go to sleep and do no
further work. The matrix size for
which this happens differs from run to run with the
Has anybody else seen this effect yet? Any ideas?
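
For anyone who wants to try reproducing this, here is a minimal sketch of the kind of timing sweep described above. It is not the actual test program: `dgemm_standin`, `gflops` and `sweep` are hypothetical names, and the naive triple loop only stands in for the MKL DGEMM call so that the sketch runs without MKL installed. The sizes in the example are arbitrary. The one point it is meant to show is printing (and flushing) before each call, so that the matrix size at which a stall occurs is visible in the output.

```python
import os
import time


def dgemm_standin(n, a, b):
    """Naive n x n product on row-major flat lists.

    Placeholder only: a real test would call the library's DGEMM here.
    """
    c = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                c[i * n + j] += aik * b[k * n + j]
    return c


def gflops(n, seconds):
    """An n x n matrix multiply costs about 2*n^3 floating-point operations."""
    if seconds <= 0.0:
        return float("inf")
    return 2.0 * n ** 3 / seconds / 1e9


def sweep(sizes):
    """Time one multiply per size, logging before and after each call."""
    results = []
    for n in sizes:
        a = [1.0] * (n * n)
        b = [2.0] * (n * n)
        # Flush before the call: if the call never returns,
        # the last "starting" line names the size that stalled.
        print(f"n={n} starting", flush=True)
        t0 = time.perf_counter()
        c = dgemm_standin(n, a, b)
        dt = time.perf_counter() - t0
        print(f"n={n} done in {dt:.6f} s, {gflops(n, dt):.6f} GFLOP/s", flush=True)
        results.append((n, dt, c[0]))
    return results


if __name__ == "__main__":
    # Record the thread setting alongside the timings.
    print("OMP_NUM_THREADS =", os.environ.get("OMP_NUM_THREADS", "(unset)"))
    sweep([16, 32, 64])
```

Running the same sweep with OMP_NUM_THREADS=1 and then with OMP_NUM_THREADS=2 against the real DGEMM would show whether the stall appears only in the threaded case, and at which size.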