I am working on optimizing an iterative SPMD HPC application. Could anyone point me towards performance evaluation studies on this for Intel multicore processors (preferably Sandy Bridge)?
I'm using OpenMP and was surprised by the size of the slowdown (~34%) going from the sequential version to the parallel version run with one thread. My application uses parallel sections around a dynamically scheduled for loop. The machine was "quiet", with no load other than a login session and X Windows running. I'm using the Intel tools.
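To make the pattern concrete, here is a minimal sketch of the kind of loop I mean. It is not my actual code: `work`, `sum_sequential`, `sum_parallel_dynamic`, and the `chunk` parameter are hypothetical names for illustration, and I am assuming the construct is a parallel region enclosing an `omp for` with `schedule(dynamic)`.

```c
#include <stddef.h>

/* Hypothetical per-iteration work; stands in for the real kernel. */
static double work(long i) {
    return (double)i * 0.5 + 1.0;
}

/* Sequential baseline: a plain loop with no OpenMP constructs. */
double sum_sequential(long n) {
    double s = 0.0;
    for (long i = 0; i < n; ++i) {
        s += work(i);
    }
    return s;
}

/* Same loop inside a parallel region, with a dynamically scheduled
 * worksharing loop. Even with one thread, the runtime still hands out
 * chunks of `chunk` iterations at run time, which can add overhead.
 * If OpenMP is not enabled, the pragmas are ignored and this is
 * simply the sequential loop. */
double sum_parallel_dynamic(long n, int chunk) {
    double s = 0.0;
    #pragma omp parallel
    {
        #pragma omp for schedule(dynamic, chunk) reduction(+:s)
        for (long i = 0; i < n; ++i) {
            s += work(i);
        }
    }
    return s;
}
```

Timing both functions (for example with `omp_get_wtime()`) in a build with OpenMP enabled and `OMP_NUM_THREADS=1` is one way to compare the sequential and 1-thread parallel cases. A small `chunk` with cheap iterations would presumably exaggerate the scheduling overhead, so the gap depends heavily on the real kernel.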
Specifically, will more on-chip resources (internal registers, load/store queues) become available to each thread executing on a single core if HT is disabled?
I'm principally interested in optimizing single-application performance with one thread per physical core.