dsygvx scales badly on Itanium2

schorscherl

Hi,

we tried to solve a 4000x4000 dense generalized eigenvalue problem
using dsygvx from MKL. On an SGI Altix machine performance is good
with one CPU, but scalability is really bad (approx. 10-20% speedup
with 2 CPUs, slowdown with more).

Is this a fundamental problem with the algorithm that dsygvx
uses or is the routine just not well parallelized?

Bye,
Georg.

Tim Prince

I can't answer specifically for this function. In my limited experience with OpenMP on the Altix, it's even more difficult to get good scaling to more than 2 CPUs than on certain other platforms. One probable reason is that there is no special accommodation for the increased latency in the NUMA system for data transfers outside a pair of CPUs. Some of the responsibility for allocating thread local storage effectively rests with the threading libraries provided by the OS, so it is outside the control of the libraries which come with MKL.


Many MKL functions do scale much better than that on 2 CPUs. This problem likely is not as trivial to parallelize as some of the more popular ones, and may not have benefitted from as much development effort.


TODD R. (Intel)

From the technical user notes, it appears that this function has not been threaded.

http://www.intel.com/software/products/mkl/docs/mklusel.htm#Using%20MKL%20Parallelism

So any scaling you see would have to come from calls to threaded level 3 BLAS functions.
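
As a rough illustration of what that implies, here is a minimal sketch that inverts Amdahl's law to estimate how much of the single-CPU run time would have to sit in threaded code to produce the reported 10-20% speedup on 2 CPUs. It assumes the threaded part scales perfectly and that there is no NUMA or synchronization overhead, which is optimistic for the Altix; the function names are made up for this example and have nothing to do with MKL.

```python
def parallel_fraction(speedup, n_cpus):
    """Invert Amdahl's law S = 1 / ((1 - p) + p / n) to get p.

    p is the fraction of single-CPU run time spent in code that is
    threaded (assumed here to scale perfectly, which is an idealization).
    """
    if n_cpus < 2:
        raise ValueError("need at least 2 CPUs to estimate a parallel fraction")
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / n_cpus)


def amdahl_speedup(p, n_cpus):
    """Ideal speedup for parallel fraction p on n_cpus CPUs."""
    return 1.0 / ((1.0 - p) + p / n_cpus)


if __name__ == "__main__":
    # Speedups reported in the original post for 2 CPUs.
    for observed in (1.10, 1.20):
        p = parallel_fraction(observed, 2)
        print(
            f"speedup {observed:.2f} on 2 CPUs -> threaded fraction {p:.2f}, "
            f"ideal 4-CPU speedup {amdahl_speedup(p, 4):.2f}, "
            f"upper bound {1.0 / (1.0 - p):.2f}"
        )
```

Under these assumptions the threaded fraction comes out at roughly 18% to 33%, so even the ideal speedup stays below 1.5 no matter how many CPUs are used. The model cannot produce an actual slowdown, so the slowdown seen with more CPUs would have to come from overheads it ignores.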

-Todd
