We tried to solve a 4000x4000 dense generalized eigenvalue problem
using dsygvx from MKL. On an SGI Altix machine, performance is good
with one CPU, but scalability is very poor (approx. 10-20% speedup
with 2 CPUs, and a slowdown with more than 2).
Is this a fundamental problem with the algorithm that dsygvx
uses, or is the routine just not well parallelized?
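
To put rough numbers on the figures above, here is a minimal sketch that
assumes Amdahl's law describes the 2-CPU measurement. That model is an
assumption made only for illustration; it says nothing about how dsygvx
is implemented and it cannot account for the slowdown seen with more
CPUs. The function names are hypothetical helpers, not part of MKL.

```python
def amdahl_parallel_fraction(speedup, n_cpus):
    """Parallel fraction p implied by an observed speedup on n_cpus,
    assuming Amdahl's law: speedup = 1 / ((1 - p) + p / n_cpus)."""
    return (1.0 - 1.0 / speedup) * n_cpus / (n_cpus - 1)


def amdahl_speedup(p, n_cpus):
    """Speedup predicted by Amdahl's law for parallel fraction p."""
    return 1.0 / ((1.0 - p) + p / n_cpus)


# Observed range from the measurements: 10-20% speedup with 2 CPUs.
for observed in (1.10, 1.20):
    p = amdahl_parallel_fraction(observed, 2)
    limit = 1.0 / (1.0 - p)  # speedup bound as the CPU count grows
    print(f"speedup {observed:.2f} on 2 CPUs: "
          f"parallel fraction {p:.2f}, upper bound {limit:.2f}")
```

Under this assumption, only about 18% to 33% of the run time would be
spent in parallelized code, so even an ideal machine would top out at a
speedup of roughly 1.2 to 1.5.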
dsygvx scales badly on Itanium2