I used MKL ScaLAPACK's PZHEGVX to introduce fine-grained parallelization into an eigenvalue solver that is part of an ab initio software package. When running on a cluster, I noticed that performance and CPU utilization became noticeably worse than in the original coarse-grained parallel version of the software, which called MKL's ZHPGVX on each node instead of using the parallel version of the routine. As far as I can tell from the Network page of Task Manager, there is no significant network utilization during the call to PZHEGVX. Does anybody have any hints about the reasons for, or solutions to, the inferior performance of PZHEGVX?
Thanks in advance for any info.