We recently got a 16-node cluster with dual quad-core Xeon E5430 processors and 16 GB RAM per node, all connected with InfiniBand. I compiled the program we use with ifort 10.1.017, MKL 10.1.014, MVAPICH2, ScaLAPACK, and BLACS. When running performance tests, I noticed that jobs distributed within a single node (intra-node) take longer to complete than jobs distributed across nodes (inter-node).
See the table below for some numbers:
My question is: is this result to be expected? And how can I increase the performance when 8 jobs are assigned to a node, i.e., 1 job per core? I am using the sequential MKL libraries.