We have a small Linux cluster running OSCAR on CentOS 5.5: a master and 4 nodes. We compute only on the nodes. The nodes are identical: two hexa-core Xeon X5650 processors each (so 2 x 6 = 12 cores per node) and 24 GB of RAM per node. The whole cluster is connected with InfiniBand, using the OpenFabrics drivers. The Intel Cluster Toolkit is installed on the master and on all the nodes.
- With the Intel Cluster Toolkit (Intel MPI):
-- Two 2x8 jobs (nodes x processes per node) are running, so every node is busy but still has 4 free cores. There is enough free RAM on all the nodes.
-- I start a 2x2 job. This is a CFD program, and I only look at the duration of a time step: ~ 2.0x10^-1 s (very good!)
-- I start the same job, not as 2x2 but as 1x4. The duration of a time step is now ~ 1.2 s (very, very bad: 6x slower)
- With Open MPI:
-- The same two 2x8 jobs are running.
-- I start the same 2x2 job. The duration of a time step is ~ 3.0x10^-1 s (good, but Intel MPI does better)
-- I start the same 1x4 job. The duration of a time step is ~ 3.0x10^-1 s (so much, much better than Intel MPI)
I have probably made a configuration error, but I cannot find it. Does anyone have an idea? Where should I start looking?
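One thing I can think of checking is whether the processes of the 1x4 job end up confined to cores that the 2x8 jobs already occupy. Below is a minimal sketch of such a check, not a diagnosis. It assumes Python 3.3 or later is available on the node (the Python shipped with CentOS 5.5 is older), and the names `allowed_cpus` and `find_overlaps` are made up for illustration.

```python
import os
import sys


def allowed_cpus(pid=0):
    """Return the sorted CPU ids that process `pid` may run on (0 = this process)."""
    # os.sched_getaffinity is Linux-only and needs Python 3.3+.
    return sorted(os.sched_getaffinity(pid))


def find_overlaps(pids):
    """Map each CPU id to the PIDs allowed on it, keeping only CPUs listed for 2+ PIDs."""
    by_cpu = {}
    for pid in pids:
        for cpu in allowed_cpus(pid):
            by_cpu.setdefault(cpu, []).append(pid)
    return {cpu: owners for cpu, owners in by_cpu.items() if len(owners) > 1}


if __name__ == "__main__":
    # PIDs of the solver processes on this node (taken from ps or top, for example).
    # With no arguments, the script inspects itself.
    pids = [int(arg) for arg in sys.argv[1:]] or [os.getpid()]
    for pid in pids:
        print(f"pid={pid} allowed_cpus={allowed_cpus(pid)}")
    for cpu, owners in sorted(find_overlaps(pids).items()):
        print(f"cpu {cpu} is allowed for pids {owners}")
```

Saved under a hypothetical name such as `check_affinity.py`, it would be run on one node with the PIDs of all solver processes on that node (the 2x8 jobs and the new job). If every process is allowed on all 12 cores, nothing is pinned and every core shows up as shared, which is expected. If the processes are each restricted to one or a few cores and two of them list the same core, that would point to a placement problem.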
Thanks a lot,