Hi,
We have a small cluster of 8 nodes (12 cores each), split across 2 blades with 4 nodes per blade. All nodes are connected via InfiniBand.
Intel MPI is installed and configured with shm:ofa.
I'm starting the following test on all the cores of the cluster:
mpirun -np 96 IMB-MPI1
It produces normal-looking results for all the sub-tests except one. The problem is with:
#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 96
#----------------------------------------------------------------
It gives:
       #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            0         1000         0.11         0.15         0.12
            1         1000        42.13        42.15        42.14
            2         1000        43.61        43.62        43.62
            4         1000        52.55        52.57        52.56
            8         1000        62.75        62.78        62.77
           16         1000        68.49        68.52        68.50
           32         1000        80.11        80.13        80.12
           64         1000       111.07       111.10       111.09
          128         1000       181.19       181.25       181.23
          256         1000       368.36       368.52       368.44
          512         1000       328.78       328.83       328.80
         1024         1000       602.03       603.65       602.17
         2048         1000      5873.23      5873.65      5873.45
         4096         1000      6000.28      6000.59      6000.43
         8192         1000      6965.62      6965.84      6965.75
        16384          943     10429.38     10429.66     10429.52
        32768          400     25244.62     25245.83     25245.13
        65536          223     44969.48     44972.04     44970.70
       131072          118     84991.07     84997.68     84994.67
       262144           60    167439.02    167466.40    167451.96
       524288           31    330707.68    330769.06    330739.70
      1048576           16    658785.06    659147.81    658966.23
      2097152            8   1314571.62   1315755.52   1315313.50
n08:3914: reg_mr Cannot allocate memory
n08:3914: reg_mr Cannot allocate memory
n08:3915: reg_mr Cannot allocate memory
...
I'm seeing these "reg_mr Cannot allocate memory" messages on all the nodes.
What exactly is this problem, and how can I solve it?
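In case it helps narrow this down, here is a rough sketch of the buffer footprint at the point where the errors start. It rests on assumptions that I have not verified on this system: that the next message size after 2097152 would be 4194304 bytes, and that each rank in Alltoall holds one send and one receive buffer of (number of ranks x message size) bytes. The function name alltoall_bytes_per_node is just made up for this illustration. Since reg_mr refers to registering memory with the InfiniBand adapter, the script also prints the locked-memory limit of the current shell, which may or may not be related.

```shell
#!/bin/sh
# Rough estimate of the Alltoall buffer bytes held on one node.
# Assumption: every rank has a send buffer and a receive buffer,
# each of size np * msg bytes (hence the factor 2).
alltoall_bytes_per_node() {
    np=$1     # total number of MPI ranks
    ppn=$2    # ranks per node
    msg=$3    # message size per destination rank, in bytes
    echo $(( 2 * np * msg * ppn ))
}

# Values from this run: 96 ranks, 12 ranks per node.
# 4194304 is an assumed next message size; it does not appear in the output.
bytes=$(alltoall_bytes_per_node 96 12 4194304)
echo "estimated Alltoall buffer bytes per node: $bytes"
echo "in MiB: $(( bytes / 1048576 ))"

# Locked-memory limit of the current shell (kbytes or "unlimited").
echo "max locked memory: $(ulimit -l)"
```

Under these assumptions the estimate comes to roughly 9 GiB of buffers per node, so comparing it against the physical memory per node and the locked-memory limit might be a useful first check.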
Thanks a lot!
Best regards



