Intel MPI with IMB-MPI1 on all the nodes produces reg_mr Cannot allocate memory

Guillaume De Nayer's picture

Hi,

We have a small cluster of 8 nodes with 12 cores each, split across 2 blades of 4 nodes. All the nodes are connected via InfiniBand.

Intel MPI is installed and configured with shm:ofa.

I'm starting the following test on all the cores of the cluster:
mpirun -np 96 IMB-MPI1

It generates "normal" results for all the sub-tests. But there is a problem with:
#----------------------------------------------------------------
# Benchmarking Alltoall
# #processes = 96
#----------------------------------------------------------------

it gives:
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec]
0 1000 0.11 0.15 0.12
1 1000 42.13 42.15 42.14
2 1000 43.61 43.62 43.62
4 1000 52.55 52.57 52.56
8 1000 62.75 62.78 62.77
16 1000 68.49 68.52 68.50
32 1000 80.11 80.13 80.12
64 1000 111.07 111.10 111.09
128 1000 181.19 181.25 181.23
256 1000 368.36 368.52 368.44
512 1000 328.78 328.83 328.80
1024 1000 602.03 603.65 602.17
2048 1000 5873.23 5873.65 5873.45
4096 1000 6000.28 6000.59 6000.43
8192 1000 6965.62 6965.84 6965.75
16384 943 10429.38 10429.66 10429.52
32768 400 25244.62 25245.83 25245.13
65536 223 44969.48 44972.04 44970.70
131072 118 84991.07 84997.68 84994.67
262144 60 167439.02 167466.40 167451.96
524288 31 330707.68 330769.06 330739.70
1048576 16 658785.06 659147.81 658966.23
2097152 8 1314571.62 1315755.52 1315313.50
n08:3914: reg_mr Cannot allocate memory
n08:3914: reg_mr Cannot allocate memory
n08:3915: reg_mr Cannot allocate memory
...

I'm seeing these "reg_mr Cannot allocate memory" messages on all the nodes...

What exactly is this problem, and how can I solve it?

Thx a lot!
Best regards

Dmitry Kuzmin (Intel)'s picture

Hi Guillaume,

You are probably using Mellanox HCAs. This message usually means that there is not enough memory available for buffers, which depends on how much memory each node has. Alltoall needs a lot of memory for its internal buffers, so the simplest fix is to limit the maximum message size that IMB uses.

You can also try the following trick: add the following line to the /etc/modprobe.conf:
options mlx4_core log_mtts_per_seg=5

It should reduce memory consumed by communication functions.
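
As a rough illustration of what this parameter affects: as far as I know, the Open MPI FAQ gives the amount of memory the mlx4 driver can register as (2^log_num_mtt) * (2^log_mtts_per_seg) * PAGE_SIZE, so a larger log_mtts_per_seg raises the limit that reg_mr runs into. The sketch below only does that arithmetic. The values log_num_mtt=20 and a 4 KiB page size are assumptions for the example, not settings read from this cluster; the helper name max_reg_mem is made up for illustration.

```shell
#!/bin/sh
# Illustrative arithmetic only: the inputs are assumed values, not measured ones.
# max_reg_mem = 2^log_num_mtt * 2^log_mtts_per_seg * page_size
max_reg_mem() {
    # $1 = log_num_mtt, $2 = log_mtts_per_seg, $3 = page size in bytes
    echo $(( (1 << $1) * (1 << $2) * $3 ))
}

PAGE_SIZE=4096      # assumed; check with: getconf PAGE_SIZE
LOG_NUM_MTT=20      # assumed example value

# Compare a smaller setting (3) with the suggested one (5).
for seg in 3 5; do
    bytes=$(max_reg_mem "$LOG_NUM_MTT" "$seg" "$PAGE_SIZE")
    echo "log_mtts_per_seg=$seg -> $((bytes / 1073741824)) GiB registerable"
done
```

On a real node the current settings can usually be read from /sys/module/mlx4_core/parameters/. A changed option only takes effect after the mlx4_core module is reloaded or the node is rebooted.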

Regards!
Dmitry

Guillaume De Nayer's picture

Hi!

Thx for your useful answer. I will try your ideas! But where can I find out how to limit the message size for IMB? I had the same idea, but I couldn't find out how... I'm too stupid to google correctly...

Best regards!
Guillaume

andres-more (Intel)'s picture
You need to provide a file with the explicit list of message lengths to include. I think the default behavior is to include all of them if no file is provided.

$ ./IMB-MPI1 -h
...
-msglen
    the argument after -msglen is a lengths_file, an ASCII file, containing any set of nonnegative message lengths, 1 per line
...

For instance, Intel Cluster Checker uses the following list of msglen values to get a quick but still representative sample of results.

$ cat IMB_msglen
0
1
2
4
4194304

Note that you usually get the best latency with a zero payload, and the best bandwidth with a really big payload. As usual, I would recommend some experimentation to optimize those values.
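
A lengths file like this can also be generated rather than typed by hand. The following is a minimal sketch that writes 0 followed by the powers of two up to a cap, one per line. The cap of 1 MiB (MAX_BYTES) is an assumption, chosen only because it lies below the sizes at which the reg_mr errors appeared in the first post; the file name IMB_msglen is reused from the example above.

```shell
#!/bin/sh
# Write message lengths 0, 1, 2, 4, ... up to MAX_BYTES, one per line.
MAX_BYTES=1048576   # assumed cap (1 MiB); adjust after experimenting
LEN_FILE=IMB_msglen

echo 0 > "$LEN_FILE"
len=1
while [ "$len" -le "$MAX_BYTES" ]; do
    echo "$len" >> "$LEN_FILE"
    len=$((len * 2))
done

# Then, on the cluster (not executed by this script):
#   mpirun -np 96 IMB-MPI1 Alltoall -msglen IMB_msglen
```

Naming the benchmark (Alltoall) on the command line restricts the run to that test, which is handy while looking for the largest size that still completes.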
Guillaume De Nayer's picture

Hi!

Great! thx a lot!
