MPI Broadcast lossless?


I would like to know whether the broadcast feature in MPI is 100% lossless. Do we need to handle cases where it is not lossless?



Hi jimmy,

Are you referring to the new fault-tolerant behavior in Intel MPI Library 4.0? The FT support is limited and does not cover collectives. Feel free to check out the Reference Manual for details on provided functionality.


I am not referring to fault tolerance. I need to check whether, when doing a broadcast, it is guaranteed that all recipients will receive the broadcast data.

Intel MPI Library works according to the MPI standard: completion
of MPI_Bcast on the root only means that the root has completed sending; completion on
a non-root rank guarantees that the rank has received all the data (without losses). There are no
additional confirmations sent back to the root.


Is it possible to force MPI_Bcast to perform a multi-cast?

I realise the time to broadcast 200MB of data increases significantly as the number of nodes increases.
If I broadcast to 1 node, it takes < 2s to complete.
Broadcast to 2 nodes takes < 6s to complete.
Broadcast to 3 nodes takes < 10s to complete.

Forcing a multicast would allow me to complete transmitting to all nodes in the shortest time.

All collective operations are ultimately implemented as point-to-point (pt2pt) communication, using different algorithms. Multicast can only be implemented if it is supported in hardware.
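To illustrate how a broadcast can be built from point-to-point sends, here is a minimal Python sketch that only models the send schedule; it does not call MPI. It compares a flat schedule (the root sends to every rank in turn) with a binomial-tree schedule. The function names are hypothetical, and the sketch is not the Intel MPI Library's actual implementation.

```python
def binomial_bcast_schedule(num_ranks):
    """Return a list of rounds; each round is a list of (sender, receiver) pairs.

    Rank 0 is assumed to be the root. This models one common binomial-tree
    variant and is an illustration only.
    """
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    rounds = []
    have_data = 1  # ranks 0 .. have_data-1 already hold the message
    while have_data < num_ranks:
        # Every rank that already holds the data sends to the rank
        # `have_data` positions above it, if that rank exists.
        sends = [(src, src + have_data)
                 for src in range(have_data)
                 if src + have_data < num_ranks]
        rounds.append(sends)
        have_data *= 2
    return rounds


def flat_bcast_schedule(num_ranks):
    """Root sends to each other rank in turn: one send per round."""
    if num_ranks < 1:
        raise ValueError("num_ranks must be at least 1")
    return [[(0, dst)] for dst in range(1, num_ranks)]


if __name__ == "__main__":
    for n in (2, 4, 8):
        print(n, "ranks:",
              len(flat_bcast_schedule(n)), "flat rounds,",
              len(binomial_bcast_schedule(n)), "binomial rounds")
```

In this model the flat schedule needs one round per receiving rank, while the tree schedule needs roughly log2 of the rank count, which is one reason the choice of algorithm can matter as nodes are added.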

There are 2 options you could try:
1. Experiment with different algorithms using the I_MPI_ADJUST_BCAST environment variable - see the Reference Manual.
2. The OFA fabric in the Intel MPI Library supports a multi-rail feature. Set I_MPI_FABRICS=shm:ofa, I_MPI_OFA_NUM_ADAPTERS= (e.g. 2; the default is 1), and I_MPI_OFA_NUM_PORTS=. If your nodes have more than one interconnect (or multi-port interconnects), you can try this feature.


How do you invoke these? I tried using a configuration file and invoking it with mpiexec.exe, but was unsuccessful.


Are you working on Windows?
The OFA module is not supported on the Windows platform, sorry, and do not expect it in the near future. This means that you cannot use the multi-rail feature either.


I am referring to the I_MPI_ADJUST_BCAST parameter. How do I set this parameter? Can I have some examples?


>I am referring to the I_MPI_ADJUST_BCAST parameter. How do I set this parameter? Can I have some examples?

Yeah, sure:
-genv I_MPI_ADJUST_BCAST '1:4-16;2:17-128;3:129-4096;7:4097-4000000'
This means that algorithm 1 (binomial) will be used for messages from 4 to 16 bytes long, algorithm 2 (recursive doubling) for messages from 17 to 128 bytes long, algorithm 3 (ring) for messages from 129 to 4096 bytes long, and algorithm 7 (Shumilin's) for large messages (4097 to 4000000 bytes).
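To make the syntax of that value easier to read, here is a small Python sketch that parses a string in the same `algorithm:low-high;...` form and reports which algorithm number a given message size would map to. It only mirrors the explanation above; the function names are hypothetical, treating the ranges as inclusive is an assumption, and this is not how the library itself parses the variable.

```python
def parse_adjust_bcast(value):
    """Split 'alg:low-high;alg:low-high;...' into (alg, low, high) tuples."""
    rules = []
    for entry in value.split(";"):
        alg, size_range = entry.split(":")
        low, high = size_range.split("-")
        rules.append((int(alg), int(low), int(high)))
    return rules


def select_algorithm(rules, message_size):
    """Return the first algorithm whose range contains message_size.

    Returns None when no listed range matches; what the library does
    in that case is not described in this thread.
    """
    for alg, low, high in rules:
        if low <= message_size <= high:  # inclusive bounds (assumption)
            return alg
    return None


if __name__ == "__main__":
    rules = parse_adjust_bcast("1:4-16;2:17-128;3:129-4096;7:4097-4000000")
    for size in (8, 100, 4096, 1000000, 200 * 1024 * 1024):
        print(size, "bytes ->", select_algorithm(rules, size))
```

Note that in this sketch a 200MB message falls outside every listed range, so the upper bound of the last range would need to be raised to cover it; check the Reference Manual for how the library handles sizes that no range covers.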

BTW: the Intel MPI Library doesn't support multicast communication.

