This section describes BLACS routines for point to point communication.
Point to point communication requires two complementary operations. The send
operation produces a message that is then consumed by the receive
operation. These operations have various resources associated with them. The main such resource is the buffer that holds the data to be sent or serves as the area where the incoming data is to be received. The level of blocking
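indicates what correlation the return from a send/receive operation has with the availability of these resources and with the status of the message. Three levels of blocking are distinguished: non-blocking, locally-blocking, and globally-blocking.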
In non-blocking communication, the return from the send or receive
operation does not imply that the resources may be reused, that the message has been sent or received, or that the complementary operation has been called. Return means only that the send or receive has been started and will be completed at some later time. Polling is required to determine when the operation has finished.
In non-blocking message passing, the concept of communication/computation overlap
(abbreviated C/C overlap) is important. If a system possesses C/C overlap, independent
computation can occur at the same time as communication. That means a non-blocking operation
can be posted, and unrelated work can be done while the message is sent or received in parallel.
If C/C overlap is not present, the routine call still returns immediately, but computation will be
interrupted at some later time when the message is actually sent or received.
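The post-then-poll pattern described above can be modeled in a few lines. The following is a minimal Python sketch, not BLACS code: a worker thread stands in for hardware that provides C/C overlap, and the names `slow_transfer` and `overlapped_send` are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_transfer(buffer, inbox):
    """Stand-in for the actual data movement of a posted send."""
    time.sleep(0.05)            # simulated network latency (assumed value)
    inbox.extend(buffer)        # the message arrives at the receiver


def overlapped_send(buffer):
    """Post a non-blocking send, do unrelated work, then poll for completion."""
    inbox = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        # Posting returns at once; the transfer has only been started.
        request = pool.submit(slow_transfer, buffer, inbox)

        # Independent computation that does not touch `buffer`.
        unrelated = sum(i * i for i in range(1000))

        # Polling is the only way to learn that the operation has finished.
        polls = 0
        while not request.done():
            polls += 1
            time.sleep(0.001)
    return inbox, unrelated, polls


if __name__ == "__main__":
    received, work, polls = overlapped_send([1.0, 2.0, 3.0])
    print(received, work, polls)
```

Until `request.done()` reports completion, `buffer` must be left untouched, because return from a non-blocking operation does not release the resources.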
Return from a locally-blocking send indicates that the resources may be
reused. However, since this depends only on local information, it is unknown whether the
complementary operation has been called. There are no locally-blocking receives: the send
must be completed before the receive buffer is available for reuse.
If a receive has not been posted at the time a locally-blocking send is issued, buffering
will be required to avoid losing the message. Buffering can be done on the sending process,
the receiving process, or not done at all, losing the message.
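Sender-side buffering can be pictured as copying the user's data into a system-owned queue before the send returns. The sketch below is a minimal Python model under that assumption, for a single sender and receiver; `BufferedLink` is a hypothetical name and not a BLACS facility.

```python
from collections import deque


class BufferedLink:
    """Model of a locally-blocking send backed by a system buffer."""

    def __init__(self):
        self._pending = deque()   # system-owned copies awaiting a receive

    def send(self, user_buffer):
        # Copy the data, so the caller may reuse user_buffer on return
        # even though no receive has been posted yet.
        self._pending.append(list(user_buffer))

    def recv(self):
        # Hand over the oldest buffered message.
        return self._pending.popleft()


if __name__ == "__main__":
    link = BufferedLink()
    x = [1.0, 2.0, 3.0]
    link.send(x)        # returns before any receive exists
    x[0] = 99.0         # safe: the send buffer has been released
    print(link.recv())  # the receiver still sees the original data
```

Without the copy in `send`, overwriting `x` would corrupt the message, which is the message loss that buffering is meant to prevent.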
Return from a globally-blocking procedure indicates that the operation's
resources may be reused, and that the complement of the operation has at least been
posted. Since the receive has been posted, there is no buffering required for globally-blocking
sends: the message is always sent directly into the user's receive buffer.
Almost all processors support non-blocking communication, as well as some other level
of blocking sends. What level of blocking the send possesses varies between platforms.
For instance, the Intel® processors support locally-blocking sends, with buffering done on the
receiving process. This is a very important distinction, because codes written assuming locally-blocking
sends will hang on platforms with globally-blocking sends. Below is a simple example
of how this can occur:
IAM = MY_PROCESS_ID()
IF (IAM .EQ. 0) THEN
   SEND TO PROCESS 1
   RECV FROM PROCESS 1
ELSE IF (IAM .EQ. 1) THEN
   SEND TO PROCESS 0
   RECV FROM PROCESS 0
END IF
If the send is globally-blocking, process 0 enters the send, and waits for process 1 to
start its receive before continuing. In the meantime, process 1 starts to send to 0, and waits for 0 to receive before continuing. Both processes are now waiting on each
other, and the program will never continue.
The solution for this case is obvious. One of the processes simply reverses the order of
its communication calls and the hang is avoided. However, when the communication is not
just between two processes, but rather involves a hierarchy of processes, determining how
to avoid this kind of difficulty can become problematic.
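The two-process hang and its fix can be reproduced with a small model. This is a hedged Python sketch, not BLACS code: two threads play processes 0 and 1, the hypothetical `RendezvousChannel` imitates a globally-blocking send, and a timeout (an assumption of the model) stands in for "waits forever".

```python
import queue
import threading


class RendezvousChannel:
    """One-way channel whose send returns only after the matching receive."""

    def __init__(self):
        self._slot = queue.Queue(maxsize=1)
        self._taken = threading.Event()

    def send(self, msg, timeout):
        self._slot.put(msg)
        # Globally-blocking: wait until the receiver has taken the message.
        if not self._taken.wait(timeout):
            raise TimeoutError("send was never matched by a receive")
        self._taken.clear()

    def recv(self, timeout):
        msg = self._slot.get(timeout=timeout)
        self._taken.set()
        return msg


def exchange(reverse_on_p1, timeout=0.5):
    """Run the exchange; process 1 optionally receives before it sends."""
    ch01, ch10 = RendezvousChannel(), RendezvousChannel()
    results = {}

    def p0():
        try:
            ch01.send("from 0", timeout)
            results[0] = ch10.recv(timeout)
        except (TimeoutError, queue.Empty):
            results[0] = "hang"

    def p1():
        try:
            if reverse_on_p1:
                got = ch01.recv(timeout)      # receive first ...
                ch10.send("from 1", timeout)  # ... then send
            else:
                ch10.send("from 1", timeout)
                got = ch01.recv(timeout)
            results[1] = got
        except (TimeoutError, queue.Empty):
            results[1] = "hang"

    threads = [threading.Thread(target=f) for f in (p0, p1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(exchange(reverse_on_p1=False))  # both sends wait on each other
    print(exchange(reverse_on_p1=True))   # reversed order completes
```

With both processes sending first, neither send is ever matched; reversing the calls on one process lets each send meet a posted receive.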
For this reason, it was decided that the BLACS would support locally-blocking sends. On systems
natively supporting globally-blocking sends, non-blocking sends coupled with buffering
are used to simulate locally-blocking sends. On the receiving side, the BLACS
support globally-blocking receives.
In addition, the BLACS specify that point to point messages between two given processes
will be strictly ordered. If process 0 sends three messages (label them A, B, and C)
to process 1, process 1 must receive A before it can receive B, and message C can be
received only after both A and B. The main reason for this restriction is that it allows for
the computation of message identifiers.
Note, however, that messages from different processes are not ordered.
If processes 0, . . ., 3 send messages A, . . ., D to process 4, process 4 may receive
these messages in any order that is convenient.
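One way to picture these ordering rules is a receiver that keeps a separate first-in, first-out queue for each source process. The following minimal Python sketch illustrates the rule only; the `Mailbox` class is hypothetical and says nothing about how the BLACS are implemented.

```python
from collections import defaultdict, deque


class Mailbox:
    """Receiver-side model: one FIFO queue per source process."""

    def __init__(self):
        self._queues = defaultdict(deque)

    def deliver(self, src, msg):
        # Messages from one source are queued in the order they were sent.
        self._queues[src].append(msg)

    def receive_from(self, src):
        # Only the oldest undelivered message from src can be received.
        return self._queues[src].popleft()


if __name__ == "__main__":
    # Same pair of processes: A, B, C arrive strictly in order.
    box = Mailbox()
    for msg in ("A", "B", "C"):
        box.deliver(0, msg)
    print([box.receive_from(0) for _ in range(3)])

    # Different sources: the receiver may pick any convenient order.
    box = Mailbox()
    for src, msg in enumerate(("A", "B", "C", "D")):
        box.deliver(src, msg)
    print([box.receive_from(src) for src in (3, 1, 0, 2)])
```

Ordering is enforced only within each queue, so nothing constrains the order in which the receiver visits different sources.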
The convention used in the communication routine names follows the template
vXXYY2D, where the letter in the v position indicates the data type being sent,
XX is replaced to indicate the shape of the matrix, and the YY positions are
used to indicate the type of communication to perform:

XX (matrix shape)
  GE  The data to be communicated is stored in a general rectangular matrix.
  TR  The data to be communicated is stored in a trapezoidal matrix.
YY (communication type)
  SD  Send. One process sends to another.
  RV  Receive. One process receives from another.

BLACS Point To Point Communication
  Routine name        Operation performed
  vGESD2D, vTRSD2D    Take the indicated matrix and send it to the destination process.
  vGERV2D, vTRRV2D    Receive a message from the source process into the matrix.
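The naming template (one data type letter, a two-letter matrix shape, a two-letter communication type, then 2D) can be decoded mechanically. The Python sketch below is illustrative only: `decode_routine_name` is a hypothetical helper, and the data type letters I, S, D, C, Z are the usual BLAS-style prefixes, assumed here because this section does not list them.

```python
import re

# Assumed BLAS-style type prefixes; not enumerated in this section.
DATA_TYPES = {
    "I": "integer",
    "S": "single precision real",
    "D": "double precision real",
    "C": "single precision complex",
    "Z": "double precision complex",
}
SHAPES = {"GE": "general rectangular matrix", "TR": "trapezoidal matrix"}
OPERATIONS = {"SD": "send", "RV": "receive"}

_PATTERN = re.compile(r"([ISDCZ])(GE|TR)(SD|RV)2D")


def decode_routine_name(name):
    """Split a point to point routine name into its three template fields."""
    match = _PATTERN.fullmatch(name.upper())
    if match is None:
        raise ValueError(f"{name!r} does not follow the vXXYY2D template")
    v, xx, yy = match.groups()
    return DATA_TYPES[v], SHAPES[xx], OPERATIONS[yy]


if __name__ == "__main__":
    print(decode_routine_name("DGESD2D"))
    print(decode_routine_name("ztrrv2d"))
```

Names that fit another template are rejected rather than guessed at, so the helper covers only the point to point routines.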
As a simple example, the pseudo code given above is rewritten below in terms of the
BLACS. It is further specified that the data being exchanged is the double precision vector
X, which is 5 elements long.
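! ICONTXT is the BLACS context handle of an already initialized process grid
CALL BLACS_GRIDINFO(ICONTXT, NPROW, NPCOL, MYPROW, MYPCOL)
IF (MYPROW.EQ.0 .AND. MYPCOL.EQ.0) THEN
   ! Process {0,0}: send X to process {1,0}, then receive X from it
   CALL DGESD2D(ICONTXT, 5, 1, X, 5, 1, 0)
   CALL DGERV2D(ICONTXT, 5, 1, X, 5, 1, 0)
ELSE IF (MYPROW.EQ.1 .AND. MYPCOL.EQ.0) THEN
   ! Process {1,0}: send X to process {0,0}, then receive X from it
   CALL DGESD2D(ICONTXT, 5, 1, X, 5, 0, 0)
   CALL DGERV2D(ICONTXT, 5, 1, X, 5, 0, 0)
END IF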