Developer Reference

  • 0.10
  • 10/21/2020
  • Public Content
Contents

BLACS Point To Point Communication

This
topic
describes BLACS routines for point to point communication.
Point to point communication requires two complementary operations. The send operation produces a message that is then consumed by the receive operation. These operations have various resources associated with them. The main such resource is the buffer that holds the data to be sent or serves as the area where the incoming data is to be received. The level of blocking indicates what correlation the return from a send/receive operation has with the availability of these resources and with the status of message.

Non-blocking

The return from the send or receive operations does not imply that the resources may be reused, that the message has been sent/received or that the complementary operation has been called. Return means only that the send/receive has been started, and will be completed at some later date. Polling is required to determine when the operation has finished.
In non-blocking message passing, the concept of communication/computation overlap (abbreviated C/C overlap) is important. If a system possesses C/C overlap, independent computation can occur at the same time as communication. That means a nonblocking operation can be posted, and unrelated work can be done while the message is sent/received in parallel. If C/C overlap is not present, after returning from the routine call, computation will be interrupted at some later date when the message is actually sent or received.

Locally-blocking

Return from the send or receive operations indicates that the resources may be reused. However, since this only depends on local information, it is unknown whether the complementary operation has been called. There are no locally-blocking receives: the send must be completed before the receive buffer is available for re-use.
If a receive has not been posted at the time a locally-blocking send is issued, buffering will be required to avoid losing the message. Buffering can be done on the sending process, the receiving process, or not done at all, losing the message.

Globally-blocking

Return from a globally-blocking procedure indicates that the operation resources may be reused, and that complement of the operation has at least been posted. Since the receive has been posted, there is no buffering required for globally-blocking sends: the message is always sent directly into the user's receive buffer.
Almost all processors support non-blocking communication, as well as some other level of blocking sends. What level of blocking the send possesses varies between platforms. For instance, the Intel® processors support locally-blocking sends, with buffering done on the receiving process. This is a very important distinction, because codes written assuming locally-blocking sends will hang on platforms with globally-blocking sends. Below is a simple example of how this can occur:
IAM = MY_PROCESS_ID() IF (IAM .EQ. 0) THEN    SEND TO PROCESS 1    RECV FROM PROCESS 1 ELSE IF (IAM .EQ. 1) THEN    SEND TO PROCESS 0    RECV FROM PROCESS 0 END IF
If the send is globally-blocking, process 0 enters the send, and waits for process 1 to start its receive before continuing. In the meantime, process 1 starts to send to 0, and waits for 0 to receive before continuing. Both processes are now waiting on each other, and the program will never continue.
The solution for this case is obvious. One of the processes simply reverses the order of its communication calls and the hang is avoided. However, when the communication is not just between two processes, but rather involves a hierarchy of processes, determining how to avoid this kind of difficulty can become problematic.
For this reason, it was decided the BLACS would support locally-blocking sends. On systems natively supporting globally-blocking sends, non-blocking sends coupled with buffering is used to simulate locally-blocking sends. The BLACS support globally-blocking receives.
In addition, the BLACS specify that point to point messages between two given processes will be strictly ordered. If process 0 sends three messages (label them
A
,
B
, and
C
) to process 1, process 1 must receive
A
before it can receive
B
, and message
C
can be received only after both
A
and
B
. The main reason for this restriction is that it allows for the computation of message identifiers.
Note, however, that messages from different processes are not ordered. If processes 0, . . ., 3 send messages
A
, . . .,
D
to process 4, process 4 may receive these messages in any order that is convenient.

Convention

The convention used in the communication routine names follows the template
?xxyy2d
, where the letter in the
?
position indicates the data type being sent,
xx
is replaced to indicate the shape of the matrix, and the
yy
positions are used to indicate the type of communication to perform:
i
integer
s
single precision real
d
double precision real
c
single precision complex
z
double precision complex
ge
The data to be communicated is stored in a general rectangular matrix.
tr
The data to be communicated is stored in a trapezoidal matrix.
sd
Send. One process sends to another.
rv
Receive. One process receives from another.
BLACS Point To Point Communication
Routine name
Operation performed
Take the indicated matrix and send it to the destination process.
Receive a message from the process into the matrix.
As a simple example, the pseudo code given above is rewritten below in terms of the BLACS. It is further specifed that the data being exchanged is the double precision vector
X
, which is 5 elements long.
CALL GRIDINFO(NPROW, NPCOL, MYPROW, MYPCOL) IF (MYPROW.EQ.0 .AND. MYPCOL.EQ.0) THEN    CALL DGESD2D(5, 1, X, 5, 1, 0)    CALL DGERV2D(5, 1, X, 5, 1, 0) ELSE IF (MYPROW.EQ.1 .AND. MYPCOL.EQ.0) THEN    CALL DGESD2D(5, 1, X, 5, 0, 0)    CALL DGERV2D(5, 1, X, 5, 0, 0) END IF

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804