coherence are separate issues which may occur in parallel computations. These
concepts may be summarized as:
Repeatability: The routine will yield
the exact same result if it is run multiple times on an identical problem. Each
process may get a different result than the others (i.e., repeatability does
not imply coherence), but that value will not change if the routine is
invoked multiple times.
Homogeneous coherence: All processes
selected to possess the result will receive the exact same answer if:
  - Communication does not change the value of the communicated data.
  - All processes perform floating point arithmetic exactly the same.
Heterogeneous coherence: All processes will receive the
exact same answer if communication does not change the value of the
communicated data.
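The three definitions can be made concrete with a small sketch. It is only an illustration, not part of the BLACS: the table layout `results[run][process]` and the helper names `is_repeatable` and `is_coherent` are assumptions introduced here.

```python
# results[run][process] holds the value a hypothetical routine returned
# to each process on each run of the same problem (illustrative layout).

def is_repeatable(results):
    """True if each process sees the same value on every run."""
    first_run = results[0]
    return all(run[p] == first_run[p]
               for run in results
               for p in range(len(first_run)))

def is_coherent(results):
    """True if, within every run, all processes hold the same value."""
    return all(value == run[0] for run in results for value in run)

# Repeatable but incoherent: process 1 always differs from process 0,
# yet neither value changes from run to run.
repeatable_only = [[0.6, 0.6000000000000001],
                   [0.6, 0.6000000000000001]]

# Coherent but not repeatable: the processes agree within each run,
# but the agreed value changes between runs.
coherent_only = [[0.6, 0.6],
                 [0.6000000000000001, 0.6000000000000001]]

print(is_repeatable(repeatable_only), is_coherent(repeatable_only))
print(is_repeatable(coherent_only), is_coherent(coherent_only))
```

The two sample tables show that either property can hold without the other.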
In general, lack of the associative property for floating
point calculations may cause both incoherence and non-repeatability. Algorithms that
rely on redundant computations are at best homogeneous coherent, and algorithms in
which one process broadcasts the result are heterogeneous coherent. Repeatability
does not imply coherence, nor does coherence imply repeatability.
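The effect of non-associativity can be seen by combining the same contributions in two different orders. The sketch below is a plain simulation, assuming IEEE double precision arithmetic; `sequential_sum` and `pairwise_sum` are hypothetical stand-ins for a ring-like and a tree-like combine, not BLACS routines.

```python
from functools import reduce

def sequential_sum(values):
    """Combine left to right, as along a ring or pipeline of processes."""
    return reduce(lambda a, b: a + b, values)

def pairwise_sum(values):
    """Combine in pairs, as a binary tree of processes would."""
    if len(values) == 1:
        return values[0]
    mid = len(values) // 2
    return pairwise_sum(values[:mid]) + pairwise_sum(values[mid:])

# One contribution per simulated process; the exact sum is 2.0.
contributions = [1e16, 1.0, -1e16, 1.0]

ring_result = sequential_sum(contributions)   # ((1e16 + 1) - 1e16) + 1
tree_result = pairwise_sum(contributions)     # (1e16 + 1) + (-1e16 + 1)

print(ring_result, tree_result)               # the two orders disagree
```

If different processes (or different runs) end up using different orders, the disagreement between `ring_result` and `tree_result` is exactly what shows up as incoherence or non-repeatability.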
Since these issues do not affect the correctness of the answer,
they can usually be ignored. However, in very specific situations, these issues may
become very important. A stopping criterion should not be based on incoherent
results, for instance, since processes could then disagree about whether to stop.
Also, a user creating and debugging a parallel program may
wish to enforce repeatability so the exact same program sequence occurs on every
run.
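To see why a stopping test on incoherent data is risky, consider the following simulation. The residual values, the tolerance `TOL`, and both helper functions are made up for illustration; the point is only that a per-process test can split the processes, while a single decision shared by all cannot.

```python
TOL = 1e-8  # illustrative convergence tolerance

def local_decisions(residuals, tol=TOL):
    """Each process tests its own copy of the residual."""
    return [r < tol for r in residuals]

def root_decision(residuals, tol=TOL, root=0):
    """One process decides; every process adopts that decision."""
    decision = residuals[root] < tol
    return [decision] * len(residuals)

# The same quantity, computed incoherently on three simulated processes.
residuals = [9.9999999e-9, 1.0000001e-8, 9.9999999e-9]

print(local_decisions(residuals))  # processes disagree on whether to stop
print(root_decision(residuals))    # all processes take the same branch
```

In a real parallel code, the disagreement in the first case would typically leave some processes waiting in a communication call that the others never enter.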
In the BLACS, coherence and repeatability apply only
in the context of the combine operations. As mentioned above, it is possible to have
communication which is incoherent (for instance, two machines which store floating
point numbers differently may easily produce incoherent communication, since a
number stored on machine A may not have a representation on machine B). However, the
BLACS cannot control this issue. Communication is assumed to be coherent, which for
communication implies that it is also repeatable.
For combine operations, the BLACS allow you to set flags indicating that you would like
combines to be repeatable and/or heterogeneous coherent (see the documentation on
setting these flags for details).
If the BLACS are instructed to guarantee heterogeneous
coherence, the BLACS restrict the topologies which can be used so that one process
calculates the final result of the combine, and if necessary, broadcasts the answer
to all other processes.
If the BLACS are instructed to
guarantee repeatability, orderings will be enforced in the topologies which are
selected. This may result in loss of performance which can range from negligible to
serious depending on the application.
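The two guarantees can be imitated in a few lines. This is a simulation of the idea only, not of how the BLACS implement it: `combine_in_order` and `combine_and_broadcast` are hypothetical helpers, and the arrival orders are enumerated with `itertools.permutations` rather than produced by real message timing.

```python
from itertools import permutations

def combine_in_order(contributions, order):
    """Sum the contributions in the given order of process ranks."""
    total = 0.0
    for rank in order:
        total += contributions[rank]
    return total

def combine_and_broadcast(contributions):
    """One process sums in fixed rank order; all others receive a copy."""
    fixed_order = range(len(contributions))
    result = combine_in_order(contributions, fixed_order)
    return [result] * len(contributions)

contributions = [1e16, 1.0, -1e16, 1.0]   # exact sum is 2.0

# Without an enforced ordering, the result depends on arrival order.
unordered_results = {combine_in_order(contributions, order)
                     for order in permutations(range(len(contributions)))}

# With a fixed ordering and a single computing process, every run and
# every process sees the same value.
broadcast_results = combine_and_broadcast(contributions)

print(sorted(unordered_results))
print(broadcast_results)
```

Fixing the order removes the freedom to add contributions as they arrive, which is one way to picture the performance cost mentioned above.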
A couple of
additional notes are in order. Incoherence and non-repeatability can arise as a
result of floating point errors, as discussed previously. This might lead you to
suspect that integer calculations are always repeatable and coherent, since they
involve exact arithmetic. This is true if overflow is ignored. With overflow taken
into consideration, even integer calculations can display incoherence and
non-repeatability. Therefore, if the repeatability or coherence flags are set, the
BLACS treat integer combines the same as floating point combines in enforcing
repeatability and coherence guards.
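The text does not say how overflow leads to incoherence. One scenario, assumed here purely for illustration, is a heterogeneous system in which processes use different integer widths with two's-complement wraparound; `wrap` and `integer_sum` are hypothetical helpers that imitate this in Python, whose own integers never overflow.

```python
def wrap(value, bits):
    """Reduce value to a signed two's-complement integer of the given width."""
    mask = (1 << bits) - 1
    value &= mask
    # If the sign bit is set, map the value into the negative range.
    return value - (1 << bits) if value >> (bits - 1) else value

def integer_sum(values, bits):
    """Sum values, wrapping after every addition as fixed-width hardware would."""
    total = 0
    for v in values:
        total = wrap(total + v, bits)
    return total

contributions = [2**31 - 1, 1]

on_32_bit = integer_sum(contributions, 32)   # overflows and wraps around
on_64_bit = integer_sum(contributions, 64)   # fits without overflow

print(on_32_bit, on_64_bit)                  # the two "machines" disagree
```

When no overflow occurs, both widths give the same exact result, which matches the remark that integer combines are coherent if overflow is ignored.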
By their nature,
maximization and minimization should always be repeatable. In the complex
precisions, however, the real and imaginary parts must be combined in order to
obtain a magnitude value used to do the comparison (this is typically $|r| + |i|$
or $\sqrt{r^2 + i^2}$, where $r$ and $i$ denote the real and imaginary parts).
This allows for the possibility of
heterogeneous incoherence. The BLACS therefore restrict which topologies are used
for maximization and minimization in the complex routines when the heterogeneous
coherence flag is set.
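A small sketch shows how the choice of magnitude can change which complex value wins a maximization. The helper names `abs1` and `abs2` are introduced here for the sum-of-absolute-values and square-root forms. The idea that two processes would apply different forms is an assumption made to keep the example simple; in practice the difference could equally come from rounding in the magnitude computation.

```python
import math

def abs1(z):
    """Magnitude as |real part| + |imaginary part|."""
    return abs(z.real) + abs(z.imag)

def abs2(z):
    """Magnitude as the square root of the sum of squares."""
    return math.hypot(z.real, z.imag)

candidates = [complex(3, 3), complex(5, 0)]

# abs1 gives 6 and 5; abs2 gives about 4.24 and 5.
max_by_abs1 = max(candidates, key=abs1)
max_by_abs2 = max(candidates, key=abs2)

print(max_by_abs1, max_by_abs2)   # the two measures pick different maxima
```

Having a single process evaluate the magnitudes and broadcast the winner avoids this kind of disagreement, which is consistent with the topology restriction described above.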