• 2019 Update 3
  • 03/07/2019

Errors fall into two different categories:
  • Local errors that need only the information available in the process itself and do not require additional communication between processes
  • Global errors that require information from other processes
Errors also differ in whether the application can continue after they occur. Minor problems are reported as warnings and allow the application to continue, but they lead to resource leaks or portability problems. Real errors are invalid operations that can only be skipped in order to proceed, but skipping them either changes the application semantics (for example, transmission errors) or leads to follow-up errors (for example, skipping an invalid send can lead to a deadlock because of the missing message). Fatal errors cannot be resolved at all and require an application shutdown.
Problems are counted separately per process. Disabled errors are neither reported nor counted, even if they still happen to be detected. The application is aborted as soon as a certain number of errors is encountered; the first fatal error always requires an abort. Once the number of errors reaches CHECK-MAX-ERRORS or the total number of reports (regardless of whether they are warnings or errors) reaches CHECK-MAX-REPORTS (whichever comes first), the application is aborted. These limits apply to each process separately. Even if one process gets stopped, the other processes are allowed to continue to see whether they run into further errors. The whole application is then aborted after a certain grace period. This timeout can be set through CHECK-TIMEOUT.
The default for CHECK-MAX-ERRORS is 1, so the first error already aborts, whereas CHECK-MAX-REPORTS defaults to 100 and thus allows that many warnings. Setting both values to 0 removes the limits. Setting CHECK-MAX-REPORTS to 1 turns the first warning into a reason to abort.
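The counting rules above can be illustrated with a small model. This is only a sketch of the documented behavior, not Intel® Trace Collector code: the names `Severity`, `ReportCounter`, and `record` are hypothetical, and the choice to count a fatal error as an error before aborting is an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    WARNING = "warning"
    ERROR = "error"
    FATAL = "fatal"


@dataclass
class ReportCounter:
    """Hypothetical per-process counters; a limit of 0 means 'no limit'."""
    max_errors: int = 1      # mirrors the documented default of CHECK-MAX-ERRORS
    max_reports: int = 100   # mirrors the documented default of CHECK-MAX-REPORTS
    errors: int = 0
    reports: int = 0

    def record(self, severity: Severity, enabled: bool = True) -> bool:
        """Count one detected problem; return True if this process must abort."""
        if not enabled:
            # Disabled checks are neither reported nor counted.
            return False
        self.reports += 1
        if severity is not Severity.WARNING:
            self.errors += 1
        if severity is Severity.FATAL:
            # A fatal error always forces an abort, whatever the limits are.
            return True
        if self.max_errors and self.errors >= self.max_errors:
            return True
        if self.max_reports and self.reports >= self.max_reports:
            return True
        return False
```

With the defaults, a warning returns `False` and the first error returns `True`. With both limits set to 0, only a fatal error returns `True`, which matches the interactive debugger scenario where the user decides when to stop.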
When using an interactive debugger, the limits can be set to 0 manually and thus removed, because the user can decide to abort using the normal debugger facilities for application shutdown. If the user chooses to continue, Intel® Trace Collector skips over warnings and non-fatal errors and tries to proceed. Fatal errors still force Intel® Trace Collector to abort the application.
The lists below show the supported errors. Each description provides just a few keywords; more detailed descriptions can be found in the following sections.
Local Errors

| Error Name | Type | Description |
|---|---|---|
| LOCAL:EXIT:SIGNAL | Fatal | Process terminated by fatal signal |
| LOCAL:EXIT:BEFORE_MPI_FINALIZE | Fatal | Process exits without calling MPI_Finalize() |
| LOCAL:MPI:CALL_FAILED | Depends on MPI and error | MPI itself or wrapper detects an error |
| LOCAL:MEMORY:OVERLAP | Warning | Multiple MPI operations are started using the same memory |
| LOCAL:MEMORY:ILLEGAL_MODIFICATION | Error | Data modified while owned by MPI |
| LOCAL:MEMORY:INACCESSIBLE | Error | Buffer given to MPI cannot be read or written |
| LOCAL:MEMORY:ILLEGAL_ACCESS | Error | Read or write access to memory currently owned by MPI |
| LOCAL:MEMORY:INITIALIZATION | Error | Distributed memory checking |
| LOCAL:REQUEST:ILLEGAL_CALL | Error | Invalid sequence of calls |
| LOCAL:REQUEST:NOT_FREED | Warning | Program creates suspiciously high number of requests or exits with pending requests |
| LOCAL:REQUEST:PREMATURE_FREE | Warning | An active request has been freed |
| LOCAL:DATATYPE:NOT_FREED | Warning | Program creates high number of data types |
| LOCAL:BUFFER:INSUFFICIENT_BUFFER | Warning | Not enough space for buffered send |

Global Errors

| Error Name | Type | Description |
|---|---|---|
| GLOBAL:MSG/COLLECTIVE:DATATYPE:MISMATCH | Error | The type signature does not match |
| GLOBAL:MSG/COLLECTIVE:DATA_TRANSMISSION_CORRUPTED | Error | Data modified during transmission |
| GLOBAL:MSG:PENDING | Warning | Program terminates with unreceived messages |
| GLOBAL:DEADLOCK:HARD | Fatal | A cycle of processes waiting for each other |
| GLOBAL:DEADLOCK:POTENTIAL | Fatal (a) | A cycle of processes, one or more in blocking send |
| GLOBAL:DEADLOCK:NO_PROGRESS | Warning | Warning when application might be stuck |
| GLOBAL:COLLECTIVE:OPERATION_MISMATCH | Error | Processes enter different collective operations |
| GLOBAL:COLLECTIVE:SIZE_MISMATCH | Error | More or less data than expected |
| GLOBAL:COLLECTIVE:REDUCTION_OPERATION_MISMATCH | Error | Reduction operation inconsistent |
| GLOBAL:COLLECTIVE:ROOT_MISMATCH | Error | Root parameter inconsistent |
| GLOBAL:COLLECTIVE:INVALID_PARAMETER | Error | Invalid parameter for collective operation |
| GLOBAL:COLLECTIVE:COMM_FREE_MISMATCH | Warning | MPI_Comm_free() must be called collectively |

(a) Fatal if the check is enabled; otherwise it depends on the MPI implementation.
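
To make the description of GLOBAL:DEADLOCK:HARD ("a cycle of processes waiting for each other") concrete, here is a minimal sketch that finds such a cycle in a wait-for relation. It does not describe how Intel® Trace Collector detects deadlocks; the function name `find_wait_cycle` and the simplification that each blocked rank waits on exactly one peer are assumptions made for illustration.

```python
from typing import Dict, List, Optional


def find_wait_cycle(waits_for: Dict[int, int]) -> Optional[List[int]]:
    """Return one cycle of ranks, or None if no cycle exists.

    waits_for maps each blocked rank to the single rank it waits on
    (a simplification: a real MPI call may wait on several peers).
    """
    for start in waits_for:
        path: List[int] = []
        position: Dict[int, int] = {}  # rank -> index in path
        rank = start
        # Follow the chain until it leaves the blocked set or repeats a rank.
        while rank in waits_for and rank not in position:
            position[rank] = len(path)
            path.append(rank)
            rank = waits_for[rank]
        if rank in position:
            # The chain returned to a rank already on the path: a cycle.
            return path[position[rank]:]
    return None
```

For example, if ranks 0, 1, and 2 each sit in a blocking receive from the next rank (`{0: 1, 1: 2, 2: 0}`), the function returns `[0, 1, 2]`. A chain such as `{0: 1, 1: 2}`, where rank 2 is still running, returns `None`.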
