Supported Errors

Errors fall into two different categories:

  • Local errors that need only the information available in the process itself and do not require additional communication between processes

  • Global errors that require information from other processes

Another aspect of errors is whether the application can continue after they occur. Minor problems are reported as warnings and allow the application to continue, but they lead to resource leaks or portability problems. Real errors are invalid operations that can only be skipped in order to proceed, but skipping either changes the application semantics (for example, transmission errors) or leads to follow-up errors (for example, skipping an invalid send can lead to a deadlock because of the missing message). Fatal errors cannot be resolved at all and require an application shutdown.

Problems are counted separately per process. Disabled errors are neither reported nor counted, even if they are still detected. The application is aborted once a certain number of problems has been encountered. The first fatal error always requires an abort. Otherwise, the application is aborted once the number of errors reaches CHECK-MAX-ERRORS or the total number of reports (warnings and errors alike) reaches CHECK-MAX-REPORTS, whichever comes first. These limits apply to each process separately. Even if one process is stopped, the other processes are allowed to continue to see whether they run into further errors. The whole application is then aborted after a certain grace period. This timeout can be set through CHECK-TIMEOUT.

The default for CHECK-MAX-ERRORS is 1, so the first error already aborts the application, whereas CHECK-MAX-REPORTS defaults to 100, so that many reports are allowed. Setting both values to 0 removes the limits. Setting CHECK-MAX-REPORTS to 1 turns the first warning into a reason to abort.
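The counting rules above can be sketched as a small per-process model. This is only an illustration of the described behavior, not Intel® Trace Collector code: the class name `ReportCounter`, its `record` method, and the `kind` strings are hypothetical, and the sketch assumes that a fatal error counts toward both totals.

```python
from dataclasses import dataclass


@dataclass
class ReportCounter:
    """Hypothetical per-process model of the abort limits described above."""

    max_errors: int = 1     # documented default of CHECK-MAX-ERRORS; 0 = no limit
    max_reports: int = 100  # documented default of CHECK-MAX-REPORTS; 0 = no limit
    errors: int = 0
    reports: int = 0

    def record(self, kind: str, enabled: bool = True) -> bool:
        """Count one detected problem and return True if this process must abort."""
        if kind not in ("warning", "error", "fatal"):
            raise ValueError(f"unknown kind: {kind!r}")
        if not enabled:
            # Disabled checks are neither reported nor counted.
            return False
        self.reports += 1
        if kind != "warning":
            # Assumption: fatal errors are counted as errors as well.
            self.errors += 1
        if kind == "fatal":
            # The first fatal error always requires an abort.
            return True
        hit_errors = self.max_errors > 0 and self.errors >= self.max_errors
        hit_reports = self.max_reports > 0 and self.reports >= self.max_reports
        return hit_errors or hit_reports
```

With the defaults, `ReportCounter().record("error")` returns `True` on the first error, while a counter created with both limits set to 0 only requests an abort for a fatal error.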

When using an interactive debugger, the limits can be set to 0 manually and thus removed, because the user can decide to abort using the normal debugger facilities for application shutdown. If the user chooses to continue, Intel® Trace Collector skips over warnings and non-fatal errors and tries to proceed. Fatal errors still force Intel® Trace Collector to abort the application.

The lists below show the supported errors. Each description provides just a few keywords; a more detailed description of each error can be found in the following sections.

Local Errors

Error Name                         Type                      Description
LOCAL:EXIT:SIGNAL                  Fatal                     Process terminated by fatal signal
LOCAL:EXIT:BEFORE_MPI_FINALIZE     Fatal                     Process exits without calling MPI_Finalize()
LOCAL:MPI:CALL_FAILED              Depends on MPI and error  MPI itself or wrapper detects an error
LOCAL:MEMORY:OVERLAP               Warning                   Multiple MPI operations are started using the same memory
LOCAL:MEMORY:ILLEGAL_MODIFICATION  Error                     Data modified while owned by MPI
LOCAL:MEMORY:INACCESSIBLE          Error                     Buffer given to MPI cannot be read or written
LOCAL:MEMORY:ILLEGAL_ACCESS        Error                     Read or write access to memory currently owned by MPI
LOCAL:MEMORY:INITIALIZATION        Error                     Distributed memory checking
LOCAL:REQUEST:ILLEGAL_CALL         Error                     Invalid sequence of calls
LOCAL:REQUEST:NOT_FREED            Warning                   Program creates suspiciously high number of requests or exits with pending requests
LOCAL:REQUEST:PREMATURE_FREE       Warning                   An active request has been freed
LOCAL:DATATYPE:NOT_FREED           Warning                   Program creates high number of data types
LOCAL:BUFFER:INSUFFICIENT_BUFFER   Warning                   Not enough space for buffered send

Global Errors

Error Name                                         Type       Description
GLOBAL:MSG/COLLECTIVE:DATATYPE:MISMATCH            Error      The type signature does not match
GLOBAL:MSG/COLLECTIVE:DATA_TRANSMISSION_CORRUPTED  Error      Data modified during transmission
GLOBAL:MSG:PENDING                                 Warning    Program terminates with unreceived messages
GLOBAL:DEADLOCK:HARD                               Fatal      A cycle of processes waiting for each other
GLOBAL:DEADLOCK:POTENTIAL                          Fatal (a)  A cycle of processes, one or more in blocking send
GLOBAL:DEADLOCK:NO_PROGRESS                        Warning    Warning when application might be stuck
GLOBAL:COLLECTIVE:OPERATION_MISMATCH               Error      Processes enter different collective operations
GLOBAL:COLLECTIVE:SIZE_MISMATCH                    Error      More or less data than expected
GLOBAL:COLLECTIVE:REDUCTION_OPERATION_MISMATCH     Error      Reduction operation inconsistent
GLOBAL:COLLECTIVE:ROOT_MISMATCH                    Error      Root parameter inconsistent
GLOBAL:COLLECTIVE:INVALID_PARAMETER                Error      Invalid parameter for collective operation
GLOBAL:COLLECTIVE:COMM_FREE_MISMATCH               Warning    MPI_Comm_free() must be called collectively

(a) If the check is enabled; otherwise it depends on the MPI implementation.
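Every error name listed above begins with either LOCAL or GLOBAL, which mirrors the two categories introduced at the start of this section. As a rough sketch of how a script that post-processes reports might use that prefix, the helpers below split a name on its colons. The function names `split_check_name` and `needs_communication` are hypothetical and not part of Intel® Trace Collector.

```python
def split_check_name(name: str) -> tuple[str, list[str]]:
    """Split a name such as 'LOCAL:MEMORY:OVERLAP' into (scope, remaining parts)."""
    scope, *rest = name.split(":")
    if scope not in ("LOCAL", "GLOBAL"):
        raise ValueError(f"unexpected scope in error name: {name!r}")
    return scope, rest


def needs_communication(name: str) -> bool:
    """Global errors require information from other processes; local ones do not."""
    scope, _ = split_check_name(name)
    return scope == "GLOBAL"
```

For example, `needs_communication("GLOBAL:DEADLOCK:HARD")` is true, whereas `needs_communication("LOCAL:MEMORY:OVERLAP")` is false.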
