• 2019 Update 3
  • 03/07/2019

Output Format

Intel® Trace Collector can gather and store statistics about function calls and their communication. These statistics are gathered even if no trace data is collected, which makes them a good starting point for understanding an unknown application that might otherwise produce an unmanageably large trace.
Usage Instructions
To collect these lightweight statistics for your application, set the following environment variables before tracing:
$ export VT_STATISTICS=ON
$ export VT_PROCESS=OFF
Alternatively, set the VT_CONFIG environment variable to point to a configuration file with the following content:
# Enable statistics gathering
STATISTICS ON
# Do not gather trace data
PROCESS 0:N OFF
$ export VT_CONFIG=<configuration_file_path>/config.conf
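The configuration-file route can also be scripted. The following is a minimal sketch that writes the two directives shown above into config.conf and builds an environment with VT_CONFIG set. The helper names write_stats_config and stats_environment are hypothetical and are not part of Intel® Trace Collector.

```python
import os
import tempfile
from pathlib import Path

# Directives copied verbatim from the configuration example above.
CONFIG_TEXT = """\
# Enable statistics gathering
STATISTICS ON
# Do not gather trace data
PROCESS 0:N OFF
"""


def write_stats_config(directory):
    """Write config.conf into `directory` and return its path (hypothetical helper)."""
    path = Path(directory) / "config.conf"
    path.write_text(CONFIG_TEXT)
    return path


def stats_environment(config_path, base_env=None):
    """Return a copy of the environment with VT_CONFIG pointing at the file."""
    env = dict(os.environ if base_env is None else base_env)
    env["VT_CONFIG"] = str(config_path)
    return env


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = write_stats_config(tmp)
        env = stats_environment(cfg, base_env={})
        print(env["VT_CONFIG"])
```

The returned dictionary could be passed as the env argument of subprocess.run when launching the traced application.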
The statistics are written to the *.stf file. Use stftool with the --print-statistics option to convert the data to ASCII text. For example:
$ stftool tracefile.stf --print-statistics
TIP
The resulting output has an easy-to-process format, so you can use text-processing tools and scripts such as awk*, perl*, and Microsoft Excel* to improve readability. A Perl script, convert-stats, with this capability is provided in the bin folder.
Each line contains the following information:
  • Thread or process
  • Function ID
  • Receiver (if applicable)
  • Message size (if applicable)
  • Number of involved processes (if applicable)
And the following statistics:
  • Count – number of communications or number of calls as applicable
  • Minimum execution time excluding callee times
  • Maximum execution time excluding callee times
  • Total execution time excluding callee times
  • Minimum execution time including callee times
  • Maximum execution time including callee times
  • Total execution time including callee times
Within each line the fields are separated by colons.
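As an illustration, a line of this output could be split into named fields as sketched below. This assumes that the fields appear in exactly the order listed above and that no field itself contains a colon. Neither assumption is confirmed by this page, so check them against real stftool output. The record type, the function name, and the sample line are hypothetical. Values are kept as strings because the numeric formatting is not specified here.

```python
from typing import NamedTuple


class StatRecord(NamedTuple):
    # Field order is an ASSUMPTION: it mirrors the order of the lists above.
    thread: str
    function_id: str
    receiver: str
    message_size: str
    num_processes: str
    count: str
    min_excl: str
    max_excl: str
    total_excl: str
    min_incl: str
    max_incl: str
    total_incl: str


def parse_stat_line(line):
    """Split one colon-separated statistics line into a StatRecord."""
    fields = line.strip().split(":")
    if len(fields) != len(StatRecord._fields):
        raise ValueError(
            f"expected {len(StatRecord._fields)} fields, got {len(fields)}"
        )
    return StatRecord(*fields)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from real stftool output.
    sample = "0:12:1:1024:2:5:10:40:100:10:40:100"
    record = parse_stat_line(sample)
    print(record.function_id, record.count, record.total_incl)
```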
Receiver is set to 0xffffffff for file operations and to 0xfffffffe for collective operations. If message size equals 0xffffffff, the only defined value is 0xfffffffe, which marks it as a collective operation.
The message size is the number of bytes sent or received per single message. With collective operations the following values (buckets of message size) are used for individual instances:
Collective operation   Process-local bucket    Is the same value on all processes?
MPI_Barrier            0                       Yes
MPI_Bcast              Broadcast bytes         Yes
MPI_Gather             Bytes sent              Yes
MPI_Gatherv            Bytes sent              No
MPI_Scatter            Bytes received          Yes
MPI_Scatterv           Bytes received          No
MPI_Allgather          Bytes sent + received   Yes
MPI_Allgatherv         Bytes sent + received   No
MPI_Alltoall           Bytes sent + received   Yes
MPI_Alltoallv          Bytes sent + received   No
MPI_Reduce             Bytes sent              Yes
MPI_Allreduce          Bytes sent + received   Yes
MPI_Reduce_Scatter     Bytes sent + received   Yes
MPI_Scan               Bytes sent + received   Yes
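When post-processing the statistics, it can help to have the table above available as a lookup, for example to flag collectives whose bucket values may differ between processes. The sketch below transcribes the table. The names COLLECTIVE_BUCKETS and non_uniform_collectives are hypothetical.

```python
# Transcription of the table above:
# operation -> (process-local bucket, same value on all processes?)
COLLECTIVE_BUCKETS = {
    "MPI_Barrier": ("0", True),
    "MPI_Bcast": ("Broadcast bytes", True),
    "MPI_Gather": ("Bytes sent", True),
    "MPI_Gatherv": ("Bytes sent", False),
    "MPI_Scatter": ("Bytes received", True),
    "MPI_Scatterv": ("Bytes received", False),
    "MPI_Allgather": ("Bytes sent + received", True),
    "MPI_Allgatherv": ("Bytes sent + received", False),
    "MPI_Alltoall": ("Bytes sent + received", True),
    "MPI_Alltoallv": ("Bytes sent + received", False),
    "MPI_Reduce": ("Bytes sent", True),
    "MPI_Allreduce": ("Bytes sent + received", True),
    "MPI_Reduce_Scatter": ("Bytes sent + received", True),
    "MPI_Scan": ("Bytes sent + received", True),
}


def non_uniform_collectives():
    """Return the operations whose bucket may differ between processes."""
    return sorted(
        name for name, (_, uniform) in COLLECTIVE_BUCKETS.items() if not uniform
    )


if __name__ == "__main__":
    for name in non_uniform_collectives():
        print(name, "->", COLLECTIVE_BUCKETS[name][0])
```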
Message size is set to 0xffffffff if no message was sent, for example, for non-MPI functions or functions like MPI_Comm_rank.
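The sentinel values described in this section can be decoded with a few comparisons. The following sketch takes integer inputs. How the values are printed in the text output (hexadecimal or decimal) is not stated here, so the conversion from text is left to the caller. The function names are hypothetical.

```python
# Sentinel values as given in the text above.
RECEIVER_FILE_OPERATION = 0xFFFFFFFF
RECEIVER_COLLECTIVE = 0xFFFFFFFE
MESSAGE_SIZE_NONE = 0xFFFFFFFF


def classify_receiver(receiver):
    """Map a receiver value to the kind of entry it marks."""
    if receiver == RECEIVER_FILE_OPERATION:
        return "file operation"
    if receiver == RECEIVER_COLLECTIVE:
        return "collective operation"
    return "other"  # any value that is not one of the two sentinels


def message_bytes(message_size):
    """Return the message size in bytes, or None if no message was sent."""
    if message_size == MESSAGE_SIZE_NONE:
        return None
    return message_size


if __name__ == "__main__":
    # int(..., 16) accepts an optional "0x" prefix.
    print(classify_receiver(int("0xfffffffe", 16)))
    print(message_bytes(int("0xffffffff", 16)))
    print(message_bytes(1024))
```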
If more than one communication event (message or collective operation) occurs in the same function call (for example, in MPI_Waitall, MPI_Waitany, MPI_Testsome, or MPI_Sendrecv), the time in that function is evenly distributed over all communications and counted once for each message or collective operation. Therefore, it is impossible to compute a correct traditional function profile from the data for such function instances, that is, those involved in more than one message per actual function call. Only the Total execution time including callee times and the Total execution time excluding callee times can be interpreted similarly to a traditional function profile in all cases.
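A small worked example shows why only the totals survive this even distribution. Assume, hypothetically, a single MPI_Waitall call that takes 6.0 time units and completes three messages. The sketch compares a traditional per-call profile with the per-event accounting described above. The function names and the numbers are illustrative only.

```python
def split_call_time(call_time, num_events):
    """Evenly distribute one call's time over its communication events."""
    if num_events < 1:
        raise ValueError("num_events must be at least 1")
    return [call_time / num_events] * num_events


def summarize(samples):
    """Compute count, minimum, maximum, and total for a list of times."""
    return {
        "count": len(samples),
        "min": min(samples),
        "max": max(samples),
        "total": sum(samples),
    }


if __name__ == "__main__":
    # Traditional profile: one call, one sample of 6.0 time units.
    traditional = summarize([6.0])
    # Per-event accounting: the same call is counted once per message.
    per_event = summarize(split_call_time(6.0, 3))
    print("traditional:", traditional)
    print("per event:  ", per_event)
```

In this example the count, minimum, and maximum differ between the two views, while the total is the same in both.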
The number of involved processes is negative for received messages. If messages were received from a different process or thread, it is -2.
Statistics are gathered on the thread level for all MPI functions, and for all functions instrumented through the API or compiler instrumentation.
