• 2018 Update 1
  • 11/10/2017

The Command Line Interface (CLI) to the Intel® Trace Analyzer enables you to process trace files without a GUI.
Use the CLI to:
  • Compute profiling data automatically
  • Generate pre-computed trace caches for trace files
To enable the CLI, pass --cli as the first argument to switch off the graphical user interface, followed by a trace file name and any other CLI options.
For example, to perform message profile analysis on trace.stf, keep only the messages sent by rank 0, and write the output to messages.txt, enter:
$ traceanalyzer --cli --messageprofile --filter='p2pfilter(sender(0))' -o messages.txt trace.stf
If you do not specify an output file, the results are printed to standard output.
To create the cache for trace.stf with the default resolution, enter:
$ traceanalyzer --cli trace.stf -c0 -w
A batch file to pre-compute caches might look like this:
$ traceanalyzer --cli poisson_icomm.single.stf -c0 -w
$ traceanalyzer --cli poisson_sendrecv.single.stf -c0 -w
$ traceanalyzer --cli vtcounterscopec.single.stf -c0 -w
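When many traces need caches, a loop can generate the same command for each file. The following is a minimal dry-run sketch: it only prints the commands and does not call traceanalyzer. The function name cache_cmds is hypothetical; the options --cli, -c0, and -w are the ones used in the batch file above.

```shell
#!/bin/sh
# Dry-run sketch: print one cache-building command per trace file.
# cache_cmds is a hypothetical helper name, not part of the product.
cache_cmds() {
    for trace in "$@"; do
        # -c0 builds the cache with the default resolution, -w writes it out
        printf 'traceanalyzer --cli %s -c0 -w\n' "$trace"
    done
}

cache_cmds poisson_icomm.single.stf poisson_sendrecv.single.stf vtcounterscopec.single.stf
```

To run the commands instead of printing them, the output could be piped to sh, provided the trace file names contain no spaces or shell metacharacters.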
Note
The CLI is intended for expert use and may change in any version without notice.
The command line interface provides the following options:
Option Name
Action
--messageprofile
Perform message profile analysis.
--collopprofile
Perform collective operation profile analysis.
--functionprofile
Perform function profile analysis.
--starttime=TICKS or -sTICKS
Starting time of the analysis.
--endtime=TICKS or -eTICKS
Ending time of the analysis.
--tgroup=ID or -tID
Use this thread aggregation.
--fgroup=ID or -fID
Use this function aggregation.
--dump=FILE or -oFILE
The file in which to store the analysis results. If not specified, the results are printed to standard output.
--funcformat
A string of format options specifying how the information about functions is printed; the default value is TFNEIS.
Possible format options:
  1. f or F - prints the name of the function group
  2. t or T - prints the name of the thread/process group
  3. g or G - prints the number of processes/threads in the group
  4. E - prints self time in ticks
  5. e - prints self time in seconds
  6. I - prints total time in ticks
  7. i - prints total time in seconds
  8. n or N - prints the number of calls
  9. s or S - prints the source code location (if possible)
--messageformat
A string of format options specifying how the information about point-to-point messages is printed; the default value is 12DdIiXxAauUn.
Possible format options:
  • 1 - prints if the first member of the message is sender and/or receiver
  • 2 - prints if the second member of the message is sender and/or receiver
  • D - prints the summary duration in ticks
  • d - prints the summary duration in seconds
  • v or V - prints the summary amount of bytes sent
  • k or K - prints the minimum amount of bytes sent
  • l or L - prints the maximum amount of bytes sent
  • U - prints the minimum duration in ticks
  • u - prints the minimum duration in seconds
  • X - prints the maximum duration in ticks
  • x - prints the maximum duration in seconds
  • I - prints the minimum rate in Bytes/tick
  • i - prints the minimum rate in Bytes/second
  • A - prints the maximum rate in Bytes/tick
  • a - prints the maximum rate in Bytes/second
  • n or N - prints the number of messages
--collopformat
A string of format options specifying how the information about collective operations is printed; the default value is 12DdIiXxAauUnvwyzlk.
Possible format options:
  • 1 - prints the name of the process group
  • 2 - prints the name of the operation
  • D - prints the summary duration in ticks
  • d - prints the summary duration in seconds
  • U - prints the minimum duration in ticks
  • u - prints the minimum duration in seconds
  • X - prints the maximum duration in ticks
  • x - prints the maximum duration in seconds
  • I - prints the minimum rate in Bytes/tick
  • i - prints the minimum rate in Bytes/second
  • A - prints the maximum rate in Bytes/tick
  • a - prints the maximum rate in Bytes/second
  • v or V - prints the summary amount of bytes sent
  • k or K - prints the minimum amount of bytes sent
  • l or L - prints the maximum amount of bytes sent
  • w or W - prints the summary amount of bytes received
  • y or Y - prints the minimum amount of bytes received
  • z or Z - prints the maximum amount of bytes received
  • n or N - prints the number of collective operations
--readstats or -S
Request statistics, if available, instead of trace data.
--readcache[=FILE] or -r[FILE]
Read the trace cache from the specified (if provided) or default file.
--writecache[=FILE] or -w[FILE]
If a trace cache has been built, write it to the specified (if provided) or default file.
--buildcache=RESOLUTION or -cRESOLUTION
Build a trace cache with the specified resolution. The resolution is given in clock ticks. Higher values result in smaller (coarser) cache files. Specify 0 (zero) to use the default resolution.
--filter=EXPRESSION or -FEXPRESSION
The filter to use for the analysis, specified as a filter grammar string.
EXPRESSION may be funcfilter, p2pfilter, collfilter, or a combination of them. For details, see the Filter Expression Grammar section.
--messagefirst=GROUPING
The first grouping in the message profile analysis result (first dimension of matrix).
--messagesecond=GROUPING
The second grouping in the message profile analysis result (second dimension of matrix).
--collopfirst=GROUPING
The first grouping in the collective operation profile analysis result (first dimension of matrix).
--collopsecond=GROUPING
The second grouping in the collective operation profile analysis result (second dimension of matrix).
--summary
Generate the application summary sheet with the format that is described below.
--icpf [options] <tracefile> --simulator <simulator library>
Process a trace file using the specified simulator at runtime.
Use the traceanalyzer --icpf option to process your trace files using a specific simulator library. In --icpf [options] <tracefile> --simulator <simulator library>, the [options] can be:
  • -s <NUM> - processes the trace starting at the time NUM (measured in ticks).
  • -e <NUM> - processes the trace up to the end time NUM (measured in ticks).
  • -w <NUM> - processes the trace based on NUM: 0 for STF, 1 for ASCII, otherwise devnull.
  • -o <new_name> - trace output file name.
  • -u - single file mode. The output file is a single STF.
  • -h - prints this message and exits.
--ideal [options] <tracefile>
Produce an ideal trace.
Use the traceanalyzer --ideal option to idealize a trace with the Ideal Interconnect Simulator. In --ideal [options] <tracefile>, the [options] can be:
  • -s <NUM> - processes the trace starting at the time NUM (measured in ticks; the default value is 0).
  • -e <NUM> - processes the trace up to the end time NUM (measured in ticks; the default value is the end time of the trace).
  • -w <NUM> - processes the trace based on NUM: 0 for STF, 1 for ASCII, otherwise devnull (the default value is 0).
  • -o <new_name> - trace output file name.
  • -u - single file mode. The output file is a single STF.
  • -sp - shows a percent progress indicator.
  • -q - quiet mode; turns off all output.
  • -h - prints this message and exits.
--breakdowns <real_trace_name> <ideal_trace_name>
Create intermediate *.bdi files that contain all needed information for the Imbalance Diagram.
--merge <unmerged_trace_name> [<merged_trace_name>] [-single] [-delete-raw-data] [-sumdata]
Merge the raw trace.
  • <merged_trace_name> - if specified, the output trace gets this name; otherwise, the suffix .merged is added to the original name.
  • -single - creates a single STF file instead of multiple ones.
  • -delete-raw-data - deletes the raw trace after merging.
  • -sumdata - creates summary data files while merging.
--sumdata <trace_name>
Create summary data files from an ordinary trace
--assist [options] <tracefile>
Use the --assist option to discover performance problems in your application. To learn more about the Performance Assistant, refer to the Performance Assistant section.
In --assist [options] <tracefile>, the [options] can be:
  • -s <NUM> - processes the trace starting at the time NUM (measured in ticks; the default value is 0).
  • -e <NUM> - processes the trace up to the end time NUM (measured in ticks; the default value is the end time of the trace).
  • -h - prints this message and exits.
--interval=PERCENT or -iPERCENT
Select the time interval in the trace file to be analyzed. PERCENT represents the percentage of time taken from the middle of the trace file. This value may range from 0 to 100 (default).
For example, if you set the interval to 20%, and your application time is 10 seconds, only the interval from 4 to 6 seconds will be analyzed.
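The arithmetic behind this example can be sketched as follows, assuming the analyzed window is centered on the midpoint of the application time, as the example above suggests. The function name interval_window is hypothetical and is not part of the CLI; the script only computes the window boundaries.

```shell
#!/bin/sh
# Sketch: compute the analyzed window for a given PERCENT and total time.
# Assumption: the window is centered on the middle of the trace.
# interval_window is a hypothetical helper, not a traceanalyzer command.
interval_window() {
    # $1 = PERCENT (0..100), $2 = application time (any unit)
    awk -v p="$1" -v d="$2" 'BEGIN {
        start = d * (100 - p) / 200   # half of the excluded time precedes the window
        end   = d * (100 + p) / 200   # the other half follows it
        printf "%g %g\n", start, end
    }'
}

interval_window 20 10   # prints: 4 6
```

With PERCENT set to 100, the window covers the whole trace, which matches the default.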
The application summary sheet consists of a three-line header:
<# processes>:<# processes per node>
<application time>:<MPI time>:<IIS time>
<first message size of middle bucket (2)>:<first message size of highest bucket (3)>
The header is followed by this set of lines for each of the top ten functions, sorted by descending total time:
<Name of MPI_group>:<# involved processes>
<total time in above func for bucket 1>:<for bucket 2>:<for bucket 3>
<total IIS time in above func for bucket 1>:<for bucket 2>:<for bucket 3>
<count in above func for bucket 1>:<for bucket 2>:<for bucket 3>
<total # bytes in above func for bucket 1>:<for bucket 2>:<for bucket 3>
In the application summary sheet, IIS stands for Ideal Interconnect Simulator, which predicts MPI behavior on an ideal interconnect.
You can import the application summary sheet to spreadsheet applications such as Microsoft* Office Excel*. Fields are separated by colons. Unknown values are indicated by N/A.
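Because the fields are colon-separated, a line of the sheet can also be split in a shell script. The sketch below assumes a header line of the form <application time>:<MPI time>:<IIS time> and computes the MPI share of the application time. The function name mpi_share and the sample values are invented for illustration; they do not come from a real trace.

```shell
#!/bin/sh
# Sketch: split one colon-separated summary line and report the MPI share.
# Assumed field order: <application time>:<MPI time>:<IIS time>.
# mpi_share and the sample numbers are hypothetical.
mpi_share() {
    # Split the line in $1 on colons into three fields
    IFS=: read -r app mpi iis <<EOF
$1
EOF
    if [ "$app" = "N/A" ] || [ "$mpi" = "N/A" ]; then
        # Unknown values cannot be used in arithmetic; pass the marker through
        echo "N/A"
    else
        awk -v a="$app" -v m="$mpi" 'BEGIN { printf "%.1f\n", 100 * m / a }'
    fi
}

mpi_share "12.5:2.5:1.0"   # prints: 20.0
mpi_share "12.5:N/A:N/A"   # prints: N/A
```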
