Viewing MPI Collected Data

Once the results are collected, you can open any of them in the standalone GUI or generate a command line report. Use inspxe-cl -help report or amplxe-cl -help report to see the options available for generating reports.
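For example, a minimal command line sketch (assuming a result directory named foo.14, as used in the example later in this topic) could look like this:
# List the report types and report options supported by the command line tool.
$ amplxe-cl -help report
# Generate a plain-text Hotspots report for one per-rank result directory.
$ amplxe-cl -R hotspots -format text -r foo.14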
To view the results through the GUI, launch the {amplxe | inspxe}-gui <result path> command, or launch the *-gui tool and use the File > Open > Result... menu item to point to the result. Sometimes it is also convenient to copy the result to another system and view it there (for example, to open a result collected on a Linux cluster on a Windows workstation).
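As a quick sketch, assuming the per-rank result directory foo.14 from the example later in this topic, opening it in the graphical front end could look like this:
# Open one result in the VTune Profiler standalone GUI.
$ amplxe-gui ./foo.14
# The Intel Inspector front end is launched the same way.
$ inspxe-gui ./foo.14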
MPI functions are classified by the Intel® VTune™ Profiler as system functions, which makes the level of support in this regard similar to that for Intel® Threading Building Blocks (Intel® TBB) and OpenMP*. This helps you focus on your own code rather than on MPI internals. The Intel VTune Profiler GUI Call Stack Mode and the CLI -stack-mode switch can be used to turn on a mode in which the system functions are displayed, so the internals of the MPI implementation can be viewed and analyzed. The User functions+1 call stack mode is especially useful for finding the MPI functions that consume the most CPU Time (Hotspots analysis) or waited the most (Locks and Waits analysis). For example, assume there is a call chain
main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ...
where MPI_Bar() is the actual MPI API function you use and the deeper functions are MPI implementation details. The call stack modes behave as follows (see the command line sketch after this list):
  • The default Only user functions call stack mode attributes the time spent in MPI calls to the user function foo(), so that you can see which of your own functions you can change to actually improve performance.
  • The User functions+1 mode attributes the time spent in the MPI implementation to the top-level system function, MPI_Bar(), so that you can easily see exceptionally heavy MPI calls.
  • The User/system functions mode shows the call tree without any reattribution, so that you can see where exactly in the Intel® MPI Library the time was spent.
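The following command lines sketch how a stack mode could be selected for a command line report. The option value user-plus-one is an assumption made for illustration; check amplxe-cl -help report for the exact option name and accepted values in your product version.
# Hotspots report with the default attribution (user functions only).
$ amplxe-cl -R hotspots -format text -r foo.14
# Assumed value: a "user functions + 1" stack mode that attributes MPI
# implementation time to the top-level MPI call (MPI_Bar() in the example above).
# Verify the exact spelling with: amplxe-cl -help report
$ amplxe-cl -R hotspots -stack-mode user-plus-one -format text -r foo.14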
Intel VTune Profiler and Intel Inspector provide Intel TBB and OpenMP support. It is recommended to use these thread-level parallel solutions in addition to MPI-style parallelism to maximize CPU resource usage across the cluster, and to use Intel VTune Profiler and Intel Inspector to analyze the performance and correctness of that level of parallelism. The MPI, OpenMP, and Intel TBB features in the tools are functionally independent, so all the usual features of OpenMP and Intel TBB support are applicable when looking into a result collected for an MPI process.
Example
Here is an example of viewing the text report for functions and modules after an Intel VTune Profiler analysis (note that we open individual results, each of which was collected for a specific rank of the MPI process: foo.14 and foo.15 in the example above):
$ amplxe-cl -R hotspots -q -format text -r foo.14
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990

$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module CPU Time
------ --------
a.out  9.060
