Viewing MPI Collected Data

Once the results are collected, you can open any of them in the standalone GUI or generate a command-line report. Run inspxe-cl -help report or amplxe-cl -help report to see the options available for generating reports.

To view a result in the GUI, run the {amplxe | inspxe}-gui <result path> command, or launch the *-gui tool and use the File > Open > Result... menu item to point to the result. It is sometimes also convenient to copy the result to another system and view it there (for example, to open a result collected on a Linux cluster on a Windows workstation).
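As a sketch, assuming result directories named foo.14 and foo.15 as produced by the collection step, opening a result directly from the command line looks like this (directory names are illustrative):

```shell
# Open a VTune Amplifier result in the standalone GUI
amplxe-gui ./foo.14

# Open an Intel Inspector result the same way
inspxe-gui ./foo.14
```

Pointing the GUI at the result directory is equivalent to using File > Open > Result... from within the tool.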

VTune Amplifier classifies MPI functions as system functions, which makes its level of support here similar to that for Intel TBB and OpenMP. This helps you focus on your own code rather than on MPI internals. The Call Stack Mode setting in the VTune Amplifier GUI and the -stack-mode CLI switches can be used to turn on a mode in which system functions are displayed, so that the internals of the MPI implementation can be viewed and analyzed. The User functions+1 call stack mode is especially useful for finding the MPI functions that consume the most CPU time (Hotspots analysis) or waited the longest (Locks and Waits analysis). For example, assume there is a call chain main() -> foo() -> MPI_Bar() -> MPI_Bar_Impl() -> ..., where MPI_Bar() is the MPI API function you call directly and the deeper functions are MPI implementation details. The call stack modes behave as follows:

  • The default Only user functions call stack mode attributes the time spent in MPI calls to the user function foo(), so you can see which of your own functions to change to actually improve performance.

  • The User functions+1 mode attributes the time spent in the MPI implementation to the top-level system function, MPI_Bar(), so you can easily spot exceptionally heavy MPI calls.

  • The User/system functions mode shows the call tree without any reattribution, so you can see exactly where in the MPI library the time was spent.
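The three modes above can also be selected when generating a command-line report. The sketch below assumes an option named -call-stack-mode with values user-only, user-plus-one, and all; check amplxe-cl -help report for the exact option name and values in your version:

```shell
# Default: attribute MPI time to the calling user function (foo() in the example)
amplxe-cl -R hotspots -call-stack-mode user-only -r foo.14

# "User functions+1": attribute MPI implementation time to the top-level
# MPI API call (MPI_Bar() in the example)
amplxe-cl -R hotspots -call-stack-mode user-plus-one -r foo.14

# Full call tree including MPI implementation internals
amplxe-cl -R hotspots -call-stack-mode all -r foo.14
```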

VTune Amplifier and Intel Inspector both support Intel TBB and OpenMP. It is recommended to use these thread-level parallel solutions in addition to MPI-style parallelism to maximize CPU resource usage across the cluster, and to use VTune Amplifier and Intel Inspector to analyze the performance and correctness of that level of parallelism. The MPI, OpenMP, and Intel TBB features of the tools are functionally independent, so all the usual features of OpenMP and Intel TBB support apply when examining a result collected for an MPI process.
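For a hybrid MPI + OpenMP application, collection can be driven through the MPI launcher so that each rank produces its own result. The command below is a sketch: the launcher name, rank count, thread count, and application name are illustrative, and the per-rank result directory naming may vary by tool version:

```shell
# Run 4 MPI ranks, each using 8 OpenMP threads, collecting hotspots per rank
export OMP_NUM_THREADS=8
mpirun -np 4 amplxe-cl -collect hotspots -result-dir foo ./a.out
```

Each per-rank result can then be opened individually, and the OpenMP-related views apply within it just as for a non-MPI run.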

Example

Here is an example of viewing a text report of functions and modules after a VTune Amplifier analysis. Note that individual results are opened, each collected for a specific MPI rank - foo.14 and foo.15 in the example above:

$ amplxe-cl -R hotspots -q -format text -r foo.14
Function Module CPU Time
-------- ------ --------
f        a.out  6.070
main     a.out  2.990

$ amplxe-cl -R hotspots -q -format text -group-by module -r foo.14
Module CPU Time
------ --------
a.out  9.060