Intel® VTune™ Profiler User Guide

ID 766319
Date 12/16/2022
Public

OpenSHMEM* Code Analysis with Fabric Profiler

Fabric Profiler (preview feature) is a performance tool that you can use to identify detailed characteristics of the runtime behavior for an OpenSHMEM application.

NOTE:

This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases.

Fabric Profiler consists of two parts:

  • The data collector monitors application and network behavior while the OpenSHMEM application is running.

  • The analyzer is a collection of tools that run on a Linux* or Windows* workstation after the application has completed. These tools display profiling results with interactive features that allow you to explore a multitude of communication-centric behaviors.

NOTE:

The Fabric Profiler tool is distributed as part of Intel® VTune™ Profiler. Full documentation of the tool, examples, and pre-collected trace files are available in the Fabric Profiler package.

Set Up the Data Collector

The Fabric Profiler data collector is implemented as a library that intercepts the OpenSHMEM calls of the application and monitors network activity. It populates binary trace files with this information.

Prerequisites: Load the esp module by running: module load esp. The data collector package is installed in the directory that the ESP_ROOT environment variable points to.

The data collector requires two third party libraries:

  • PAPI is used to gather system metrics at runtime. To add PAPI to your environment you may need to run module load papi, or download it from icl.utk.edu/papi/software and build it.

  • OTF2 is used to generate trace files. You can obtain OTF2 at score-p.org.
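Before building, it can help to confirm that the environment described above is in place. The following is a minimal sketch, not part of the Fabric Profiler package: check_esp_root and report_tool are hypothetical helper names, and the sketch assumes that the PAPI and OTF2 installations put their usual utilities (papi_avail and otf2-config) on your PATH.

```shell
# Sketch: sanity-check the data collector environment before building.
# check_esp_root and report_tool are hypothetical helper names.

# Succeeds only if ESP_ROOT is set and points to an existing directory.
check_esp_root() {
    if [ -z "${ESP_ROOT:-}" ]; then
        echo "ESP_ROOT is not set; run 'module load esp' first" >&2
        return 1
    fi
    if [ ! -d "$ESP_ROOT" ]; then
        echo "ESP_ROOT=$ESP_ROOT is not a directory" >&2
        return 1
    fi
    echo "ESP_ROOT=$ESP_ROOT"
}

# Reports whether a command-line tool is on PATH (informational only).
report_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: not found"
    fi
}

# Example usage (papi_avail ships with PAPI, otf2-config with OTF2):
#   check_esp_root && report_tool papi_avail && report_tool otf2-config
```

A "not found" result does not necessarily mean that the library is missing. It may only mean that its utilities are not on your PATH.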

Set Up the Analyzer

The analyzer is a collection of MATLAB* programs that run in the MATLAB runtime environment. They read the trace files and display results.

Prerequisites: You must have the MATLAB Runtime Environment to install the analyzer. This is a free download available at https://www.mathworks.com/products/compiler/mcr.html. Select version R2018a (9.4) or newer.

The analyzer is located in the release directory in esp/bin/analyzer. It is a MATLAB program named fabric_profiler_v100.

To start the analyzer, run the fpro script.

Fabric Profiler Workflow

In the Fabric Profiler workflow, you perform these steps:

  1. Build and run an application using the data collector.
  2. Generate trace files.
  3. View trace files using the analyzer.

Build and Run an Application

Once you have installed Fabric Profiler on a Linux or Windows machine, complete these steps to build and run an application.

  1. Define Fabric Profiler regions in the source code.

    A named region is highlighted in analyzer displays and improves analysis.

    1. Include the header file esp.h.
    2. Mark regions of interest:
      esp_enter("<region_name>");
      esp_exit("<region_name>");
    3. Rebuild the application.
    NOTE:
    You cannot nest or interleave regions.
  2. Build a statically-linked application with Fabric Profiler instrumentation.

    When you load the Fabric Profiler module (esp), environment variables define important flags for you. Use these variables to link the Fabric Profiler data collector library into your code before the SHMEM library.

    For example, to build the fixed-round example (from the examples directory) using Cray SHMEM, type:

    CC -static -o fixed-round $ESP_CFLAGS fixed-round.c $ESP_LDFLAGS $ESP_LDADD

    Note these differences from your normal build:

    • Use the C++ compiler, even if the C-language application does not require it. The data collector library uses C++ and will not link without it.

    • Use $ESP_CFLAGS to add the path to esp.h. It also adds -g which improves the quality of the trace files.

    • Use $ESP_LDFLAGS to add the path to the data collector library.

    • Use $ESP_LDADD to add the data collector library.

  3. Build a dynamically-linked application with Fabric Profiler instrumentation.

    Fabric Profiler uses LD_PRELOAD at run-time to link in the data collector library before the SHMEM library. Therefore, you do not need to rebuild your application unless you added Fabric Profiler regions to your source code.

    For example, the fixed-round.c application (in the examples directory) is written in C. Unlike the case of static linking above, you do not need to use the C++ compiler to build this C-language application for use with Fabric Profiler instrumentation.

    cc -o fixed-round $ESP_CFLAGS fixed-round.c -dynamic

    $ESP_CFLAGS sets the path to esp.h and adds -g.

  4. Run an application with Fabric Profiler instrumentation.

    1. The data collector library uses the PAPI library and the OTF2 library. If you are using the shared library, you may need to run module load papi, or add PAPI to your library paths. You can download OTF2 at score-p.org.

    2. Load the Fabric Profiler module:

      module load esp
    3. There are many Fabric Profiler configuration parameters. The module sets them to default values which are sufficient when you run your application for the first time. The configuration parameters are described in a separate section.

    4. For a dynamic application, add the data collector library to the LD_PRELOAD variable.

      For example:

      export LD_PRELOAD=$ESP_ROOT/lib/libesp.so:$LD_PRELOAD
      srun --export=LD_PRELOAD,ALL <rest of srun command>
      
      If you have loaded the esp module, the environment variable ESP_LIB contains the path to libesp.so. See the sample job scripts *.slurm and *.lsf in the examples directory.
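When LD_PRELOAD is empty, the export shown above leaves an empty entry after the colon. The following sketch shows one way to compose the value so that the separator appears only when something is already preloaded. esp_preload_value is a hypothetical helper name, and the library path is the one used in the example above.

```shell
# Sketch: compose an LD_PRELOAD value without a dangling ':' separator.
# esp_preload_value is a hypothetical helper name.
esp_preload_value() {
    local lib="$1" existing="${2:-}"
    if [ -n "$existing" ]; then
        printf '%s:%s\n' "$lib" "$existing"
    else
        printf '%s\n' "$lib"
    fi
}

# Example usage in a job script (library path taken from the example above):
#   export LD_PRELOAD="$(esp_preload_value "$ESP_ROOT/lib/libesp.so" "${LD_PRELOAD:-}")"
#   srun --export=LD_PRELOAD,ALL <rest of srun command>
```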

Generate Trace Files

Once you run the data collector, it monitors the execution of your application as well as network activity. It writes trace files when the application has finished executing. Allow about 10% additional wall time for writing output to the trace files.
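As a rough illustration of the 10% allowance, the sketch below pads a wall-time limit given in whole minutes and rounds up. pad_walltime_minutes is a hypothetical helper name, and the 10% figure is the guideline stated above rather than a measured value for your application.

```shell
# Sketch: add 10% to a wall-time limit given in whole minutes, rounding up.
# pad_walltime_minutes is a hypothetical helper name.
pad_walltime_minutes() {
    local minutes="$1"
    echo $(( (minutes * 110 + 99) / 100 ))
}

# Example usage: a job that normally needs 60 minutes
#   pad_walltime_minutes 60    # prints 66
```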

  1. Check the application output to verify that the data collector instrumented your code:

    1. Ensure that the ESP_VERBOSITY_LEVEL environment variable is set to 1 and not 0.

    2. When the application calls shmem_init, the Fabric Profiler start banner displays.

    3. When the application calls shmem_finalize, the Fabric Profiler stop banner displays.

    If the ESP_VERBOSITY_LEVEL environment variable is set correctly and the banners do not display, contact esp-support@intel.com for further assistance.

  2. Merge the trace files.

    The Fabric Profiler banner lists the path to the trace files. To merge traces, run the esp_merge_traces.sh script:

    $ESP_ROOT/bin/esp_merge_traces.sh \
    <path to application executable> <path to trace directory> <number of PEs>
  3. Copy the trace files in the root level of the traces directory to the machine where you have installed the analyzer.
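The merge and copy steps above could be scripted roughly as follows. This is a sketch under stated assumptions: merge_and_stage and the staging directory are hypothetical, the host name and paths in the usage comment are placeholders, and only the merge script path and its three arguments come from the step above.

```shell
# Sketch: merge traces, then stage the root-level trace files for copying.
# merge_and_stage and the staging directory are hypothetical; the merge
# script path and its three arguments come from the merge step above.
merge_and_stage() {
    local exe="$1" trace_dir="$2" num_pes="$3" stage_dir="$4"

    "$ESP_ROOT/bin/esp_merge_traces.sh" "$exe" "$trace_dir" "$num_pes" || return 1

    mkdir -p "$stage_dir" || return 1
    # Only regular files directly under the trace directory are staged;
    # subdirectories are left in place.
    find "$trace_dir" -maxdepth 1 -type f -exec cp {} "$stage_dir"/ \;
}

# Example usage (paths and host name are placeholders):
#   merge_and_stage ./fixed-round ./traces 64 ./traces_to_copy
#   scp -r ./traces_to_copy user@analysis-host:fabric_profiler_traces/
```

Staging into a separate directory keeps the copy limited to the files at the root level of the traces directory, which are the ones this step asks you to transfer.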

View Trace Files using the Analyzer

There are five types of analyzers which read trace files. All of them are located in esp/bin/analyzer in the Fabric Profiler package. The analyzers are:

  • espba - Barrier analyzer

  • espfbla - Fabric backlog analyzer

  • espla - Function latency analyzer

  • espmsa - Message straggler analyzer

  • espr - A report that contains a summary of results

You can use the traces generated in the previous step or open pre-collected sample traces from esp/examples/samples/trace. Each of these traces corresponds to a SHMEM application in the esp/examples directory.

NOTE:

espr is a general report that summarizes all of the trace data in HTML format. Each sample application in the examples directory includes this report so you can view the report for the sample application without running the SHMEM application or MATLAB runtime. The esp/examples/samples/html directory contains files named {app name}_{number of PEs}.html and associated directories named {app name}_{number of PEs}_html_files. Open the HTML file in a browser to view the report generated by the analyzer from the corresponding trace files in esp/examples/output/samples/trace.
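To see which sample reports are available, you could list the HTML files and split each name into its application name and PE count. The sketch below assumes only the {app name}_{number of PEs}.html naming pattern described in the note. list_sample_reports is a hypothetical helper name.

```shell
# Sketch: list sample reports as "<app name> <number of PEs>" pairs.
# list_sample_reports is a hypothetical helper name.
list_sample_reports() {
    local html_dir="$1" f base
    for f in "$html_dir"/*_*.html; do
        [ -f "$f" ] || continue          # skip when the glob matches nothing
        base=$(basename "$f" .html)      # {app name}_{number of PEs}
        echo "${base%_*} ${base##*_}"    # split at the last underscore
    done
}

# Example usage:
#   list_sample_reports esp/examples/samples/html
```

Splitting at the last underscore keeps application names that themselves contain underscores intact.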

Contents of Trace Files

During the operation of Fabric Profiler, when your application calls shmem_finalize, the data collector writes five trace files that contain information about application behavior.

Trace file: {trace-file-prefix}.uc1.func
Format: Binary
Contents: Information about every profiled SHMEM function call. Each process writes out a separate function trace file. After job completion, the individual function trace files are merged into a single file with the esp/bin/collector/esp_merge_traces.sh script. The merged file is required by the analyzers.

Trace file: {trace-file-prefix}.uc1.hfi
Format: Binary
Contents: When the SHMEM application is running, Fabric Profiler monitors send and receive counters on the host fabric interface card. The HFI file contains these time-stamped counter values.

Trace file: {trace-file-prefix}.uc1.profile
Format: Binary
Contents: When the SHMEM application is running, Fabric Profiler monitors system performance counters and gathers system information. This data is written to the profile file. Each process writes out a separate profile file. When the job completes, the individual profile trace files are merged into a single file with the esp/bin/collector/esp_merge_traces.sh script. The merged file is required by the analyzers.

Trace file: {trace-file-prefix}.uc1.put
Format: Binary
Contents: Fabric Profiler monitors the amount of data injected into the network with each shmem_put call and the destination node for each put operation. The put file contains these values.

Trace file: {trace-file-prefix}.uc1.ev.txt
Format: Text
Contents: The environment file is a list of all environment variables defined at SHMEM application run-time.
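Before you copy traces to the analyzer machine, you may want to confirm that all five files exist for a given run. The following sketch checks for the five file name suffixes listed above. check_trace_set is a hypothetical helper name, and the sketch assumes that the files sit together in one directory and share one trace-file prefix.

```shell
# Sketch: confirm that all five trace files exist for one trace-file prefix.
# check_trace_set is a hypothetical helper name.
check_trace_set() {
    local dir="$1" prefix="$2" ext missing=0
    for ext in func hfi profile put ev.txt; do
        if [ ! -f "$dir/$prefix.uc1.$ext" ]; then
            echo "missing: $prefix.uc1.$ext"
            missing=1
        fi
    done
    return "$missing"
}

# Example usage (directory and prefix are placeholders):
#   check_trace_set ./traces my_run || echo "trace set is incomplete"
```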

Types of Analyzers

This table describes each analyzer in the Fabric Profiler package, along with associated operations that you can perform.

Analyzer type: espba
Name: Barrier Trace Analyzer
Purpose: Reads the function trace file and displays barrier wait times for each barrier call in the source code for each PE.
Suggested operations:

  • Take any of these measurements:

    • PE wait time
    • PE arrival time
    • Node wait density
    • PE percent Late
    • PE Outlier Late
  • Vary the threshold.
  • Restrict your results to a specific lexical occurrence (a particular source code line containing a barrier).

Analyzer type: espfbla
Name: Fabric Backlog Analyzer
Purpose: Reads the put trace file and correlates it with the HFI trace file to visualize fabric backlog at any point in time.
Suggested operations:

  • Select "Show Region Bounds" and choose regions of interest. If the SHMEM code defined code regions, the temporal regions are highlighted on the graph of network backlog against time.

  • Select an individual node to display its associated backlog.

  • View injection and/or ejection backlog (requested minus actual):

    • Injection requested: data sent off-node by this node in the application

    • Injection actual: data sent into the network by the HFI

    • Ejection requested: data sent by other nodes in the application to this node

    • Ejection actual: data received from the network according to the HFI

  • Zoom and pan to bring areas into focus.

  • Try offset adjustment modes.

  • Switch between toggle and rate displays.

  • Use the data cursor. Click the widget first. Next, click anywhere on the plot to see data values for that point.

Analyzer type: espla
Name: Function (latency) Trace Analyzer
Purpose: Reads the function trace file and displays function latency for all instrumented SHMEM calls. Trace files that contain hundreds of thousands of function calls can take several minutes to process. The default display shows composite PE wait time for all calls at each point in time.
Suggested operations:

  • Select individual function calls to display latency hot spots for each call.

  • If the application defined Fabric Profiler regions, click View Regions. Choose regions to highlight temporal spans on the graph which represent those regions of code.

  • Switch to the communications matrix. This visualizes the volume of data sent from each PE to every other PE.

  • Use the zoom, pan, and data cursor widgets (under the File and Help menus) to drill into the display data.

  • Experiment with the threshold controls for frequency, high value, and low value.

Analyzer type: espmsa
Name: Message Straggler Analyzer
Purpose: Reads the function trace file and correlates the activity in the trace file with network activity in the HFI trace file.

Analyzer type: espr
Name: Analyzer Report
Purpose: A non-interactive report that gathers information about a SHMEM application run and displays it in HTML format. The report can take several minutes to complete. When it is complete, the HTML report is saved in the same location as the profile trace file, with a matching file name.
Suggested operations:

  • Use the File menu to select the profile trace file for a particular application run.