User Guide

OpenSHMEM* Code Analysis with Fabric Profiler

Fabric Profiler (preview feature) is a performance tool that you can use to identify detailed characteristics of the runtime behavior for an OpenSHMEM application.
This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
Fabric Profiler consists of two parts:
  • The data collector monitors application and network behavior while the OpenSHMEM application is running.
  • The analyzer is a collection of tools that run on a Linux* or Windows* workstation after the application has completed. These tools display profiling results with interactive features that let you explore a range of communication-centric behaviors.
The Fabric Profiler tool is distributed as part of Intel® VTune™ Profiler. Full documentation of the tool, examples, and pre-collected trace files are available in the Fabric Profiler package.

Set Up the Data Collector

The Fabric Profiler data collector is implemented as a library that intercepts the OpenSHMEM calls of the application and monitors network activity. It populates binary trace files with this information.
Prerequisites:
Load the esp module by running:
module load esp
The data collector package is installed in the directory specified by the ESP_ROOT environment variable.
The data collector requires two third-party libraries:
  • PAPI is used to gather system metrics at runtime. To add PAPI to your environment, you may need to run module load papi, or download it from icl.utk.edu/papi/software and build it.
  • OTF2 is used to generate trace files. You can obtain OTF2 at score-p.org.
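
Before building, it can help to confirm that the prerequisites above are visible in your shell. The following is a minimal sketch and is not part of the Fabric Profiler package. The function name check_collector_prereqs is hypothetical, and the sketch assumes that the PAPI utility papi_avail and the OTF2 helper otf2-config are on your PATH when those libraries are installed, which may not hold on every system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Fabric Profiler): reports which
# data collector prerequisites appear to be missing from this shell.
check_collector_prereqs() {
    local missing
    missing=0
    # ESP_ROOT should name the directory where the data collector is installed.
    if [ -z "${ESP_ROOT:-}" ]; then
        echo "ESP_ROOT is not set (try: module load esp)"
        missing=1
    elif [ ! -d "$ESP_ROOT" ]; then
        echo "ESP_ROOT does not point to a directory: $ESP_ROOT"
        missing=1
    fi
    # Assumption: these tools are installed with PAPI and OTF2 and are on PATH.
    if ! command -v papi_avail >/dev/null 2>&1; then
        echo "papi_avail not found (PAPI may be missing; try: module load papi)"
        missing=1
    fi
    if ! command -v otf2-config >/dev/null 2>&1; then
        echo "otf2-config not found (OTF2 may be missing)"
        missing=1
    fi
    return "$missing"
}
```

Calling check_collector_prereqs prints one line per problem it finds and returns a nonzero status if anything looks missing. A clean result only shows that the tools are visible, not that the libraries will link.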

Set Up the Analyzer

The analyzer is a collection of MATLAB* programs that run in the MATLAB runtime environment. They read the trace files and display results.
Prerequisites: You must have the MATLAB Runtime Environment to install the analyzer. This is a free download available at https://www.mathworks.com/products/compiler/mcr.html. Select a version that is R2018a (9.4) or newer.
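
The version requirement above is a numeric comparison on each component, so 9.10 counts as newer than 9.4 even though a plain string comparison would order them the other way. The following is a minimal sketch of such a check. The function name version_at_least is hypothetical, and how you obtain the installed runtime version is left to you.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when dotted version $1 is >= dotted version $2.
# Only the major and minor components are compared.
version_at_least() {
    local have_major have_minor need_major need_minor rest
    have_major=${1%%.*}
    need_major=${2%%.*}
    # Take the component after the first dot, or 0 when there is no dot.
    case "$1" in *.*) rest=${1#*.}; have_minor=${rest%%.*} ;; *) have_minor=0 ;; esac
    case "$2" in *.*) rest=${2#*.}; need_minor=${rest%%.*} ;; *) need_minor=0 ;; esac
    if [ "$have_major" -ne "$need_major" ]; then
        [ "$have_major" -gt "$need_major" ]
        return
    fi
    [ "$have_minor" -ge "$need_minor" ]
}
```

For example, version_at_least 9.10 9.4 succeeds, while version_at_least 9.3 9.4 fails.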
The analyzer is located in the release directory in esp/bin/analyzer. It is a MATLAB program named fabric_profiler_v100.
To start the analyzer, run the fpro script.

Fabric Profiler Workflow

In the Fabric Profiler workflow, you perform these steps:
  1. Build and run an application using the data collector.
  2. Generate trace files.
  3. View trace files using the analyzer.

Build and Run an Application

Once you have installed Fabric Profiler on a Linux or Windows machine, complete these steps to build and run an application.
  1. Define Fabric Profiler regions in the source code.
    A named region is highlighted in analyzer displays and improves analysis.
    1. Include the header file esp.h.
    2. Mark regions of interest:
      esp_enter("<region_name>");
      esp_exit("<region_name>");
    3. Rebuild the application.
    You cannot nest or interleave regions.
  2. Build a statically linked application with Fabric Profiler instrumentation.
    When you load the Fabric Profiler module (esp), environment variables define important flags for you. Use these variables to link the Fabric Profiler data collector library into your code before the SHMEM library.
    For example, to build the fixed-round example (from the examples directory) using Cray SHMEM, type:
    CC -static -o fixed-round $ESP_CFLAGS fixed-round.c $ESP_LDFLAGS $ESP_LDADD
    Note these differences from your normal build:
    • Use the C++ compiler, even if your application is written in C and does not otherwise require it. The data collector library uses C++ and will not link without it.
    • Use $ESP_CFLAGS to add the path to esp.h. It also adds -g, which improves the quality of the trace files.
    • Use $ESP_LDFLAGS to add the path to the data collector library.
    • Use $ESP_LDADD to add the data collector library.
  3. Build a dynamically linked application with Fabric Profiler instrumentation.
    Fabric Profiler uses LD_PRELOAD at run time to link in the data collector library before the SHMEM library. Therefore, you do not need to rebuild your application unless you added Fabric Profiler regions to your source code.
    For example, the fixed-round.c application (in the examples directory) is written in C. Unlike the case of static linking above, you do not need to use the C++ compiler to build this C-language application for use with Fabric Profiler instrumentation.
    cc -o fixed-round $ESP_CFLAGS fixed-round.c -dynamic
    $ESP_CFLAGS sets the path to esp.h and adds -g.
  4. Run an application with Fabric Profiler instrumentation.
    1. The data collector library uses the PAPI library and the OTF2 library. If you are using the shared library, you may need to run module load papi, or add PAPI to your library paths. You can download OTF2 at score-p.org.
    2. Load the Fabric Profiler module:
      module load esp
    3. There are many Fabric Profiler configuration parameters. The module sets them to default values which are sufficient when you run your application for the first time. The configuration parameters are described in a separate section.
    4. For a dynamic application, add the data collector library to the LD_PRELOAD variable.
      For example:
      export LD_PRELOAD=$ESP_ROOT/lib/libesp.so:$LD_PRELOAD
      srun --export=LD_PRELOAD,ALL <rest of srun command>
      If you have loaded the esp module, the environment variable ESP_LIB contains the path to libesp.so. See the sample job scripts *.slurm and *.lsf in the examples directory.
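
For the dynamic case in step 4, the LD_PRELOAD value can be assembled so that the data collector library always comes first and no stray colon is left behind when LD_PRELOAD was previously empty. The following is a minimal sketch. The function name esp_preload_value is hypothetical, and the library path in the usage comment is the one from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a preload list with the collector library first.
#   $1 = path to libesp.so
#   $2 = current LD_PRELOAD value (may be empty)
esp_preload_value() {
    local lib current
    lib=$1
    current=${2:-}
    if [ -z "$current" ]; then
        # Nothing else is preloaded, so avoid a trailing colon.
        printf '%s\n' "$lib"
    else
        # Keep existing entries, but place the collector library in front.
        printf '%s\n' "$lib:$current"
    fi
}

# Possible use in a job script (commented out because it needs a real install):
#   export LD_PRELOAD="$(esp_preload_value "$ESP_ROOT/lib/libesp.so" "${LD_PRELOAD:-}")"
#   srun --export=LD_PRELOAD,ALL <rest of srun command>
```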

Generate Trace Files

While your instrumented application runs, the data collector monitors its execution as well as network activity. It writes trace files when the application has finished executing. Allow about 10% additional wall time in your job for writing the trace files.
  1. Check the application output to verify that the data collector instrumented your code successfully:
    1. Ensure that the ESP_VERBOSITY_LEVEL environment variable is set to 1 and not 0.
    2. When the application calls shmem_init, the start banner of Fabric Profiler displays.
    3. When the application calls shmem_finalize, the stop banner of Fabric Profiler displays.
    If the ESP_VERBOSITY_LEVEL environment variable is set correctly and the banners do not display, contact esp-support@intel.com for further assistance.
  2. Merge the trace files.
    The Fabric Profiler banner lists the path to the trace files. To merge the traces, run the esp_merge_traces.sh script:
    $ESP_ROOT/bin/esp_merge_traces.sh \
      <path to application executable> <path to trace directory> <number of PEs>
  3. Copy the trace files from the root level of the traces directory to the machine where you have installed the analyzer.
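
The merge step takes three positional arguments, so a small wrapper that checks them before calling the script can catch a mistyped path or PE count early. The following is a minimal sketch. The function name esp_merge and its checks are hypothetical; only the script path and the argument order come from the command above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around esp_merge_traces.sh that validates its arguments.
#   $1 = application executable, $2 = trace directory, $3 = number of PEs
esp_merge() {
    if [ "$#" -ne 3 ]; then
        echo "usage: esp_merge <executable> <trace directory> <number of PEs>" >&2
        return 2
    fi
    if [ -z "${ESP_ROOT:-}" ]; then
        echo "ESP_ROOT is not set (try: module load esp)" >&2
        return 1
    fi
    # The PE count must consist of digits only and be greater than zero.
    case "$3" in
        ''|*[!0-9]*) echo "number of PEs must be a positive integer: $3" >&2; return 2 ;;
    esac
    [ "$3" -gt 0 ] || { echo "number of PEs must be a positive integer: $3" >&2; return 2; }
    [ -f "$1" ] || { echo "executable not found: $1" >&2; return 1; }
    [ -d "$2" ] || { echo "trace directory not found: $2" >&2; return 1; }
    "$ESP_ROOT/bin/esp_merge_traces.sh" "$1" "$2" "$3"
}
```

For example, esp_merge ./fixed-round <path to trace directory> 4 would merge the traces of a four-PE run, using the trace path printed in the Fabric Profiler banner.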

View Trace Files using the Analyzer

Five analyzers read the trace files. All of them are located in esp/bin/analyzer in the Fabric Profiler package. The analyzers are:
  • espba - Barrier analyzer
  • espfbla - Fabric backlog analyzer
  • espla - Function latency analyzer
  • espmsa - Message straggler analyzer
  • espr - A report that contains a summary of results
You can use the traces generated in the previous step or open pre-collected sample traces from esp/examples/samples/trace. Each of these traces corresponds to a SHMEM application in the esp/examples directory.
espr is a general report that summarizes all of the trace data in HTML format. Each sample application in the examples directory includes this report so you can view the report for the sample application without running the SHMEM application or MATLAB runtime. The esp/examples/samples/html directory contains files named {app name}_{number of PEs}.html and associated directories named {app name}_{number of PEs}_html_files. Open the HTML file in a browser to view the report generated by the analyzer from the corresponding trace files in esp/examples/output/samples/trace.
Contents of Trace Files
During the operation of Fabric Profiler, when your application calls shmem_finalize, the data collector writes five trace files that contain information about application behavior.
Trace File: {trace-file-prefix}.uc1.func
Format: Binary
Contents: Information about every profiled SHMEM function call. Each process writes out a separate function trace file. After job completion, the individual function trace files are merged into a single file with the esp/bin/collector/esp_merge_traces.sh script. The merged file is required by the analyzers.

Trace File: {trace-file-prefix}.uc1.hfi
Format: Binary
Contents: When the SHMEM application is running, Fabric Profiler monitors send and receive counters on the host fabric interface card. The HFI file contains these time-stamped counter values.

Trace File: {trace-file-prefix}.uc1.profile
Format: Binary
Contents: When the SHMEM application is running, Fabric Profiler monitors system performance counters and gathers system information. This data is written to the profile file. Each process writes out a separate profile file. When the job completes, the individual profile trace files are merged into a single file with the esp/bin/collector/esp_merge_traces.sh script. The merged file is required by the analyzers.

Trace File: {trace-file-prefix}.uc1.put
Format: Binary
Contents: Fabric Profiler monitors the amount of data injected into the network with each shmem_put call and the destination node for each put operation. The put file contains these values.

Trace File: {trace-file-prefix}.uc1.ev.txt
Format: Text
Contents: The environment file is a list of all environment variables defined at SHMEM application run-time.
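
Before copying traces to the analyzer machine, you may want to confirm that all five files are present for a given run. The following is a minimal sketch. The function name missing_trace_files is hypothetical, and it assumes that the files sit directly in the directory you pass and use the suffixes listed above after merging.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the name of each expected trace file that is absent.
#   $1 = directory that holds the trace files
#   $2 = trace file prefix
missing_trace_files() {
    local dir prefix suffix
    dir=$1
    prefix=$2
    # Suffixes taken from the list of trace files in this guide.
    for suffix in uc1.func uc1.hfi uc1.profile uc1.put uc1.ev.txt; do
        [ -f "$dir/$prefix.$suffix" ] || printf '%s\n' "$prefix.$suffix"
    done
}
```

The function prints nothing when every file is found, so empty output means the set is complete.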
Types of Analyzers
This table describes each analyzer in the Fabric Profiler package, along with associated operations that you can perform.
Analyzer Type: espba
Name: Barrier Trace Analyzer
Purpose: Reads the function trace file and displays barrier wait times for each barrier call in the source code for each PE.
Suggested Operations:
  • Take any of these measurements:
    • PE wait time
    • PE arrival time
    • Node wait density
    • PE percent late
    • PE outlier late
  • Vary the threshold.
  • Restrict your results to a specific lexical occurrence (a particular source code line containing a barrier).

Analyzer Type: espfbla
Name: Fabric Backlog Analyzer
Purpose: Reads the put trace file and correlates it with the HFI trace file to visualize fabric backlog at any point in time.
Suggested Operations:
  • Select "Show Region Bounds" and choose regions of interest. If the SHMEM code defined code regions, the temporal regions are highlighted on the graph of network backlog against time.
  • Select an individual node to display its associated backlog.
  • View injection and/or ejection backlog (requested less actual):
    • Injection requested: data sent off-node by this node in the application
    • Injection actual: data sent into the network by the HFI
    • Ejection requested: data sent by other nodes in the application to this node
    • Ejection actual: data received from the network according to the HFI
  • Zoom and pan to bring areas into focus.
  • Try offset adjustment modes.
  • Switch between toggle and rate displays.
  • Use the data cursor. Click on the widget first. Next, click anywhere on the plot to see data values for that point.

Analyzer Type: espla
Name: Function (latency) Trace Analyzer
Purpose: Reads the function trace file and displays function latency for all instrumented SHMEM calls. Trace files that contain hundreds of thousands of function calls can take several minutes to process. The default display shows composite PE wait time for all calls at each point in time.
Suggested Operations:
  • Select individual function calls to display latency hot spots for each call.
  • If the application defined Fabric Profiler regions, click View Regions. Choose regions to highlight temporal spans on the graph which represent those regions of code.
  • Switch to the communications matrix. This visualizes the volume of data sent from each PE to every other PE.
  • Use the zoom, pan, and data cursor widgets (under File and Help menus) to drill into the display data.
  • Experiment with the threshold controls for frequency, high value, and low value.

Analyzer Type: espmsa
Name: Message Straggler Analyzer
Purpose: Reads the function trace file and correlates the activity in the trace file with network activity in the HFI trace file.

Analyzer Type: espr
Name: Analyzer Report
Purpose: A non-interactive report that gathers information about a SHMEM application run and displays it in HTML format. The report can take several minutes to complete. When completed, the HTML report is saved in the same location as the profile trace file, with a matching file name.
Suggested Operations:
  • Use the File menu to select the profile trace file for a particular application run.
