User Guide

Check How Assumed Dependencies Affect Modeling

The Dependencies analysis adds a high overhead to your application and is optional for the Offload Modeling workflow, but information about loop-carried dependencies can be critical for Intel® Advisor to decide whether a loop is profitable to run on a GPU. If a loop has dependencies, it cannot be run in parallel and in most cases cannot be offloaded to the GPU.
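As a hypothetical illustration (not taken from the product documentation), the first loop below carries a dependency across iterations, while the second is independent and is the kind of loop that can run in parallel:

```cpp
#include <vector>

// Loop-carried dependency: iteration i reads a[i - 1], which the
// previous iteration wrote, so the iterations cannot safely run
// in parallel.
std::vector<double> with_dependency(std::vector<double> a) {
    for (std::size_t i = 1; i < a.size(); ++i)
        a[i] = a[i - 1] + 1.0;
    return a;
}

// No loop-carried dependency: each iteration touches only b[i],
// so the iterations are independent and the loop is a candidate
// for parallel execution or offload.
std::vector<double> without_dependency(std::vector<double> b) {
    for (std::size_t i = 0; i < b.size(); ++i)
        b[i] *= 2.0;
    return b;
}
```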
Intel Advisor can get information about loop-carried dependencies from the following resources:
  • Intel® Compiler diagnostics. Dependencies are found at compile time for some loops, and the diagnostics are passed to Intel Advisor through its integration with Intel Compilers.
  • Parsing the application call stack tree. If a loop is parallelized or vectorized on a CPU, or is already GPU-enabled but executed on a CPU, Intel Advisor assumes that you resolved its loop-carried dependencies before parallelizing or offloading the loop.
  • The Dependencies analysis results. This analysis detects dependencies for most loops at run time, but the result might depend on the application workload. It also adds a high overhead, making the application execute 5 - 100 times slower than without Intel Advisor. To reduce the overhead, you can use various techniques, for example, mark up only loops of interest.

Verify Assumed Dependencies

If you do not know what dependency types there are in your application, run the Offload Modeling without the Dependencies analysis first to check whether potential dependencies affect the modeling results. Use the following strategy to decide if you need to run the Dependencies analysis to check for loop-carried dependencies when running the Offload Modeling:
  1. Run the Offload Modeling without the Dependencies analysis.
    • From GUI: Select the Medium accuracy level and enable the Assume Dependencies option for the Performance Modeling analysis in the Analysis Workflow tab. Run the perspective.
    • From CLI: Run the analyses, for example, using the advisor command line interface:
      advisor --collect=survey --project-dir=<project-dir> --stackwalk-mode=online --static-instruction-mix -- <target-application> [<target-options>]
      advisor --collect=tripcounts --project-dir=<project-dir> --flop --enable-cache-simulation --target-device=<target> --stacks --data-transfer=light -- <target-application> [<target-options>]
      advisor --collect=projection --project-dir=<project-dir> --config=<config>
  2. Open the generated report and go to the Accelerated Regions tab.
  3. Expand the Measured column group and examine the Dependency Type column.
    • You do not need to run the Dependencies analysis for loops with the following dependency types:
      • Parallel: Programming Model dependency type means that the loop is GPU-enabled (for example, with Data Parallel C++, OpenCL™, or OpenMP* target).
      • Parallel: Explicit dependency type means that the loop is threaded and vectorized on a CPU (for example, with OpenMP parallel for or Intel® oneAPI Threading Building Blocks parallel_for).
      • Parallel: Proven dependency type means that an Intel Compiler found no dependencies at compile time.
    • You might need to run the Dependencies analysis for loops with the Dependency: Assumed dependency type. It means that Intel Advisor does not have information about loop-carried dependencies for these loops and does not consider them as offload candidates.
  4. If you see many Dependency: Assumed types, rerun the performance modeling with assumed dependencies ignored, as follows:
    • From GUI: Select only the Performance Modeling step in the Analysis Workflow tab and make sure the Assume Dependencies option is disabled. Run the perspective.
    • From CLI: Run the Performance Modeling with the --no-assume-dependencies option:
      advisor --collect=projection --project-dir=<project-dir> --config=<config> --no-assume-dependencies
  5. Review the generated results to check whether the potential dependencies might block offloading to the GPU. Loops that previously had the Dependency: Assumed dependency type are now marked as Parallel: Assumed. Intel Advisor models their performance on the target GPU and checks potential offload profitability and speedup.
  6. Compare the program metrics calculated with and without assumed dependencies, such as speedup, number of offloads, and estimated accelerated time.
    • If the difference is small, for example, 1.5x speedup with assumed dependencies and 1.6x speedup without them, you can skip the Dependencies analysis and rely on the current estimations. In this case, most loops with potential dependencies are not profitable to offload and do not add much speedup to the application on the target GPU.
    • If the difference is big, for example, 2x speedup with assumed dependencies and 40x speedup without them, you should run the Dependencies analysis. In this case, information about loop-carried dependencies is critical for correct performance estimation.

Run the Dependencies Analysis

To check for real dependencies in your code:
  1. Mark up loops for the analysis to minimize overhead. For the Offload Modeling, use the generic markup strategy, which automatically selects only loops that can be effectively offloaded to a target GPU.
    • From GUI: Go to File > Project Properties... > Performance Modeling and enter --select markup=gpu-generic in the Other parameters field. Click OK.
    • From CLI:
      advisor --mark-up-loops --select markup=gpu-generic --project-dir=<project-dir> -- <target-application> [<target-options>]
  2. Run the Dependencies analysis and rerun the Performance Modeling to get more accurate estimations of your application performance on GPU:
    • From GUI: Enable the Dependencies and Performance Modeling analyses in the Analysis Workflow tab. Rerun the Offload Modeling with only these two analyses enabled.
    • From CLI:
      1. Run the Dependencies analysis:
        advisor --collect=dependencies --project-dir=<project-dir> --loop-call-count-limit=16 --filter-reductions -- <target-application> [<target-options>]
      2. Run the Performance Modeling analysis:
        advisor --collect=projection --project-dir=<project-dir> --config=<config>
Open the result in the Intel Advisor GUI, view the interactive HTML report, or print it to the command line. Continue to investigate the results and identify code regions to offload.

Product and Performance Information

Performance varies by use, configuration, and other factors. Learn more at www.Intel.com/PerformanceIndex.