User Guide

What's New

This topic lists new high-level features and improvements in Intel® Advisor. For a full list of new features, see the Intel Advisor Release Notes.

Intel® Advisor 2021.3

  • Offload Modeling:
    • GPU-to-GPU performance modeling (feature preview)
      The Offload Modeling perspective introduces a new GPU-to-GPU performance model. With this model, you can analyze your Data Parallel C++ (DPC++), OpenMP* target, or OpenCL™ application running on a graphics processing unit (GPU) and model its performance on a different GPU platform. Use this workflow to understand how you can improve your application performance and check whether offloading the application to a different GPU platform gives a higher speedup.
      The GPU-to-GPU performance modeling is based on the following models, combined as illustrated in the sketch after this list:
      • Compute throughput model estimates time by compute throughput based on the GPU kernel instruction mix with respect to GPU compute throughput capabilities and workload decomposition.
      • Memory throughput model estimates cache and memory traffic based on a target GPU configuration. Based on this data, the model also estimates time by cache/memory bandwidth.
      • Memory latency model estimates the latency of read memory instructions based on the number of such instructions in the kernel.
      • Atomic throughput model estimates time by atomic throughput based on the hardware counter of atomic accesses on the baseline device.
      • Data transfer model estimates the offload overhead for transferring data between the host and GPU devices.
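      The following minimal C++ sketch shows one way such per-bottleneck estimates can combine into a modeled kernel time. It is illustrative only; the structure, names, and combination rule are assumptions, not Intel Advisor internals.

      // Illustrative sketch only, NOT Intel Advisor source. Assumes a
      // roofline-style model where a kernel is limited by its slowest
      // bottleneck and data transfer cost is added on top.
      #include <algorithm>
      #include <iostream>

      struct KernelEstimates {
          double compute_s;   // time bound by compute throughput
          double memory_s;    // time bound by cache/memory bandwidth
          double latency_s;   // time bound by read-instruction latency
          double atomic_s;    // time bound by atomic throughput
          double transfer_s;  // host-to-GPU data transfer overhead
      };

      double modeled_time(const KernelEstimates& k) {
          // Take the maximum of the throughput/latency bounds, then add
          // the transfer cost, which is serialized in this sketch.
          return std::max({k.compute_s, k.memory_s, k.latency_s, k.atomic_s})
                 + k.transfer_s;
      }

      int main() {
          // Made-up estimates for one kernel, in seconds.
          KernelEstimates k{0.004, 0.007, 0.002, 0.001, 0.003};
          std::cout << modeled_time(k) << " s\n";  // 0.007 + 0.003 = 0.010
      }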
    • New recommendations for effectively offloading your code from CPU to GPU
      The Offload Modeling perspective introduces recommendations for offloading code regions to a GPU, performance bottleneck analytics, and actionable recommendations to resolve the bottlenecks when you offload your code from a CPU to a GPU.
      The recommendations are reported in a new Recommendations pane in the Accelerated Regions report and include the following:
      • Recommendations for offloading code regions with the modeled performance summary
      • Recommendation for DPC++/OpenMP reduction pattern optimization for code regions recommended for offloading (see the example after this list)
      • Recommendation for algorithmic constraints optimization for code regions recommended for offloading
      • Recommendations for code regions not recommended for offloading with reasons why the region is not expected to have a high speedup, suggesting that you refactor the code
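      As a hedged illustration of the reduction pattern such a recommendation targets, the following self-contained DPC++ example sums an array with the SYCL 2020 sycl::reduction API, which lets the runtime choose an efficient reduction strategy instead of serializing on atomics. The buffer names and sizes are arbitrary.

      // Illustrative DPC++ reduction: sums n floats with sycl::reduction.
      #include <sycl/sycl.hpp>
      #include <iostream>
      #include <vector>

      int main() {
          constexpr size_t n = 1 << 20;
          std::vector<float> data(n, 1.0f);
          float sum = 0.0f;
          sycl::queue q;
          {
              sycl::buffer<float> in_buf(data.data(), sycl::range<1>(n));
              sycl::buffer<float> sum_buf(&sum, sycl::range<1>(1));
              q.submit([&](sycl::handler& h) {
                  sycl::accessor in(in_buf, h, sycl::read_only);
                  // The runtime may use tree or subgroup reductions here
                  // instead of one atomic update per work item.
                  auto red = sycl::reduction(sum_buf, h, sycl::plus<float>());
                  h.parallel_for(sycl::range<1>(n), red,
                                 [=](sycl::id<1> i, auto& acc) { acc += in[i]; });
              });
          }  // buffer destructors synchronize results back to the host
          std::cout << "sum = " << sum << "\n";  // expected: 1048576
      }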
  • GPU Roofline:
    • Expandable per-kernel instances for the GPU Roofline
      The GPU Roofline Insights perspective introduces a new kernel visualization feature that breaks down a kernel into instances grouped by workload parameters (global and local sizes). If the kernel was executed with different workloads or work groups, you can compare performance characteristics for the different executions.
      The feature is shown in the following panes of the GPU Roofline report:
      • In the GPU grid, you can expand a source kernel to see its sub-rows. Each sub-row is a group of kernel instances executed with the same kernel properties.
      • In the GPU Roofline chart, if the kernel was executed with different properties, a plus (+) icon appears near the kernel dot. The parent dot corresponds to the source compute task. Click the plus icon to expand it and see kernel instances that visualize how performance depends on workload parameters.
      • Select the source compute task or an instance task from the grid or from the chart to see detailed metrics in the Details pane.
    • Potential integer operations (INTOP) extended with logical operations
      When measuring the number of integer operations for the GPU Roofline, Intel Advisor now counts logical operations, such as AND, OR, and XOR, as potential integer operations. This better reflects the actual performance of the profiled application on the GPU Roofline chart by showing hotspots with logical operations closer to a performance boundary.
      For more information about operations counted for the GPU Roofline, see Examine Bottlenecks on GPU Roofline Chart.
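      For a concrete, hypothetical illustration, the DPC++ kernel below is dominated by XOR and AND operations; under the extended counting, such operations contribute to the INTOP total. All names, sizes, and constants here are made up.

      // Hypothetical kernel dominated by logical operations (XOR, AND),
      // which now count toward potential integer operations (INTOP).
      #include <sycl/sycl.hpp>

      int main() {
          constexpr size_t n = 1 << 16;
          sycl::queue q;
          unsigned* a   = sycl::malloc_shared<unsigned>(n, q);
          unsigned* out = sycl::malloc_shared<unsigned>(n, q);
          for (size_t i = 0; i < n; ++i) a[i] = static_cast<unsigned>(i);

          q.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
              out[i] = (a[i] ^ 0xA5A5A5A5u) & 0x00FF00FFu;  // logical ops
          }).wait();

          sycl::free(a, q);
          sycl::free(out, q);
      }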
    • New GPU Roofline interpretation hints in the kernel details
      Intel Advisor provides hints for memory-bound code to increase application performance and remove memory subsystem bottlenecks. See Examine Kernel Details for details.
    • Memory metric grid improvements for the GPU Roofline
      In the GPU Roofline report, the memory columns of the GPU grid now provide a clearer view of memory metrics:
      • Memory metrics are grouped by memory subsystem.
      • A new CARM column group includes CARM traffic and L3 cache line utilization metrics.
      See Accelerator Metrics for details.
  • Documentation:

Intel® Advisor 2021.2

  • Usability:
    • New Source view for the Offload Modeling and GPU Roofline Insights perspectives
      The Offload Modeling and GPU Roofline Insights reports now include a full-screen Source view with syntax highlighting in a separate tab. Use it to explore application source code and related metrics.
      For the GPU Roofline Insights perspective, the Source view also includes the Assembler view, which you can view side by side with the source.
      To switch to the Source view, double-click a kernel in the main report.
    • New Details pane with in-depth GPU kernel analytics for the GPU Roofline Insights perspective
      The GPU Roofline Regions report now includes a new Details pane, which provides in-depth execution metrics for a single kernel, such as execution time on GPU, work size and SIMD width, a single-kernel Roofline highlighting the distance to the nearest roof (performance limit), a floating-point and integer operation summary, memory and cache bandwidth, EU occupancy, and an instruction mix summary.
  • Offload Modeling:
    • Data transfer estimations with data reuse on GPU
      The Offload Modeling perspective introduces a new data reuse analysis, which provides more accurate estimations of data transfer costs.
      Data reuse analysis detects groups of regions that can reuse the same memory objects on GPU. It also shows which kernels can benefit from data reuse and how it impacts application performance. Data reuse can decrease the data transfer tax because when two or more kernels use the same memory object, it needs to be transferred only once (see the sketch below).
      You can enable the data reuse analysis for Performance Modeling from the Intel Advisor GUI or from the command line interface. With the analysis enabled, the estimated data transfer metrics are reported with and without data reuse. See Accelerator Metrics for details.
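      The effect is easiest to see with a toy calculation. The following sketch uses entirely made-up sizes and bandwidth to show why transferring a shared object once halves its transfer cost for two kernels.

      // Toy illustration of the data-reuse effect; all numbers are made up.
      #include <iostream>

      int main() {
          const double object_gb = 0.1;   // shared memory object: 100 MB
          const double pcie_gb_s = 12.0;  // assumed host-to-device bandwidth
          const double per_transfer_s = object_gb / pcie_gb_s;

          // Without reuse, each of the two kernels transfers its own copy;
          // with reuse, the object crosses the bus only once.
          double without_reuse = 2 * per_transfer_s;
          double with_reuse    = 1 * per_transfer_s;
          std::cout << "without reuse: " << without_reuse << " s\n"
                    << "with reuse:    " << with_reuse    << " s\n";
      }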
  • Documentation:

Intel® Advisor 2021.1

  • Data Parallel C++ (DPC++):
    • Implemented support for Data Parallel C++ (DPC++) code performance profiling on CPU and GPU targets.
    • Implemented support for the oneAPI Level Zero specification for DPC++ applications.
  • Usability:
    • Introduced a new and improved Intel Advisor user interface (UI) that includes:
      • A new look and feel for multiple tabs and panes, for example, the Workflow pane and toolbars
      • Offload Modeling and GPU Roofline workflows integrated in the GUI
      • A new notion of a perspective, which is a complete analysis workflow that you can customize to manage the accuracy/overhead trade-off. Each perspective collects performance data but processes and presents it differently so that you can look at it from different points of view depending on your goal. Intel Advisor includes the Offload Modeling, GPU Roofline Insights, Vectorization and Code Insights, CPU / Memory Roofline Insights, and Threading perspectives.
      To switch back to the old UI, set the ADVISOR_EXPERIMENTAL environment variable to advixe_gui.
    • Renamed executables and environment scripts:
      • advixe-cl is renamed to advisor.
      • advixe-gui is renamed to advisor-gui.
      • advixe-python is renamed to advisor-python.
      • advixe-vars.[c]sh and advixe-vars.bat are renamed to advisor-vars.[c]sh and advisor-vars.bat, respectively.
      See the Command Line Interface for details and sample command lines. The previous command line interface and executables are supported for backward compatibility.
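      For example, a Survey analysis that previously started with advixe-cl now starts with the advisor executable. A minimal illustrative command line follows; the project directory and application path are placeholders:

      advisor --collect=survey --project-dir=./advi_results -- ./myApplication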
  • Offload Modeling:
    • Introduced the Offload Modeling perspective (previously known as Offload Advisor) that you can use to prepare your code for efficient GPU offload even before you have the hardware. Identify parts of code that can be efficiently offloaded to a target device, estimate the potential speedup, and locate bottlenecks.
    • Introduced data transfer analysis as an addition to the Offload Modeling perspective. The analysis reports data transfer costs estimated for offloading to a target device, the estimated amount of memory your application uses per memory level, and hints for data transfer optimizations.
    • Introduced strategies to manage kernel invocation taxes (or kernel launch taxes) when modeling performance: do not hide invocation taxes, hide all invocation taxes except the first one, or hide a part of the invocation taxes (see the sketch after this list). For more information, see Manage Invocation Taxes.
    • Added support for modeling application performance for the Intel® Iris® Xe MAX graphics.
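      As a toy illustration of how the three strategies change the modeled total, the sketch below prices a kernel launched 1000 times; the kernel time, per-launch tax, and the 50% hiding fraction are all made-up numbers, not Intel Advisor's actual model.

      // Toy arithmetic for the three invocation-tax strategies.
      #include <iostream>

      int main() {
          const int    launches = 1000;
          const double kernel_s = 0.002;   // modeled time of one launch
          const double tax_s    = 0.0001;  // per-launch invocation tax

          double none_hidden = launches * (kernel_s + tax_s);        // pay every tax
          double all_hidden  = launches * kernel_s + tax_s;          // pay only the first tax
          double part_hidden = launches * (kernel_s + 0.5 * tax_s);  // hide an assumed 50%

          std::cout << "no taxes hidden:  " << none_hidden << " s\n"
                    << "all but first:    " << all_hidden  << " s\n"
                    << "half of each tax: " << part_hidden << " s\n";
      }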
  • Roofline:
    • Introduced the Memory-Level Roofline feature (previously known as Integrated Roofline, a tech preview feature). Memory-Level Roofline collects metrics for all memory levels and allows you to identify memory bottlenecks at different cache levels (L1, L2, L3, or DRAM). The underlying roofline bound is sketched after this list.
    • Added a limiting memory level roof to the Roofline guidance and recommendations, which improves recommendation accuracy.
    • Added a single-kernel Roofline guidance for all memory levels, with dots for multiple levels of a memory subsystem and limiting roof highlighting, to the Code Analytics pane.
    • Introduced a GPU Roofline Insights perspective. GPU Roofline visualizes the actual performance of GPU kernels against hardware-imposed performance limitations. Use it to identify the main limiting factor of your application performance and get recommendations for effective memory vs. compute optimization. The GPU Roofline report supports float and integer data types and reports metrics for all memory levels.
    • Added support for profiling GPU workloads that run on the Intel® Iris® Xe MAX graphics and building a GPU Roofline for them.
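      For reference, the classic roofline bound that these features build on (the standard textbook formula, not Advisor internals) caps attainable performance at the minimum of peak compute and arithmetic intensity times the bandwidth of the limiting memory level. The values below are assumptions for illustration.

      // Classic roofline bound: attainable = min(peak, AI * bandwidth).
      #include <algorithm>
      #include <iostream>

      int main() {
          const double peak_gflops = 1000.0;  // assumed peak compute
          const double bw_gb_s     = 100.0;   // bandwidth of the limiting level
          const double ai          = 0.25;    // FLOP per byte at that level

          double bound = std::min(peak_gflops, ai * bw_gb_s);
          std::cout << "attainable: " << bound << " GFLOPS\n";  // 25: memory bound
      }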
  • Flow Graph Analyzer:
  • Documentation:
    • Introduced a PDF version of the Intel Advisor User Guide. Click Download as PDF at the top of this page to use the PDF version.
    • Introduced a new user guide structure that focuses on the new UI and reflects the usage flow to improve usability.
Documentation for older versions of Intel® Advisor is available for download only. For a list of available documentation downloads by product version, see these pages:

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.