User Guide

  • 2020
  • 06/18/2020
  • Public Content

CPU/FPGA Interaction Analysis (Preview)

Use the CPU/FPGA Interaction analysis to assess the balance between CPU and FPGA in systems with FPGA hardware that run Data Parallel C++ (DPC++) or OpenCL™ applications.
This is a PREVIEW FEATURE. A preview feature may or may not appear in a future production release. It is available for your use in the hopes that you will provide feedback on its usefulness and help determine its future. Data collected with a preview feature is not guaranteed to be backward compatible with future releases. Please send your feedback to parallel.studio.support@intel.com or to intelsystemstudio@intel.com.
The analysis reports the FPGA performance of executed kernels, the overall time spent on memory transfers between the CPU and FPGA, and the impact of wait time on CPU and FPGA workloads.
Intel® VTune™ Profiler collects these FPGA device metrics:
  • Global Bandwidth
  • Stalls
  • Occupancy

Configure and Run Analysis

Follow this procedure to configure options for the CPU/FPGA Interaction analysis:
Prerequisites:
  • To obtain device side information from the FPGA when profiling, make sure you specify the profile flag for the compile operation:
    To compile            Use                                            Specify
    OpenCL Applications   Intel® FPGA SDK for OpenCL™ Offline Compiler   -profile option
    DPC++ Applications    Intel® oneAPI DPC++ Compiler                   -Xsprofile option
    For other compiler options (exclusive to OpenCL profiling), see the FPGA Programming Guide.
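The flag selection in the table above can be sketched as a small helper that adds the right profile flag to a compile command. This is a minimal illustration, not an official build script: the executable names `aoc` and `dpcpp`, the helper name `build_compile_command`, and the extra arguments in the usage lines are assumptions, so adjust them to match your installation and build setup.

```python
# Profile flags, as listed in the prerequisites table.
PROFILE_FLAGS = {
    "opencl": "-profile",    # Intel FPGA SDK for OpenCL Offline Compiler
    "dpcpp": "-Xsprofile",   # Intel oneAPI DPC++ Compiler
}

# Assumed executable names; these are not stated in this guide.
COMPILERS = {"opencl": "aoc", "dpcpp": "dpcpp"}


def build_compile_command(app_type, source, extra_args=()):
    """Return the compile command as an argument list with the profile flag added."""
    if app_type not in PROFILE_FLAGS:
        raise ValueError(f"unknown application type: {app_type!r}")
    return [COMPILERS[app_type], *extra_args, PROFILE_FLAGS[app_type], source]


if __name__ == "__main__":
    # File names and extra arguments are placeholders for illustration only.
    print(" ".join(build_compile_command("opencl", "kernel.cl")))
    print(" ".join(build_compile_command("dpcpp", "host.cpp", ["-fintelfpga", "-Xshardware"])))
```

Returning a list rather than a single string lets you pass the result directly to `subprocess.run` without shell quoting issues.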
  1. Click the Configure Analysis button on the Intel® VTune™ Profiler toolbar (in the standalone GUI or the Visual Studio IDE).
    The Configure Analysis window opens.
  2. In the WHAT pane:
    • Specify the host executable in the Application bar.
    • If applicable, specify arguments for the host application as Application parameters.
  3. In the HOW pane, click the Browse button.
    • Select the CPU/FPGA Interaction analysis type from the Platform Analysis group.
    • Enter the CPU sampling interval in milliseconds.
    • Specify if the collection should include CPU call stacks.
    • Specify a source for the FPGA profiling data:
      • OpenCL Profiling API: This source profiles only the host application.
      • AOCL Profiler: This source profiles the host application as well as the design on your FPGA.
    To generate the command line for this configuration, use the Command Line button.
  4. Click the Start button to run the analysis.
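For scripted runs, the GUI configuration above can be mirrored by a command line. The sketch below only assembles the argument list. It assumes the `vtune` executable is on your PATH and that the analysis type is named `cpu-fpga-interaction`. The knob name in the usage lines is a placeholder, not a confirmed option for this analysis. Use the Command Line button to obtain the exact analysis and knob names for your version.

```python
import shlex


def build_vtune_command(app, app_args=(), knobs=None, result_dir=None,
                        analysis="cpu-fpga-interaction"):
    """Assemble a VTune Profiler command line as an argument list.

    The default analysis name and any knob names passed in are assumptions;
    confirm them with the Command Line button in the Configure Analysis window.
    """
    cmd = ["vtune", "-collect", analysis]
    for name, value in (knobs or {}).items():
        cmd += ["-knob", f"{name}={value}"]
    if result_dir:
        cmd += ["-result-dir", result_dir]
    # Everything after "--" is the host application and its parameters.
    cmd += ["--", app, *app_args]
    return cmd


if __name__ == "__main__":
    # Hypothetical knob name and value, shown for illustration only.
    example = build_vtune_command("./host", ["input.dat"],
                                  knobs={"sampling-interval": 10})
    print(shlex.join(example))
```

Keeping the knobs in a dictionary makes it easy to paste in the names that the Command Line button reports without changing the helper.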

Import FPGA Data Collected with the Profiler Runtime Wrapper

If you collected FPGA profiling data with the Profiler Runtime Wrapper as a profile.json file, you can also import it into the VTune Profiler project.
To speed up loading of the collected data, copy the profile.json file to an empty folder and import that folder instead of the entire compilation directory.
See the FPGA Optimization Guide for information on generating the profiling data with the Profiler Runtime Wrapper (oneAPI applications only).
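The copy step described above can be sketched in a few lines of Python. The helper name `stage_profile_json` and the temporary-folder prefix are hypothetical. The sketch assumes that profile.json sits at the top level of the directory you pass in, which may not match your layout.

```python
import shutil
import tempfile
from pathlib import Path


def stage_profile_json(compile_dir, dest_dir=None):
    """Copy profile.json into an otherwise empty folder and return that folder."""
    src = Path(compile_dir) / "profile.json"
    if not src.is_file():
        raise FileNotFoundError(src)
    if dest_dir is None:
        # Create a fresh, empty temporary folder.
        dest = Path(tempfile.mkdtemp(prefix="fpga_profile_"))
    else:
        dest = Path(dest_dir)
        dest.mkdir(parents=True, exist_ok=True)
        if any(dest.iterdir()):
            raise ValueError(f"{dest} is not empty")
    shutil.copy2(src, dest / "profile.json")
    return dest
```

Call `stage_profile_json` with the path to your compilation directory, then import the returned folder into the VTune Profiler project. Refusing a non-empty destination keeps the import limited to the single file.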

View Data

The CPU/FPGA Interaction analysis results appear in the CPU/FPGA Interaction viewpoint. The viewpoint contains these windows:
  • The Summary window displays statistics on the overall application execution, including CPU time, processor utilization, and execution time for DPC++ or OpenCL kernels. Double-click a kernel in the Bottom-up view to see detailed performance data in the Source view.
  • The Bottom-up window displays functions in a bottom-up tree, with CPU time and CPU utilization per function. Click a function or kernel in this view to open the Source view.
  • The Platform window displays metrics and performance data over time for DPC++ or OpenCL kernels, memory transfers, CPU context switches, FPU utilization, and CPU threads with DPC++ or OpenCL kernels.

What's Next

Use the CPU/FPGA Interaction viewpoint to review the following:
  • FPGA Utilization: Look at the FPGA Top Compute Tasks on the Summary window for a list of kernels running on the FPGA. The Bottom-up window shows the Total and Average execution time for every kernel.
  • Memory Transfers: Look at the Data Transferred column on the Bottom-up window or the Computing Queue rows on the Platform window to view DPC++ or OpenCL kernels and memory transfers.
  • Workload Impact: The Context Switch Time metric on the Summary window shows how much time was spent in CPU context switches. Context switches can also be seen on the Platform tab as they occurred during application execution.
