• 2020
  • 10/23/2020
  • Public Content

Profiling an FPGA-driven DPC++ Application

Use this recipe to profile an FPGA-driven DPC++ (Data Parallel C++) application. The recipe features the AOCL Profiler integrated in the CPU/FPGA Interaction (preview) analysis type in Intel® VTune™ Profiler.


Here are the minimum hardware and software requirements for this performance recipe.
  • Application: The sample FPGA design used in this recipe is available in the repository for Intel® oneAPI DPC++ Compiler samples.
  • Compiler: To profile a DPC++ application, you need the compiler that is available with Intel® oneAPI toolkits (Beta).
  • Tools
    • For downloads and product support, visit
    • All the Cookbook recipes are scalable and can be applied to Intel VTune Amplifier 2018 and higher. Slight version-specific configuration changes are possible.
    • Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.
  • Operating system: Linux* OS (Ubuntu* 18.04)
  • CPU: Intel server platform code-named Cascade Lake
  • FPGA: Intel® Programmable Acceleration Card (Intel® PAC) with Intel® Arria® 10 GX FPGA, or Intel® Stratix 10 GX FPGA PAC board for DPC++ (with installable add-on)
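
Before installing anything, it can help to confirm that the host matches the operating system listed above. The following is a minimal sketch, not part of the original recipe: the function name `check_os_release` is hypothetical, and it assumes the host provides a standard os-release file (`/etc/os-release` by default) with `ID` and `VERSION_ID` fields.

```shell
# check_os_release [FILE]
# Hypothetical helper: returns 0 when FILE (os-release format) describes
# Ubuntu 18.04, the operating system listed in the requirements above.
# The body runs in a subshell so that sourcing FILE cannot leak variables.
check_os_release() (
    file="${1:-/etc/os-release}"
    # Make sure "." treats the argument as a path, not a PATH lookup.
    case "$file" in
        */*) ;;
        *) file="./$file" ;;
    esac
    [ -r "$file" ] || { echo "cannot read $file" >&2; exit 2; }
    unset ID VERSION_ID
    # os-release is a list of shell-compatible KEY=value assignments.
    . "$file"
    if [ "${ID:-}" = "ubuntu" ] && [ "${VERSION_ID:-}" = "18.04" ]; then
        echo "OS matches the recipe: Ubuntu 18.04"
    else
        echo "OS differs from the recipe: ${ID:-unknown} ${VERSION_ID:-unknown}" >&2
        exit 1
    fi
)
```

Calling `check_os_release` with no argument inspects the running system. A non-zero result only means the host differs from the configuration listed in this recipe, not that profiling cannot work.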

Install and Configure the Toolkit

  1. Plug the Intel PAC card into the PCIe slot on the machine.
  2. Download and install Intel® oneAPI Base Toolkit (Beta) for Linux. Use either the online or the offline installer and accept all default options.
  3. Unzip the FPGA add-on package and run the installer. Select all default options.
  4. Set up the oneAPI environment.
    source <oneAPI-install-dir>/
  5. Install the FPGA board.
    aocl install
  6. Run the diagnose command to ensure that all diagnostics pass.
    aocl diagnose
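
The last two checks can be wrapped in a small helper so that a failed diagnostic stops a setup script early. This is a sketch under stated assumptions: the function name `verify_fpga_setup` is hypothetical, and it assumes that `aocl diagnose` signals failure through a non-zero exit status, which the recipe does not state.

```shell
# verify_fpga_setup
# Hypothetical helper: confirms that the aocl utility is reachable (that is,
# the oneAPI environment has been sourced in this shell) and that
# `aocl diagnose` exits successfully.
# Assumption: `aocl diagnose` returns a non-zero status when a check fails.
verify_fpga_setup() {
    if ! command -v aocl >/dev/null 2>&1; then
        echo "aocl not found: set up the oneAPI environment first" >&2
        return 127
    fi
    if aocl diagnose; then
        echo "aocl diagnose passed"
    else
        echo "aocl diagnose reported a failure; check the board installation" >&2
        return 1
    fi
}
```

Run `verify_fpga_setup` in the same shell in which you set up the oneAPI environment, and continue to the build only if it returns 0.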

Build the Sample Application

  1. Download code samples from the repository for Intel oneAPI DPC++ Compiler samples.
    git clone
  2. Open the sample folder.
    cd BaseKit-code-samples/FPGAExampleDesigns/crr
  3. Open the
  4. Locate the line of code that lists hardware flags. It should start with
  5. Add
    to the set of flags.
  6. Go back to the main directory for the sample. Create a new folder called build and open it.
    mkdir build
    cd build
  7. Compile the sample.
    cmake ..
    make fpga
    This process can take several hours. Once it has finished, you should have an executable file.
You can now run the executable on FPGA hardware.
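
Because the hardware compile can run for hours, it may be convenient to script the last two build steps so that a failure in any command stops the sequence. The sketch below is illustrative only: the function name `build_fpga_sample` is hypothetical, it assumes the hardware flags have already been edited as described above, and it uses `mkdir -p` rather than `mkdir` so that it can be re-run on an existing build folder.

```shell
# build_fpga_sample SAMPLE_DIR
# Hypothetical helper: runs the out-of-tree build from the steps above inside
# SAMPLE_DIR/build. The body runs in a subshell, so the caller's working
# directory is left unchanged. Commands are chained with && so that the
# sequence stops at the first failure.
build_fpga_sample() (
    sample_dir="${1:?usage: build_fpga_sample SAMPLE_DIR}"
    cd "$sample_dir" &&
        mkdir -p build &&
        cd build &&
        cmake .. &&
        make fpga
)
```

For example, from the directory where you cloned the samples: `build_fpga_sample BaseKit-code-samples/FPGAExampleDesigns/crr`.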

Run CPU/FPGA Interaction Analysis

  1. Launch VTune Profiler and click New Project from the Welcome page. The Create a Project dialog box opens.
  2. Specify a project name and a location for your project, then click Create Project. The Configure Analysis window opens.
  3. Select Local Host as the target system.
  4. Select Launch Application as the target.
    • Specify the path to the executable file that you built.
    • In the Application parameters field, enter
    Figure: Set up FPGA analysis
  5. Select CPU/FPGA Interaction (preview) from the Platform Analysis group.
  6. In the analysis settings, select AOCL Profiler for the FPGA profiling data source.
    Figure: Set up FPGA analysis
  7. Click the button at the bottom to run the analysis.

Analyze Results

Once data collection completes, you can see the finalized results in the CPU/FPGA Interaction viewpoint. Start with the Summary window to view these details:
  • FPGA top compute tasks
  • Top tasks and hotspots for the CPU
Figure: Result summary for CPU/FPGA Interaction
Switch to the Bottom-up window to see detailed information at the kernel level, including:
  • Stalls
  • Occupancy
  • Data transfer size
  • Average bandwidth for transferred data
Figure: Bottom-up window
Use the timeline view to see these details about kernel instances:
  • Start/end times
  • Stalls over time
  • Occupancy
  • Bandwidth metrics
Figure: Timeline view in CPU/FPGA Interaction
Right-click a kernel and select View Source from the context menu. This opens the Source View, where you can see metrics for specific kernel source lines.
Figure: Source View
