Cookbook

  • 2020
  • 06/18/2020
  • Public Content

Profiling an FPGA-driven DPC++ Application

Use this recipe to profile an FPGA-driven DPC++ (Data Parallel C++) application. The recipe features the AOCL Profiler integrated in the CPU/FPGA Interaction (preview) analysis type in Intel® VTune™ Profiler.

Ingredients

Here are the minimum hardware and software requirements for this performance recipe.
  • Application: crr. This sample FPGA design is available in the repository for Intel® oneAPI DPC++ Compiler samples.
  • Compiler: To profile a DPC++ application, you need the dpcpp compiler that is available with Intel® oneAPI toolkits (Beta).
  • Tools:
    • For VTune Profiler downloads and product support, visit https://software.intel.com/en-us/vtune.
    • All the Cookbook recipes are scalable and can be applied to Intel VTune Amplifier 2018 and higher. Slight version-specific configuration changes are possible.
    • Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.
  • Operating system: Linux* OS (Ubuntu* 18.04)
  • CPU: Intel server platform code-named Cascade Lake
  • FPGA: Intel® Programmable Acceleration Card (Intel® PAC) with Intel® Arria® 10 GX FPGA or Intel® Stratix 10 GX FPGA PAC board for DPC++ (with installable add-on)
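
Before starting, it can help to confirm that the command-line tools behind these ingredients are reachable. The following is a minimal sketch, not part of the original recipe: the helper name missing_tools is hypothetical, and it assumes the tools are exposed on PATH as dpcpp, aocl, and vtune once the oneAPI environment has been set up (older VTune Amplifier releases use a different command name).

```shell
#!/bin/sh
# Hypothetical helper: report which of the given command names are not on PATH.
# Prints one missing name per line; returns 0 only if every name was found.
missing_tools() {
    rc=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            printf '%s\n' "$tool"
            rc=1
        fi
    done
    return "$rc"
}

# Example use for this recipe (assumed command names, run after setvars.sh):
#   missing_tools dpcpp aocl vtune || echo "some tools are not on PATH"
```

The function only checks that the commands can be found. It does not verify versions or the presence of the FPGA board.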

Install and Configure the Toolkit

  1. Plug the Intel PAC card into the PCIe slot on the machine.
  2. Download and install Intel® oneAPI Base Toolkit (Beta) for Linux. You can use either the online or the offline installer. Select all default options.
  3. Unzip the FPGA add-on package and run setup.sh. Select all default options.
  4. Set up the oneAPI environment.
    source <oneAPI-install-dir>/setvars.sh
  5. Install the FPGA board.
    aocl install
  6. Run the diagnose command to ensure that all diagnostics pass.
    aocl diagnose
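
If you script the setup, the diagnose step can be turned into a pass/fail gate. The sketch below is an illustration under assumptions: the helper name diagnostics_passed is hypothetical, and the marker strings DIAGNOSTIC_PASSED and DIAGNOSTIC_FAILED are assumed to match what your aocl version prints. Adjust them if your output differs.

```shell
#!/bin/sh
# Hypothetical helper: read diagnostic output on stdin and succeed only if a
# pass marker is present and no fail marker is.
# ASSUMPTION: the marker strings below match the output of your aocl version.
diagnostics_passed() {
    output=$(cat)
    case "$output" in
        *DIAGNOSTIC_FAILED*) return 1 ;;
        *DIAGNOSTIC_PASSED*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use after the board is installed:
#   aocl diagnose 2>&1 | diagnostics_passed && echo "board ready"
```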

Build the Sample Application

  1. Download code samples from the repository for Intel oneAPI DPC++ Compiler samples.
    git clone https://github.com/intel/BaseKit-code-samples.git
  2. Open the crr sample folder.
    cd BaseKit-code-samples/FPGAExampleDesigns/crr
  3. Open the src/CMakeLists.txt file.
  4. Locate the line of code that lists hardware flags. It should start with set(HARDWARE_LINK_FLAGS.
  5. Add -Xsprofile to the set of flags.
  6. Go back to the main directory for the sample. Create a new folder called build and open it.
    mkdir build
    cd build
  7. Compile the sample.
    cmake ..
    make fpga
    This process can take several hours. Once it has finished, you should have an executable file called crr.fpga.
You can now run crr.fpga on FPGA hardware.
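
Steps 3 to 5 can also be done without opening an editor. The following is a sketch under stated assumptions, not the recipe's own method: the helper name add_profile_flag is hypothetical, and it assumes the flags are a double-quoted string on the same line as set(HARDWARE_LINK_FLAGS. If your CMakeLists.txt is laid out differently, edit the file by hand.

```shell
#!/bin/sh
# Hypothetical helper: insert -Xsprofile into the HARDWARE_LINK_FLAGS line.
# ASSUMPTION: the flags are a double-quoted string on the same line as
# set(HARDWARE_LINK_FLAGS; the function refuses to edit other layouts.
add_profile_flag() {
    file=$1
    # Do nothing if the flag is already on that line.
    if grep -q 'set(HARDWARE_LINK_FLAGS.*-Xsprofile' "$file"; then
        return 0
    fi
    # Fail rather than edit blindly if the expected line is not there.
    if ! grep -q 'set(HARDWARE_LINK_FLAGS[[:space:]]*"' "$file"; then
        echo "HARDWARE_LINK_FLAGS line not found in $file" >&2
        return 1
    fi
    # Write to a temporary file, then replace the original.
    sed 's/\(set(HARDWARE_LINK_FLAGS[[:space:]]*"\)/\1-Xsprofile /' "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}

# Example use from the crr sample directory:
#   add_profile_flag src/CMakeLists.txt
```

The function is idempotent, so running it twice leaves a single -Xsprofile in the flag string.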

Run CPU/FPGA Interaction Analysis

  1. Launch VTune Profiler and click New Project from the Welcome page.
    The Create a Project dialog box opens.
  2. Specify a project name and a location for your project, then click Create Project.
    The Configure Analysis window opens.
  3. In the WHERE pane, select Local Host.
  4. In the WHAT pane, select Launch Application as the target.
    • In the Application field, specify the path to the crr.fpga executable.
    • In the Application parameters field, enter ordered_inputs.csv.
    [Figure: Set up FPGA analysis]
  5. In the HOW pane, select CPU/FPGA Interaction (preview) from the Platform Analysis group.
  6. In the analysis settings, select AOCL Profiler for the FPGA profiling data source.
    [Figure: Set up FPGA analysis]
  7. Click Start at the bottom to run the analysis.
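
The same collection can in principle be started from the vtune command line. This recipe does not give the analysis type identifier or the option that selects the AOCL Profiler data source, so the sketch below leaves the analysis type as a parameter and only prints the command instead of running it. The helper name print_collect_cmd is hypothetical. You can list the available analysis types with vtune -help collect, and the options of one type with vtune -help collect followed by its name.

```shell
#!/bin/sh
# Hypothetical helper: print (do not run) a vtune collection command line.
#   $1     analysis type identifier, as listed by `vtune -help collect`
#   $2     result directory
#   $3...  application and its parameters
print_collect_cmd() {
    analysis=$1
    result_dir=$2
    shift 2
    printf 'vtune -collect %s -result-dir %s -- %s\n' "$analysis" "$result_dir" "$*"
}

# Example with a placeholder analysis type (replace <analysis-type>):
#   print_collect_cmd '<analysis-type>' r000fpga ./crr.fpga ordered_inputs.csv
```

Printing the command first makes it easy to review the analysis type and result directory before committing to a run on the FPGA hardware.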

Analyze Results

Once data collection completes, you can see the finalized results in the CPU/FPGA Interaction viewpoint. Start with the Summary window to view these details:
  • FPGA top compute tasks
  • Top tasks and hotspots for the CPU
[Figure: Result summary for CPU/FPGA Interaction]
Switch to the Bottom-up window to see detailed information at the kernel level, including:
  • Stalls
  • Occupancy
  • Data transfer size
  • Average bandwidth for transferred data
[Figure: Bottom-up window]
Use the timeline view to see these details about kernel instances:
  • Start/end times
  • Stalls over time
  • Occupancy
  • Bandwidth metrics
[Figure: Timeline view in CPU/FPGA Interaction]
In the Bottom-up window, right-click a kernel and select View Source from the context menu. This opens the Source View, where you can see metrics for specific kernel source lines.
[Figure: Source View]
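
For a quick text overview outside the GUI, a collected result can also be passed to the vtune reporting mode. This is a minimal sketch, assuming the vtune command is on PATH and that the summary report is meaningful for this analysis type, which the recipe does not confirm. The helper name summary_report and the result directory name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the summary report for a result directory.
# Returns 127 if the vtune command is not available on PATH.
summary_report() {
    result_dir=$1
    if ! command -v vtune >/dev/null 2>&1; then
        echo "vtune not found on PATH; set up the oneAPI environment first" >&2
        return 127
    fi
    vtune -report summary -result-dir "$result_dir"
}

# Example use with a hypothetical result directory name:
#   summary_report r000fpga
```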

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804