Cookbook

  • 2020
  • 06/18/2020
  • Public Content

Analyzing CPU and FPGA (Intel® Arria® 10 GX) Interaction

This recipe shows how to configure your platform to analyze the interaction between your CPU and FPGA, using the Intel® Arria® 10 GX FPGA as an example.

Ingredients

This section lists the hardware and software tools used for the performance analysis scenario.
  • Application: Matrix Multiplication OpenCL™ application. The Matrix Multiplication sample application is available for download from the Intel® FPGA SDK for OpenCL™ website.
  • Tools: Intel® FPGA SDK for OpenCL™, Intel® VTune™ Amplifier 2019 or higher
    • For VTune Profiler downloads and product support, visit https://software.intel.com/en-us/vtune.
    • All the Cookbook recipes are scalable and can be applied to Intel VTune Amplifier 2018 and higher. Slight version-specific configuration changes are possible.
    • Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler starting with its version for Intel® oneAPI Base Toolkit (Beta). You can still use a standalone version of the VTune Profiler, or its versions integrated into Intel Parallel Studio XE or Intel System Studio.
  • Operating System: CentOS* 7, Red Hat* Enterprise Linux 7 or higher
  • CPU: Intel® server platform code named Skylake
  • FPGA: Intel® Arria® 10 GX

Configure the Intel® Arria® 10 GX FPGA and Intel® FPGA SDK for OpenCL™

  1. On your Intel Arria 10 GX FPGA, set up the DIP switches and connect the power and USB cables. See detailed instructions.
  2. Download Intel® FPGA SDK for OpenCL™ (includes CodeBuilder, Quartus Prime software and devices) from http://fpgasoftware.intel.com/opencl/.
  3. Run the setup_pro.sh file to install the SDK.
  4. Run source init_opencl.sh to set the appropriate environment variables.
  5. Run aocl version to verify the installation. The output should look similar to the following:
    aocl 17.1.0.240 (Intel(R) FPGA SDK for OpenCL(TM), Version 17.1.0 Build 240, Copyright (C) 2017 Intel Corporation)
  6. Run aocl install to install the FPGA board.
  7. Run aocl diagnose to verify the hardware installation. The output should look similar to the following:
    Device Name:
    acl0
    Package Pat:
    /home/tce/intelFPGA_pro/17.1/hld/board/a10_ref
    Vendor: Intel(R) Corporation
    Phys Dev Name  Status   Information
    acla10_ref0    Passed   Arria 10 Reference Platform (acla10_ref0)
                            PCIe dev_id = 2494, bus:slot.func = 44:00.00, Gen3 x4
                            FPGA temperature = 44.3555 degrees C.
    DIAGNOSTIC_PASSED
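
The checks in steps 5 and 7 can also be scripted. The following is a minimal sketch, not part of the SDK: it assumes that aocl is on your PATH after sourcing init_opencl.sh and that a healthy board reports the DIAGNOSTIC_PASSED marker shown in the sample output. The function names diagnostic_passed and check_fpga_setup are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if the text on stdin contains the
# DIAGNOSTIC_PASSED marker shown in the sample `aocl diagnose` output.
diagnostic_passed() {
    grep -q 'DIAGNOSTIC_PASSED'
}

# Hypothetical helper: run the SDK checks from steps 5 and 7.
# Assumes the environment was set up with `source init_opencl.sh`.
check_fpga_setup() {
    if ! command -v aocl >/dev/null 2>&1; then
        echo "aocl not found; source init_opencl.sh first" >&2
        return 1
    fi
    aocl version || return 1
    if aocl diagnose | diagnostic_passed; then
        echo "FPGA board diagnostics passed"
    else
        echo "FPGA board diagnostics failed" >&2
        return 1
    fi
}
```

Calling check_fpga_setup after step 6 returns a non-zero status if the tool is missing or the board does not pass diagnostics, which makes it usable as a guard before the build and flash steps.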

Build the Sample Application and Flash to the FPGA

  1. Run make with the default makefile to build the host executable. The executable output filename is host.
  2. Build the binary for the FPGA using the following command:
    aoc -v -board=a10gx device/matrix_mult.cl -o bin/matrix_mult.aocx
  3. Set up the USB driver to flash.
    1. Run the following command:
      sudo vim /etc/udev/rules.d/51-usbblaster.rules
    2. Add the following lines:
      # usb blaster
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6001", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6002", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6003", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6010", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6810", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
  4. Lower the JTAG clock speed to 6 MHz using the following command:
    jtagconfig --setparam 1 JtagClock 6M
  5. Flash the binary to the FPGA using the following command:
    aocl flash acl0 ./bin/matrix_mult.aocx
  6. Reboot the host system with the FPGA.
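
The USB-Blaster rules added in step 3 differ only in the product ID, so the file can be generated instead of typed by hand. The sketch below prints the same rule text for the five product IDs listed in that step. The function name usb_blaster_rules is hypothetical, and writing the output with sudo tee is one possible approach, not a requirement of the SDK.

```shell
#!/bin/sh
# Hypothetical helper: print the USB-Blaster udev rules from step 3,
# one rule per product ID. The rule text is copied from the recipe.
usb_blaster_rules() {
    echo '# usb blaster'
    for product in 6001 6002 6003 6010 6810; do
        # \$env keeps the udev substitution literal; %c is passed as data,
        # not as a printf format, so it is printed unchanged.
        printf '%s\n' "SUBSYSTEM==\"usb\", ENV{DEVTYPE}==\"usb_device\", ATTRS{idVendor}==\"09fb\", ATTRS{idProduct}==\"$product\", MODE=\"0666\", NAME=\"bus/usb/\$env{BUSNUM}/\$env{DEVNUM}\", RUN+=\"/bin/chmod 0666 %c\""
    done
}

# Possible usage (assumption, requires root privileges):
#   usb_blaster_rules | sudo tee /etc/udev/rules.d/51-usbblaster.rules
```

Review the printed rules before writing them to /etc/udev/rules.d, since a malformed rule file is silently ignored by udev.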

Run CPU/FPGA Interaction Analysis

  1. Launch the VTune Amplifier. For example:
    /opt/intel/vtune_amplifier_2019/bin64/amplxe-gui
  2. Create a project for your analysis, for example: hello_world_opencl.
  3. Click Configure Analysis to start a new analysis.
  4. Set up the CPU/FPGA Interaction analysis.
    Figure: Configure Analysis window showing the matrix multiply file path
    1. In the WHERE pane, select Local Host.
    2. In the WHAT pane, select Launch Application and browse to the host executable of the sample application. Typically the executable can be found under <sample app>/bin/host.
    3. In the HOW pane, select CPU/FPGA Interaction from the available analysis types.
  5. Click Start to begin the analysis.
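
The same collection can likely be started from the command line instead of the GUI. The sketch below only prints a candidate command and does not run it. It rests on assumptions that are not stated in this recipe: that the amplxe-cl command-line tool is on your PATH and that the CPU/FPGA Interaction analysis is exposed under the name fpga-interaction. Confirm the analysis type name for your version with amplxe-cl -help collect. The function name fpga_collect_cmd and the result directory name are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print (not run) a command-line collection command.
# ASSUMPTION: the analysis type name "fpga-interaction" is not taken from
# this recipe; verify it with `amplxe-cl -help collect` before use.
fpga_collect_cmd() {
    app_path=$1      # host executable, for example <sample app>/bin/host
    result_dir=$2    # directory that will receive the collected result
    echo "amplxe-cl -collect fpga-interaction -result-dir $result_dir -- $app_path"
}

# Example: show the command for the sample host executable.
fpga_collect_cmd ./bin/host r000fpga
```

Printing the command first makes it easy to check the analysis type and paths before launching a collection on the target system.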

Interpret Results

After data collection completes, the results are finalized and shown in the CPU/FPGA Interaction viewpoint. Start with the Summary tab to view the top FPGA compute tasks as well as the top tasks and hotspots for the CPU.
Figure: Summary window showing the CPU/FPGA Interaction viewpoint with the Top Hotspots and FPGA Top Compute lists
Switch to the Bottom-up tab to review the work size of a compute task and data transfer throughput. Use the timeline pane to review the FPGA utilization for compute and transfer.
Figure: Bottom-up tab of the CPU/FPGA Interaction viewpoint showing a timeline of FPGA utilization
Use the Platform tab to check the computing queue for the FPGA and host application. You can also find the start time and duration of each transfer and synchronization.
Figure: Platform tab of the CPU/FPGA Interaction viewpoint showing computing queue, thread, and FPGA utilization timelines

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804