Analyzing CPU and FPGA (Intel® Arria® 10 GX) Interaction

This recipe shows how to configure your platform to analyze the interaction between your CPU and FPGA, using the Intel® Arria® 10 GX FPGA as an example.

  1. Ingredients
  2. Configure the Intel® Arria® 10 GX FPGA and Intel® FPGA SDK for OpenCL™
  3. Build the Sample Application and Flash to the FPGA
  4. Run CPU/FPGA Interaction Analysis
  5. Interpret Results

Ingredients

This section lists the hardware and software tools used for the performance analysis scenario.

  • Application: Matrix Multiplication OpenCL™ application. The Matrix Multiplication sample application is available for download from the Intel® FPGA SDK for OpenCL™ website: https://www.intel.com/content/www/us/en/programmable/products/design-software/embedded-software-developers/opencl/developer-zone.html
  • Tools: Intel® FPGA SDK for OpenCL™, Intel® VTune™ Amplifier 2019 or later

    Note

    • For trial VTune Amplifier downloads and product support, visit https://software.intel.com/en-us/vtune.

    • All the Cookbook recipes are scalable and can be applied to VTune Amplifier 2018 and higher. Slight version-specific configuration changes are possible.

  • Operating System: CentOS* 7, Red Hat* Enterprise Linux 7 or later
  • CPU: Intel® server platform code named Skylake
  • FPGA: Intel® Arria® 10 GX

Configure the Intel® Arria® 10 GX FPGA and Intel® FPGA SDK for OpenCL™

  1. On your Intel Arria 10 GX FPGA, set up the DIP switches and connect the power and USB cables. Detailed instructions for these steps, and others, are available from https://www.intel.com/content/www/us/en/programmable/documentation/tgy1490191698959.html.

  2. Download Intel® FPGA SDK for OpenCL™ (includes CodeBuilder, Quartus Prime software and devices) from http://fpgasoftware.intel.com/opencl/.

  3. Run the setup_pro.sh file to install the SDK.

  4. Run source init_opencl.sh to set the appropriate environment variables.

  5. Run aocl version to verify the installation. The output should look similar to the following:

    aocl 17.1.0.240 (Intel(R) FPGA SDK for OpenCL(TM), Version 17.1.0 Build 240, Copyright (C) 2017 Intel Corporation)

  6. Run aocl install to install the FPGA board.

  7. Run aocl diagnose to verify the hardware installation. The output should look similar to the following:

    Device Name:
    acl0
    
    Package Pat:
    /home/tce/intelFPGA_pro/17.1/hld/board/a10_ref
    
    Vendor: Intel(R) Corporation
    
    Phys Dev Name  Status   Information
    
    acla10_ref0   Passed   Arria 10 Reference Platform (acla10_ref0)
                            PCIe dev_id = 2494, bus:slot.func = 44:00.00, Gen3 x4
                            FPGA temperature = 44.3555 degrees C.
    
    DIAGNOSTIC_PASSED
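
The hardware check in step 7 can also be scripted. The following is a minimal sketch, assuming that aocl diagnose prints a DIAGNOSTIC_PASSED line on success, as in the sample output above. The function name diagnose_passed is hypothetical and is not part of the SDK.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the text on stdin contains a line that is
# exactly DIAGNOSTIC_PASSED (surrounding whitespace is ignored).
diagnose_passed() {
    grep -Eq '^[[:space:]]*DIAGNOSTIC_PASSED[[:space:]]*$'
}

# Usage (requires the environment set up by init_opencl.sh):
#   aocl diagnose | diagnose_passed && echo "FPGA board OK"
```

Matching the whole line rather than a substring avoids a false positive if the marker text ever appears inside a longer message.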
    

Build the Sample Application and Flash to the FPGA

  1. Run make with the default makefile to build the host executable. The executable output filename is host.

  2. Build the binary for the FPGA using the following command:

    aoc -v -board=a10gx device/matrix_mult.cl -o bin/matrix_mult.aocx
  3. Set up the USB driver to flash.

    1. Run the following command:

      sudo vim /etc/udev/rules.d/51-usbblaster.rules
    2. Add the following lines:

      # usb blaster
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6001", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6002", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6003", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6010", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="6810", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %c"
      
  4. Lower the JTAG clock speed to 6 MHz using the following command:

    jtagconfig --setparam 1 JtagClock 6M
  5. Flash the binary to the FPGA using the following command:

    aocl flash acl0 ./bin/matrix_mult.aocx
  6. Reboot the host system with the FPGA.
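
The five udev rules in step 3 differ only in the product ID, so they can be generated instead of typed by hand. This is a minimal sketch under that observation. The function name usb_blaster_rules is hypothetical, and reloading the rules with udevadm is an assumption that the recipe itself does not mention (it reboots the host in step 6 instead).

```shell
#!/bin/sh
# Print the comment line plus one udev rule per USB-Blaster product ID
# (vendor 09fb), matching the rules listed in step 3.
# Single quotes keep $env{...} literal; %%c in the format prints as %c.
usb_blaster_rules() {
    echo '# usb blaster'
    for product in 6001 6002 6003 6010 6810; do
        printf 'SUBSYSTEM=="usb", ENV{DEVTYPE}=="usb_device", ATTRS{idVendor}=="09fb", ATTRS{idProduct}=="%s", MODE="0666", NAME="bus/usb/$env{BUSNUM}/$env{DEVNUM}", RUN+="/bin/chmod 0666 %%c"\n' "$product"
    done
}

# Usage (writes the same file that step 3 edits with vim):
#   usb_blaster_rules | sudo tee /etc/udev/rules.d/51-usbblaster.rules >/dev/null
#   sudo udevadm control --reload-rules   # assumption: optional, the reboot in step 6 also applies the rules
```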

Run CPU/FPGA Interaction Analysis

  1. Launch the Intel® VTune™ Amplifier. For example:

    /opt/intel/vtune_amplifier_2019/bin64/amplxe-gui
  2. Create a project for your analysis, for example: hello_world_opencl.

  3. Click Configure Analysis to start a new analysis.

  4. Set up the CPU/FPGA Interaction analysis.

    Intel VTune Amplifier Configure Analysis window showing matrix multiply file path

    1. In the WHERE pane, select Local Host.

    2. In the WHAT pane, select Launch Application and browse to the Matrix Multiplication host application. Typically the application can be found under <sample app>/bin/host.

    3. In the HOW pane, select CPU/FPGA Interaction from the available analysis types.

  5. Click Start to begin the analysis.
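
The same collection can in principle be started from the command line with amplxe-cl. The sketch below only prints the command so that it can be reviewed first. It assumes that fpga-interaction is the command-line name of the CPU/FPGA Interaction analysis type in your VTune Amplifier version; running amplxe-cl -help collect lists the types that are actually available. The function name fpga_collect_cmd and the paths are hypothetical.

```shell
#!/bin/sh
# Assumption: fpga-interaction is the CLI name of the CPU/FPGA Interaction
# analysis; override ANALYSIS_TYPE if `amplxe-cl -help collect` shows otherwise.
ANALYSIS_TYPE="${ANALYSIS_TYPE:-fpga-interaction}"

# Print (do not run) the collection command for an application and a result directory.
fpga_collect_cmd() {
    app="$1"
    result_dir="$2"
    printf 'amplxe-cl -collect %s -result-dir %s -- %s\n' "$ANALYSIS_TYPE" "$result_dir" "$app"
}

# Example with hypothetical paths:
fpga_collect_cmd ./bin/host ./r000fpga
```

Printing the command instead of executing it keeps the sketch safe to run on a machine without VTune Amplifier installed.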

Interpret Results

After data collection completes, the results are finalized and shown in the CPU/FPGA Interaction viewpoint. Start with the Summary tab to view the top FPGA compute tasks as well as the top tasks and hotspots for the CPU.

Intel VTune Amplifier Summary window showing CPU/FPGA Interaction viewpoint with Top Hotspots and FPGA Top Compute lists

Switch to the Bottom-up tab to review the work size of a compute task and data transfer throughput. Use the timeline pane to review the FPGA utilization for compute and transfer.

Intel VTune Amplifier Bottom-up tab of CPU/FPGA Interaction viewpoint showing timeline of FPGA utilization

Use the Platform tab to check the computing queue for the FPGA and host application. You can also find the start time and duration of each transfer and synchronization.

Intel VTune Amplifier Platform tab of CPU/FPGA Interaction viewpoint showing computing queue, thread, and FPGA utilization timelines
