How to use Intel® VTune™ Amplifier for Systems

Background

Intel® System Studio is Intel’s new embedded software tool suite. The suite includes Intel® VTune™ Amplifier for Systems. This article explains the steps needed to run VTune Amplifier for Systems on an embedded platform.

Overview

The embedded OS we focus on is Yocto Project* version 1.2. This platform supports many Intel BSPs (board support packages) as well as a software-based emulator. Here are the steps we will take to run our collection:

  1. Setting up a Yocto Project* 1.2 environment
    1. Setting up your Linux* host
    2. Setting up a cross compilation environment
    3. Setting up git meta-intel
    4. Setting up a full build of the Yocto Project* for your BSP
    5. Building a Yocto Project* kernel
  2. Installing VTune Amplifier 2013 for Systems
    1. Cross building the sampling driver (sep)
    2. Loading sep onto your device
  3. Cross compiling the tachyon sample application
    1. Building the application
    2. Copying it to your Yocto Project* target
  4. Running a sep collection on your target
  5. Viewing the results in VTune Amplifier 2013 for Systems on your Linux* host

Note: Steps 1 and 2 are one-time steps. Once the sampling driver is built and loaded on your system, you can collect performance data until the system reboots.

Setting up a Yocto Project* 1.2 environment

  1. Download the pre-built tool chain, which includes the runqemu script and support files
    from: http://downloads.yoctoproject.org/releases/yocto/yocto-1.2/toolchain/
    1. The following tool chain tarball is for a 32-bit development host system and a 32-bit target architecture: poky-eglibc-i686-i586-toolchain-gmae-1.2.tar.bz2
    2. You need to install this tarball on your Linux* host in the root “/” directory. This creates the installation area “/opt/poky/1.2”.
  2. Set up your Linux* host system:
    1. On my Ubuntu x64 12.04 system I ran the following command to set up the host:

      $ sudo apt-get install sed wget cvs subversion git-core coreutils \
           unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk \
           python-pysqlite2 diffstat help2man make gcc build-essential \
           g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev \
           mercurial autoconf automake groff

    2. See the Yocto Project* getting started guide for more information on the setup required for various Linux* distros: http://www.yoctoproject.org/docs/1.0/yocto-quick-start/yocto-project-qs.html
    3. Clone and check out the meta-intel package:
      1. git clone git://git.yoctoproject.org/meta-intel/
      2. cd meta-intel
      3. git checkout denzil
  3. Build the Yocto Project* kernel
    1. Download the latest stable Yocto Project* build system.
      1. wget http://downloads.yoctoproject.org/releases/yocto/yocto-1.2.1/poky-denzil-7.0.1.tar.bz2
    2. tar xjf poky-denzil-7.0.1.tar.bz2
    3. source poky-denzil-7.0.1/oe-init-build-env poky-denzil-7.0.1-build
    4. Edit poky-denzil-7.0.1-build/conf/local.conf
      1. Tailor MACHINE for the BSP you want to build.
      2. In my case I am building fri2-noemgd
    5. Edit poky-denzil-7.0.1-build/conf/bblayers.conf
      1. Specify the meta-intel you checked out.
      2. Specify the BSP meta directory. (meta-intel/meta-fri2)
    6. Build the Yocto Project* kernel
      1. bitbake core-image-sato
      2. This will create a kernel that is sufficient to build and run sep.
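
The two configuration edits above can also be scripted. The following is a minimal sketch, assuming the build directory created by oe-init-build-env and a meta-intel checkout in a location of your choosing. The function names set_machine and add_layers are hypothetical helpers, not part of the Yocto Project* tooling. The sketch appends to the files instead of editing existing lines, relying on BitBake letting a later "=" assignment override an earlier default and on "+=" appending to BBLAYERS.

```shell
#!/bin/sh
# Hypothetical helpers (not part of the Yocto Project* tooling).

# set_machine CONF MACHINE
# Append a MACHINE assignment to local.conf; a later "=" assignment
# overrides the default set earlier in the file.
set_machine() {
    printf 'MACHINE = "%s"\n' "$2" >> "$1"
}

# add_layers CONF LAYER...
# Append one 'BBLAYERS += "..."' line per layer directory to bblayers.conf.
add_layers() {
    conf=$1
    shift
    for layer in "$@"; do
        printf 'BBLAYERS += "%s"\n' "$layer" >> "$conf"
    done
}

# Example (the meta-intel location is an assumption; use your own path):
#   set_machine poky-denzil-7.0.1-build/conf/local.conf fri2-noemgd
#   add_layers  poky-denzil-7.0.1-build/conf/bblayers.conf \
#       "$HOME/meta-intel" "$HOME/meta-intel/meta-fri2"
```

Review both files after running the helpers; if they are run twice, the lines are appended twice.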
  1. Install VTune Amplifier 2013 for Systems on your Linux* host. You will need to build the sampling driver and load it on your target in order to collect performance data.
    1. cd $VTUNE_INSTALL/sepdk
    2. For example, if you are building a Yocto Project* 1.2 build of the fri2-noemgd BSP, then your build command would be similar to the following:

./build-driver -ni --c-compiler=i586-poky-linux-gcc \
--kernel-src-dir=$HOME/yocto/poky-denzil-7.0/build/tmp/work/fri2_noemgd-poky-linux/linuxyocto3.2.11+git1+5b4c9dc78b5ae607173cc3ddab9bce1b5f78129b_1+76dc683eccc4680729a76b9d2fd425ba540a483-r1/linux-fri2-noemgd-standard-build \
--kernel-version=3.2.18-yocto-standard \
--make-args="PLATFORM=x32 ARITY=smp" \
--install-dir=../prebuilt

Note: To find the kernel version, see include/generated/utsrelease.h under the kernel source directory (include/linux/utsrelease.h on kernels older than 2.6.33).
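
The version lookup in the note can be automated. This is a small sketch, assuming the header contains a line of the form #define UTS_RELEASE "..." and lives at include/generated/utsrelease.h (or include/linux/utsrelease.h on kernels older than 2.6.33). The function name kernel_release is a hypothetical helper.

```shell
#!/bin/sh
# kernel_release KERNEL_SRC_DIR
# Print the UTS_RELEASE string from the kernel build tree, checking the
# location used by current kernels first and the older location second.
kernel_release() {
    ksrc=$1
    for header in "$ksrc/include/generated/utsrelease.h" \
                  "$ksrc/include/linux/utsrelease.h"; do
        if [ -f "$header" ]; then
            sed -n 's/^#define UTS_RELEASE "\(.*\)"$/\1/p' "$header"
            return 0
        fi
    done
    echo "utsrelease.h not found under $ksrc" >&2
    return 1
}

# Example: pass the result to build-driver
#   ./build-driver ... --kernel-version="$(kernel_release "$KSRC")" ...
```

Here KSRC stands for the same directory you pass to --kernel-src-dir.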

  1. Load the sampling driver on your target
    1. scp -r $VTUNE_INSTALL/sepdk root@target_ip:/home/root
    2. Log in to your target
    3. cd /home/root/sep/sepdk
    4. ./insmod-sep3 -re
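
After running insmod-sep3 you may want to confirm that the driver is actually loaded. The sketch below reads the kernel's module list. It assumes the sampling driver's module name begins with "sep", which is an assumption about naming and not something stated in this article. The function name sep_driver_loaded is hypothetical.

```shell
#!/bin/sh
# sep_driver_loaded [MODULES_FILE]
# Succeed if a module whose name starts with "sep" (assumed naming) is
# listed. MODULES_FILE defaults to /proc/modules on the target.
sep_driver_loaded() {
    grep -q '^sep' "${1:-/proc/modules}"
}

# Example, run on the target:
#   if sep_driver_loaded; then echo "sampling driver loaded"; fi
```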

Cross compiling the tachyon sample code

  1. The tachyon sample code is provided as part of the Amplifier XE 2013 release.
    1. On your Linux* host
    2. cd ~/yocto
    3. Untar tachyon: $VTUNE_INSTALL/samples/en/C++/tachyon_vtune_amp_xe.tgz
    4. You will need to modify the tachyon sample as follows
      1. In the top level Makefile:  Comment out the line containing CXX.
      2. In the lower level Makefile.gmake ('tachyon/common/gui/Makefile.gmake') Add the following lines:

UI = x
EXE = $(NAME)$(SUFFIX)
CXXFLAGS += -I$(OECORE_TARGET_SYSROOT)/X11
LIBS += -lpthread -lX11
#LIBS += -lXext
CXXFLAGS += -DX_NOSHMEM

Set up the cross compilation environment and build the sample:

source /opt/poky/1.2/environment-setup-i586-poky-linux
make

Copy the tachyon binary and the libtbb.so file to your Yocto Project* target:

scp tachyon_find_hotspots libtbb.so root@target_ip:/home/root
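
If make picks up the host compiler instead of the cross compiler, the usual cause is that the environment-setup script was not sourced in the current shell. The following sketch checks for the two variables this section relies on: CXX, which the modified top-level Makefile no longer sets, and OECORE_TARGET_SYSROOT, which the added Makefile.gmake lines reference. The function name check_cross_env is a hypothetical helper.

```shell
#!/bin/sh
# check_cross_env
# Return non-zero and name each variable that is unset or empty.
check_cross_env() {
    missing=0
    for var in CXX OECORE_TARGET_SYSROOT; do
        eval "val=\${$var:-}"
        if [ -z "$val" ]; then
            echo "not set: $var (source the environment-setup script first)" >&2
            missing=1
        fi
    done
    return $missing
}

# Example:
#   source /opt/poky/1.2/environment-setup-i586-poky-linux
#   check_cross_env && make
```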

 Run Intel® VTune™ Amplifier 2013 for Systems on the tachyon sample code

  1. Log in to your Yocto Project* target
    1. User root, no password
  2. Set up the sep environment
    1. cd /home/root/sep/bin
    2. source ./setup_sep_runtime_env.sh
  3. cd /home/root
  4. Set up the library path
    1. export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
  5. Run a sep collection for hotspots:
    1. sep -start -out hotspot_data -app ./tachyon_find_hotspots
    2. Your application will run and produce a hotspot_data.tb6 file
    3. This tb6 file can be viewed in VTune Amplifier 2013 for Systems
  6. Run another sep collection specifying the events needed for an Intel® Atom™ General Exploration analysis.

 sep -start -em -ec "BR_INST_RETIRED.MISPRED.PS,BUS_LOCK_CLOCKS.ALL_AGENTS,CPU_CLK_UNHALTED.CORE,CPU_CLK_UNHALTED.REF,CYCLES_DIV_BUSY,DATA_TLB_MISSES.DTLB_MISS,EXT_SNOOP.ALL_AGENTS.HITM,FP_ASSIST.S,ICACHE.MISSES,INST_RETIRED.ANY,ITLB.MISSES,MACHINE_CLEARS.SMC,MEM_LOAD_RETIRED.L2_HIT.PS,MEM_LOAD_RETIRED.L2_MISS.PS,MISALIGN_MEM_REF.LD_SPLIT.AR,MISALIGN_MEM_REF.ST_SPLIT.AR,PAGE_WALKS.CYCLES,REISSUE.OVERLAP_STORE.AR,SIMD_ASSIST,UOPS.MS_CYCLES,UOPS_RETIRED.ANY" -out general_exp -app ./tachyon_find_hotspots
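
A long -ec argument is easy to mistype. One option is to keep the event names one per line in a text file and join them when invoking sep. This is a minimal sketch; the file name events.txt and the function name join_events are assumptions made for illustration.

```shell
#!/bin/sh
# join_events FILE
# Print the non-blank lines of FILE joined by commas, with no trailing
# comma, in the form expected by the -ec option.
join_events() {
    grep -v '^[[:space:]]*$' "$1" | paste -sd, -
}

# Example, run on the target (events.txt is a hypothetical file):
#   sep -start -em -ec "$(join_events events.txt)" \
#       -out general_exp -app ./tachyon_find_hotspots
```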

 On your Linux* host: Import the VTune™ Amplifier 2013 for Systems results

  1. Copy the tb6 files created above
    1. scp root@target_ip:/home/root/hotspot_data.tb6 .
    2. scp root@target_ip:/home/root/general_exp.tb6 .
  2. To view these results in VTune Amplifier 2013 for Systems 
    1. source /opt/intel/system_studio_2013.0.xxx/vtune_amplifier_2013_for_systems/amplxe-vars.sh
    2. Start the VTune Amplifier 2013 for Systems
      1. amplxe-gui
    3. Create a Project
      1. File->New Project
        1. This will bring up the project properties dialog
          1. Click on the Search Directories tab and specify “Search Directories” for “All Files”
            1. Add the directory in which you built tachyon
            2. Click “OK”
    4. Import hotspot results
      1. File->Import Result
      2. Specify Import single result
      3. Browse to the hotspot_data.tb6 file
      4. Click “Import”
  3. You should see the imported hotspot result displayed in VTune Amplifier 2013 for Systems.
  4. You can also view the events you have collected at the source and assembly code level.
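
The copy step at the start of this section can be wrapped in a small helper when you collect results repeatedly. This is a sketch under the assumption that results sit in /home/root on the target, as in the steps above. The function name fetch_results and the SCP override variable are hypothetical, the latter existing only so the commands can be previewed without a network connection.

```shell
#!/bin/sh
# fetch_results TARGET DEST NAME...
# Copy NAME.tb6 for each NAME from /home/root on TARGET into DEST.
# Set SCP=echo to print the arguments instead of copying.
fetch_results() {
    target=$1
    dest=$2
    shift 2
    for name in "$@"; do
        ${SCP:-scp} "$target:/home/root/$name.tb6" "$dest/" || return 1
    done
}

# Example, run on the Linux* host:
#   fetch_results root@target_ip . hotspot_data general_exp
```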
