How to use Intel® VTune™ Amplifier XE 2013 on the tool suite code-named Emberson

Background

Emberson is the code name for Intel’s new embedded software tool suite. This tool suite includes Intel® VTune™ Amplifier XE 2013. This article explains the steps you need to follow to run VTune Amplifier XE 2013 on an embedded platform.

Overview

The embedded OS we will focus on is Yocto Project* version 1.2. This platform supports many Intel BSPs (board support packages) as well as a software-based emulator. Here are the steps we will take to run our collection:

  1. Setting up a Yocto Project* 1.2 environment
    1. Setting up your Linux* host
    2. Setting up a cross compilation environment
    3. Setting up git meta-intel
    4. Setting up a full build of the Yocto Project* for your BSP
    5. Building a Yocto Project* kernel
  2. Installing VTune Amplifier XE 2013
    1. Cross build the sampling driver (sep)
    2. Load sep onto your device
  3. Cross compiling the tachyon sample application
    1. Build the application
    2. Copy to your Yocto Project* target
  4. Running a sep collection on your target
  5. Viewing the results in VTune Amplifier XE 2013 on your Linux* host

Note: Steps 1 and 2 are one-time steps. Once the sampling driver is built and loaded on your system, you can collect performance data until the system reboots.

Setting up a Yocto Project* 1.2 environment

  1. Download the pre-built tool chain, which includes the runqemu script and support files
    from:
    http://downloads.yoctoproject.org/releases/yocto/yocto-1.2/toolchain/
    1. The following tool chain tarball is for a 32-bit development host system and a 32-bit target architecture: poky-eglibc-i686-i586-toolchain-gmae-1.2.tar.bz2
    2. Extract this tarball in the root directory (“/”) of your Linux* host. This creates the installation area “/opt/poky/1.2”.
  2. Set up your Linux* host system:
    1. On my Ubuntu 12.04 x64 system, I ran the following command to set up the host:

      $ sudo apt-get install sed wget cvs subversion git-core coreutils \
           unzip texi2html texinfo libsdl1.2-dev docbook-utils gawk \
           python-pysqlite2 diffstat help2man make gcc build-essential \
           g++ desktop-file-utils chrpath libgl1-mesa-dev libglu1-mesa-dev \
           mercurial autoconf automake groff

    2. See the Yocto Project* getting started guide for more information on the setup required for various Linux* distros: http://www.yoctoproject.org/docs/1.0/yocto-quick-start/yocto-project-qs.html
    3. Clone and check out the meta-intel package:
      1. git clone git://git.yoctoproject.org/meta-intel/
      2. cd meta-intel
      3. git checkout denzil
  3. Build the Yocto Project* kernel
    1. Download the latest stable Yocto Project* build system.
      1. wget http://downloads.yoctoproject.org/releases/yocto/yocto-1.2.1/poky-denzil-7.0.1.tar.bz2
      2. tar xjf poky-denzil-7.0.1.tar.bz2
    2. source poky-denzil-7.0.1/oe-init-build-env poky-denzil-7.0.1-build
    3. Edit conf/local.conf in the build directory (poky-denzil-7.0.1-build/conf/local.conf)
      1. Set MACHINE to the BSP you want to build.
      2. In my case I am building fri2-noemgd.
    4. Edit conf/bblayers.conf in the build directory (poky-denzil-7.0.1-build/conf/bblayers.conf)
      1. Add the meta-intel directory you checked out.
      2. Add the BSP meta directory (meta-intel/meta-fri2).
    5. Build the Yocto Project* image
      1. bitbake core-image-sato
      2. This produces a kernel build that is sufficient to build and run sep.
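
The local.conf and bblayers.conf edits described above can also be scripted. The following is a minimal sketch rather than part of the original procedure: `configure_build` is a hypothetical helper name, and it assumes that appending a `MACHINE` assignment and a `BBLAYERS +=` line to the end of the two files suits your setup (editing the existing entries by hand works equally well).

```shell
#!/bin/sh
# Sketch only: append BSP settings to an existing Yocto Project build configuration.
# configure_build is a hypothetical helper, not part of Yocto or meta-intel.
#   $1 = build directory created by oe-init-build-env (contains conf/)
#   $2 = path to the meta-intel checkout
#   $3 = MACHINE name, for example fri2-noemgd
#   $4 = BSP layer directory inside meta-intel, for example meta-fri2
configure_build() {
    conf_dir="$1/conf"
    meta_intel="$2"
    machine="$3"
    bsp_layer="$4"

    # Both files must already exist (oe-init-build-env creates them).
    if [ ! -f "$conf_dir/local.conf" ] || [ ! -f "$conf_dir/bblayers.conf" ]; then
        echo "missing local.conf or bblayers.conf in $conf_dir" >&2
        return 1
    fi

    # A plain assignment takes precedence over a weak default such as MACHINE ??= "...".
    printf 'MACHINE = "%s"\n' "$machine" >> "$conf_dir/local.conf"

    # Add the meta-intel layer and the BSP layer to the layer list.
    printf 'BBLAYERS += "%s %s/%s"\n' "$meta_intel" "$meta_intel" "$bsp_layer" \
        >> "$conf_dir/bblayers.conf"
}
```

A call might look like `configure_build "$HOME/yocto/poky-denzil-7.0.1-build" "$HOME/yocto/meta-intel" fri2-noemgd meta-fri2`, where the `$HOME/yocto` locations are assumptions about where you unpacked and cloned the sources.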

Install VTune Amplifier XE 2013

  1. Install VTune Amplifier XE 2013 on your Linux* host. VTune™ Amplifier XE is part of the Emberson suite of tools.
  2. Build the sampling driver and load it on your target in order to collect performance data.
    1. cd $SEP_LOCATION/sepdk
    2. For example, if you are building a Yocto Project* 1.2 build of the fri2-noemgd BSP, then your build command would be similar to the following:

 

./build-driver -ni --c-compiler=i586-poky-linux-gcc \
               --kernel-src-dir=$HOME/yocto/poky-denzil-7.0/build/tmp/work/fri2_noemgd-poky-linux/linuxyocto3.2.11+git1+5b4c9dc78b5ae607173cc3ddab9bce1b5f78129b_1+76dc683eccc4680729a76b9d2fd425ba540a483-r1/linux-fri2-noemgd-standard-build \
               --kernel-version=3.2.18-yocto-standard \
               --make-args="PLATFORM=x32 ARITY=smp" \
               --install-dir=../prebuilt

 

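Note: To find the kernel version, look at the UTS_RELEASE definition in include/generated/utsrelease.h under the kernel source directory (older kernels keep this file at include/linux/utsrelease.h).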
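
Reading the version string by hand is easy to get wrong, so a small helper can extract it. This is a sketch under stated assumptions: `kernel_release` is a hypothetical name, and the function checks both include/generated/utsrelease.h and include/linux/utsrelease.h because the location of the header depends on the kernel version.

```shell
#!/bin/sh
# Sketch only: print the UTS_RELEASE string recorded in a built kernel tree.
# kernel_release is a hypothetical helper name.
#   $1 = kernel build directory (the value given to --kernel-src-dir)
kernel_release() {
    for header in "$1/include/generated/utsrelease.h" \
                  "$1/include/linux/utsrelease.h"; do
        if [ -f "$header" ]; then
            # The header contains a line of the form: #define UTS_RELEASE "<version>"
            sed -n 's/^#define[[:space:]]*UTS_RELEASE[[:space:]]*"\(.*\)".*$/\1/p' "$header"
            return 0
        fi
    done
    echo "utsrelease.h not found under $1" >&2
    return 1
}
```

The printed value is what you would pass to --kernel-version in the build-driver command.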

  3. Load the sampling driver on your target
    1. scp -r $SEP_LOCATION root@target_ip:/home/root
    2. Log in to your target
    3. cd /home/root/sep/sepdk
    4. ./insmod-sep3
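
After running the load script, it can be useful to confirm on the target that a driver module is actually present. The sketch below is not from the original article: `sep_driver_loaded` is a hypothetical helper, and it assumes the sampling driver's kernel module name begins with "sep". Check the output of lsmod on your target and adjust the pattern if the name differs.

```shell
#!/bin/sh
# Sketch only: report whether a module whose name starts with "sep" is loaded.
# sep_driver_loaded is a hypothetical helper name.
#   $1 = optional module list file (defaults to /proc/modules)
sep_driver_loaded() {
    modules_file="${1:-/proc/modules}"
    # Each line of /proc/modules begins with the module name.
    # Assumption: the sampling driver's module name starts with "sep".
    grep -q '^sep' "$modules_file"
}
```

For example, `sep_driver_loaded && echo "sampling driver is loaded"` prints the message only when a matching module is found.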

Cross compiling the tachyon sample code

  1. The tachyon sample code is provided as part of the VTune Amplifier XE 2013 release. On your Linux* host:
    1. cd ~/yocto
    2. Untar tachyon: tar xvzf /opt/intel/vtune_amplifier_xe_2013/samples/en/C++/tachyon_vtune_amp_xe.tgz
    3. Modify the tachyon sample as follows:
      1. In the top-level Makefile, comment out the line containing CXX.
      2. In the lower-level Makefile.gmake ('tachyon/common/gui/Makefile.gmake'), add the following lines:

UI = x
EXE = $(NAME)$(SUFFIX)
CXXFLAGS += -I$(OECORE_TARGET_SYSROOT)/X11
LIBS += -lpthread -lX11
#LIBS += -lXext
CXXFLAGS += -DX_NOSHMEM

 

 

  2. Build the application:

source /opt/poky/1.2/environment-setup-i586-poky-linux
make

  3. Copy the tachyon binary and the libtbb.so file to your Yocto Project* target:

scp tachyon_find_hotspots libtbb.so root@target_ip:/home/root
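
Because the Makefile's own CXX line is commented out, the build depends on the variables exported by the environment-setup script. A quick check before running make can catch a shell where the script was not sourced. This is a sketch only: `check_cross_env` is a hypothetical helper, and the `i586-poky-linux-` prefix test assumes the 32-bit toolchain named earlier in this article.

```shell
#!/bin/sh
# Sketch only: verify that the cross-compilation environment has been sourced.
# check_cross_env is a hypothetical helper name.
check_cross_env() {
    # CXX must come from the environment because the Makefile's CXX line is commented out;
    # OECORE_TARGET_SYSROOT is referenced by the lines added to Makefile.gmake.
    for var in CXX OECORE_TARGET_SYSROOT; do
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "$var is not set; source the environment-setup script first" >&2
            return 1
        fi
    done

    # Assumption: the cross compiler name carries the i586-poky-linux prefix.
    case "$CXX" in
        i586-poky-linux-*) return 0 ;;
        *)
            echo "CXX ($CXX) does not look like the i586-poky-linux cross compiler" >&2
            return 1
            ;;
    esac
}
```

Running `check_cross_env && make` would then start the build only when both variables are set as expected.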

 

Run Intel® VTune™ Amplifier XE 2013 on the tachyon sample code

  1. Log in to your Yocto Project* target
    1. User root, no password
  2. Set up the sep environment
    1. cd /home/root/sep/bin
    2. source ./setup_sep_runtime_env.sh
  3. cd /home/root
  4. Set up the library path
    1. export LD_LIBRARY_PATH=.:$LD_LIBRARY_PATH
  5. Run a sep collection for hotspots:
    1. sep -start -out hotspot_data -app ./tachyon_find_hotspots
    2. Your application will run and produce a hotspot_data.tb6 file
    3. This tb6 file can be viewed in VTune Amplifier XE 2013
  6. Run another sep collection specifying the events needed for an Intel® Atom™ General Exploration analysis.

 sep -start -em -ec "BR_INST_RETIRED.MISPRED.PS,BUS_LOCK_CLOCKS.ALL_AGENTS,CPU_CLK_UNHALTED.CORE,CPU_CLK_UNHALTED.REF,CYCLES_DIV_BUSY,DATA_TLB_MISSES.DTLB_MISS,EXT_SNOOP.ALL_AGENTS.HITM,FP_ASSIST.S,ICACHE.MISSES,INST_RETIRED.ANY,ITLB.MISSES,MACHINE_CLEARS.SMC,MEM_LOAD_RETIRED.L2_HIT.PS,MEM_LOAD_RETIRED.L2_MISS.PS,MISALIGN_MEM_REF.LD_SPLIT.AR,MISALIGN_MEM_REF.ST_SPLIT.AR,PAGE_WALKS.CYCLES,REISSUE.OVERLAP_STORE.AR,SIMD_ASSIST,UOPS.MS_CYCLES,UOPS_RETIRED.ANY" -out general_exp -app ./tachyon_find_hotspots
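
The comma-separated event list is long enough that a typo is easy to miss. One way to keep it readable is to list the events one per line and join them in the shell. The sketch below is illustrative only: `join_events` is a hypothetical helper, the four events shown are a shortened subset of the list above, and the script prints the resulting command line rather than running it.

```shell
#!/bin/sh
# Sketch only: build the -ec argument from a one-per-line event list.
# join_events is a hypothetical helper that joins its arguments with commas.
join_events() {
    # "$*" joins the arguments using the first character of IFS;
    # the subshell keeps the IFS change from leaking to the caller.
    ( IFS=,; printf '%s\n' "$*" )
}

# Shortened list for illustration; use the full event list from the command above.
EVENTS=$(join_events \
    CPU_CLK_UNHALTED.CORE \
    CPU_CLK_UNHALTED.REF \
    INST_RETIRED.ANY \
    ICACHE.MISSES)

# Print the command line instead of running it, so it can be inspected first.
echo "sep -start -em -ec \"$EVENTS\" -out general_exp -app ./tachyon_find_hotspots"
```

Once the printed command looks right, it can be run on the target as in step 6.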

 

On your Linux* host: Import the VTune™ Amplifier XE results

  1. Copy the tb6 files created above from the target to your Linux* host.
  2. To view these results in VTune Amplifier XE 2013:
    1. source /opt/intel/vtune_amplifier_xe_2013/amplxe-vars.sh
    2. Start VTune Amplifier XE
      1. amplxe-gui
    3. Create a project
      1. File->New Project
        1. This brings up the project properties dialog
          1. Click the Search Directories tab and specify “Search Directories” for “All Files”
            1. Add the directory in which you built tachyon
            2. Click “OK”
    4. Import the hotspot results
      1. File->Import Result
      2. Specify Import single result
      3. Browse to the hotspot_data.tb6 file
      4. Click “Import”

  3. VTune Amplifier XE should now display the imported hotspot result.

  4. Repeat the import process for the general_exp.tb6 file you created to view the General Exploration result.

  5. You can also view the events you have collected at the source and assembly code level.
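
The first step of this section, copying the tb6 files to the host, is not shown as a command. One possible way to do it is with scp, as sketched below. `fetch_results` is a hypothetical helper, and it assumes the collections were run from /home/root on the target (as in the previous section) so the .tb6 files were written there.

```shell
#!/bin/sh
# Sketch only: copy all .tb6 result files from the target to the host.
# fetch_results is a hypothetical helper name.
#   $1 = target address, for example root@target_ip
#   $2 = local destination directory (defaults to the current directory)
fetch_results() {
    target="$1"
    dest="${2:-.}"
    mkdir -p "$dest" || return 1
    # Assumption: the collections were run from /home/root on the target.
    # The remote glob is quoted so that it is expanded on the target, not locally.
    scp "$target:/home/root/*.tb6" "$dest"
}
```

For example, `fetch_results root@target_ip ~/yocto/results` would place the files in a results directory on the host, where ~/yocto/results is an arbitrary location chosen for illustration.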

 

For more complete information about compiler optimizations, see our Optimization Notice.