Finding Hotspots

Use the Intel(R) VTune(TM) Amplifier 2014 for Systems for Linux* to identify and analyze hotspot functions in your serial or parallel embedded application by performing a series of steps in a workflow. This tutorial guides you through these workflow steps while using a sample ray-tracer application named tachyon that runs on your embedded device.

To optimize the performance of your embedded application, you must first understand its current performance characteristics as it runs on the embedded device. You then modify the application based on that performance data and check the new performance metrics to compare the results. You can repeat this cycle until the results match your performance goals.

When you check the performance of your application, you run advanced sampling-based performance analysis on the application as it runs. These analyses help you identify performance hotspots and bottlenecks. If they are not where you expect them, you can rewrite your code accordingly and test again. Each time you make a change and test it, you compare the new results with the previous ones to ensure that performance has improved.
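The measure, modify, and compare cycle described above can be sketched in a few lines of Python. This is only an illustration of the comparison step: the function names and timings are invented, and the helper names compare_runs and total_speedup are hypothetical, not part of VTune Amplifier.

```python
# Hypothetical per-function CPU times (seconds) from two profiling runs.
# The function names and numbers are made up for illustration only.
baseline = {"render_scene": 4.20, "intersect": 2.90, "shade": 1.10}
modified = {"render_scene": 2.10, "intersect": 2.80, "shade": 1.20}


def compare_runs(before, after):
    """Return (function, before, after, delta) rows, biggest improvement first."""
    rows = []
    for name in sorted(set(before) | set(after)):
        b = before.get(name, 0.0)
        a = after.get(name, 0.0)
        rows.append((name, b, a, a - b))
    # Most negative delta (largest time saved) comes first.
    rows.sort(key=lambda row: row[3])
    return rows


def total_speedup(before, after):
    """Ratio of total baseline time to total modified time (>1 means faster)."""
    return sum(before.values()) / sum(after.values())


if __name__ == "__main__":
    for name, b, a, delta in compare_runs(baseline, modified):
        print(f"{name:15s} {b:6.2f}s -> {a:6.2f}s ({delta:+.2f}s)")
    print(f"overall speedup: {total_speedup(baseline, modified):.2f}x")
```

Sorting by the per-function delta puts the functions that gained the most at the top and any regressions at the bottom, which is the view you typically want when deciding whether a change was worthwhile.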

To obtain this sampling-based performance data, you compile and run your application in a supported embedded development environment. You then define and launch a profiling agent, called a remote data collector, that also runs on the embedded device. The remote data collector records the specified performance data from your running application.

This performance information is then automatically transferred to a server system, where you can view and analyze it and plan your optimization strategy based on your available time and resources. Your embedded application must be cross-compiled and present on this server system as well, so that your results accurately reflect the function names and line numbers in your code. While there are several supported embedded OS versions, this tutorial focuses on the Yocto Project* 1.* environment.
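To see why the binary must be available where results are viewed, consider how raw samples are typically turned into function names. The following is a minimal sketch, assuming each sample is a bare code address and the symbol table is a list of (start, size, name) entries. The addresses, function names, and helper names are invented for illustration, and this is not a description of how VTune Amplifier resolves symbols internally.

```python
import bisect

# Hypothetical symbol table: (start_address, size, function_name).
# Addresses and names are invented for illustration.
SYMBOLS = [
    (0x1000, 0x200, "main"),
    (0x1200, 0x400, "trace_ray"),
    (0x1600, 0x100, "write_pixel"),
]


def build_index(symbols):
    """Sort symbols by start address and return parallel lists for lookup."""
    ordered = sorted(symbols)
    starts = [start for start, _, _ in ordered]
    return starts, ordered


def resolve(address, starts, ordered):
    """Map a sampled address to a function name, or None if unknown."""
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None
    start, size, name = ordered[i]
    return name if address < start + size else None


def count_samples(addresses, symbols):
    """Count samples per function; unresolved addresses go to '[unknown]'."""
    starts, ordered = build_index(symbols)
    counts = {}
    for address in addresses:
        name = resolve(address, starts, ordered) or "[unknown]"
        counts[name] = counts.get(name, 0) + 1
    return counts


if __name__ == "__main__":
    samples = [0x1010, 0x1234, 0x1300, 0x15FF, 0x1650, 0x9000]
    print(count_samples(samples, SYMBOLS))
```

Without a matching symbol table, every sample in this sketch would land in the "[unknown]" bucket, which is the analogue of results that show addresses instead of function names.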

To summarize, in this tutorial you collect data on your embedded system over SSH, using the VTune Amplifier GUI (amplxe-gui) started from the host system.

Copying the kernel and drivers from your host to your target system is a one-time setup procedure, after which you can run multiple data collection sessions and view and compare the results.

Once you have collected performance data, you modify your code to improve its performance profile and test again.

This tutorial focuses on obtaining the baseline results for Advanced Hotspots Analysis and the tachyon sample application. For more information on the iterative process of testing, modifying, improving, and retesting your code for comparative analysis, see Tutorial: Finding Hotspots: Compare with Previous Result at

To find hotspots in your application, complete these activities:

Step 1: Prepare your host

  • Set up your Linux host
  • Set up a cross compilation environment

Step 2: Prepare your target device

  • Install a target package including remote collectors
  • Build a Yocto Project* kernel
  • Cross build and load the sampling driver (sep)
  • Configure ssh for a no-password connection

Step 3: Prepare your sample application

  • Cross compile tachyon for use
  • Copy tachyon to your Yocto Project* target

Step 4: Run Advanced Hotspots Analysis

  • Use the GUI to set up your remote configuration
  • Collect performance data on your application

Step 5: View your results

  • See analysis results
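Before starting a remote collection, it can help to confirm that the no-password SSH connection from Step 2 actually works. The sketch below is one possible check, not part of the tutorial's tooling. It assumes the OpenSSH client is installed on the host and that the target provides a `true` command. The function names and the target address in the usage note are placeholders.

```python
import subprocess


def ssh_check_command(target):
    """Build an ssh command that fails instead of prompting for a password."""
    # BatchMode=yes disables password prompts; ConnectTimeout bounds the wait.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", target, "true"]


def passwordless_ssh_works(target):
    """Return True if a key-based login to `target` succeeds without a prompt."""
    try:
        result = subprocess.run(
            ssh_check_command(target),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # ssh is missing on the host, or the connection attempt hung.
        return False
    return result.returncode == 0
```

For example, calling passwordless_ssh_works("root@192.168.1.10") with your own user and address in place of the placeholder returns False if the target would have asked for a password, which indicates that the key setup is incomplete.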

For more complete information about compiler optimizations, see our Optimization Notice.