As I continue to explore Ultrabook capabilities, in this post I look at Intel® VTune™ Amplifier XE 2011, a threading and performance optimization tool for C/C++, .NET, and Fortran developers who need to understand an application's serial and parallel behavior to improve performance and scalability.
VTune, an Intel® Parallel Studio XE tool, provides information on code performance for users developing serial and multithreaded applications on Windows* and Linux* operating systems. VTune helps you analyze your algorithm choices and identify where and how your application can benefit from available hardware resources.
In this post, I walk through using the Hotspots analysis in VTune™ Amplifier XE to understand where an application spends its time (that is, to identify hotspots, the most time-consuming program units) and to see how those hotspots were called. The Hotspots analysis is useful for both serial and parallel applications; here I use it on an Ultrabook running Windows 8 and Microsoft Visual Studio 2012 RC.
Acquire Intel VTune Amplifier XE
If you do not already have access to the VTune Amplifier XE, you can download an evaluation copy from the Evaluation Center.
Find Hotspots and Optimize Your Application:
Step 1: Prepare for analysis (Do one of the following)
- In the Visual Studio* IDE: Choose a project, verify its settings, and build the application
- In the Standalone Intel VTune GUI: Build an application to analyze for hotspots and create a new VTune Amplifier XE project
Step 2: Find hotspots
- Choose and run the Hotspots analysis
- Interpret the result data
- View and analyze code of the performance-critical function
Step 3: Eliminate hotspots
- Modify the code to tune the algorithms, or rebuild the code with the Intel® Compiler
Step 4: Check your work
- Re-build the target, re-run the Hotspots analysis, and compare the result data before and after optimization
Open VTune Amplifier XE
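Open Microsoft Visual Studio 2012 RC and build the following sample of multithreaded C# code.
using System;
using System.Threading;
class MultithreadProgram // class name assumed from the project name
{
    static void Main()
    {
        Thread myThread = new Thread(myNextThread); // Create a new thread
        myThread.Start(); // Start running myNextThread()
        // Run the main thread concurrently.
        for (int i = 0; i < 200; i++) Console.Write("M"); // loop body assumed; not preserved in the original
    }
    static void myNextThread()
    {
        for (int i = 0; i < 200; i++) Console.Write("N"); // loop body assumed
    }
}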
Run Hotspots Analysis
To run an analysis:
- Right-click MultithreadProgram
- Click Hotspots Analysis
- From the VTune toolbar, click the New Analysis button. The VTune result tab opens with the Analysis Type window active.
- In the left pane of the Analysis Type window, locate the analysis tree and select Algorithm Analysis > Hotspots. The right pane is updated with the default options for the Hotspots analysis.
- Click the Start button on the right command bar.
This tutorial explains how to run an analysis from the VTune Amplifier XE graphical user interface (GUI). You can also use the VTune Amplifier XE command-line interface (amplxe-cl command) to run an analysis. For more details, check the Command-line Interface Support section of the VTune Amplifier XE Help.
Understand the Basic Hotspots Metrics
Start your analysis with the Summary window. To interpret the data, hover over the question mark icons "?" to read the pop-up help and better understand what each performance metric means.
I compiled and ran the above code on an ASUS ZenBook UX 21 Ultrabook with a 2nd generation Intel® Core™ i5 processor, running Microsoft Windows 8, using C# in Microsoft Visual Studio 2012 RC.
Note that CPU Time for the sample application is equal to 3.084 seconds. It is the sum of CPU time for all application threads. Total Thread Count is 6, so the sample application is multithreaded. The Top Hotspots section provides data on the most time-consuming functions (hotspot functions), sorted by the CPU time spent on their execution.
For the sample application, the WaitForSingleObject function, which took 1.719 seconds to execute, shows up at the top of the list as the hottest function. The [Others] entry at the bottom shows the sum of CPU time for all functions not listed in the table.
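Because CPU Time is summed over all threads, it can exceed the wall-clock time of a run. The following is a minimal C# sketch of that idea, not part of the original sample and not how VTune itself collects data: the class name CpuTimeSketch, the helpers Spin and ProcessCpuTime, and the 500 ms spin duration are assumptions chosen for illustration. It keeps two threads busy at the same time and compares elapsed time with the process-wide CPU time reported by System.Diagnostics.Process.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

// Hypothetical illustration: names and durations are assumptions, not from the sample above.
public class CpuTimeSketch
{
    // Keep the calling thread busy for roughly the given number of milliseconds.
    public static void Spin(int milliseconds)
    {
        Stopwatch sw = Stopwatch.StartNew();
        while (sw.ElapsedMilliseconds < milliseconds) { }
    }

    // CPU time consumed so far by the current process, across all of its threads.
    public static TimeSpan ProcessCpuTime()
    {
        using (Process p = Process.GetCurrentProcess())
        {
            return p.TotalProcessorTime;
        }
    }

    public static void Main()
    {
        Stopwatch wall = Stopwatch.StartNew();
        TimeSpan cpuBefore = ProcessCpuTime();

        // Two threads spin at the same time: the worker and the main thread.
        Thread worker = new Thread(() => Spin(500));
        worker.Start();
        Spin(500);
        worker.Join();

        TimeSpan cpuUsed = ProcessCpuTime() - cpuBefore;
        Console.WriteLine("Elapsed time: {0:F3} s", wall.Elapsed.TotalSeconds);
        Console.WriteLine("CPU time, all threads: {0:F3} s", cpuUsed.TotalSeconds);
    }
}
```

On a machine with at least two free cores, the reported CPU time would typically come out larger than the elapsed time, since both threads consume CPU concurrently. The exact figures depend on the hardware and on whatever else is running.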
Intel® VTune™ Amplifier XE Features
Accurate performance data
Without data you are just guessing about the location of the performance bottleneck and can easily waste a lot of time.
Collecting data always has a cost. The Intel® VTune™ Amplifier XE performance profiling tool keeps the overhead low, making data collection faster and the results more accurate.
We've added a number of pre-defined performance profiling experiments to the full custom capabilities of earlier versions of Intel® VTune™ Performance Analyzer. This makes it easier to get great profiling information without needing to know microarchitectural details.
Learn: Learning Lab Portal | Evaluation Guide Portal
Product details: Product details (click on the "how to" or "learn" tab for training videos)
Documentation: Intel® VTune™ Amplifier XE Documentation | Intel® VTune™ Amplifier XE product brief
Release Notes: Intel® VTune™ Amplifier XE for Windows* | Intel® VTune™ Amplifier XE for Linux*
Getting Started Guides: Intel® VTune™ Amplifier XE for Windows | Intel® VTune™ Amplifier XE for Linux
Support and Online Communities: Forum | Knowledge Base | Blogs
Related Links: Intel® Software Network | go-parallel.com
Intel® Parallel Studio XE tutorials (HTML, PDF): /enus/articles/intel-software-product-tutorials/
Intel® Parallel Studio XE support page: /en-us/articles/intel-parallel-studio-xe/
For advanced performance optimization and greater value: the Intel VTune Amplifier XE performance profiling tool is also available as part of the Intel® Parallel Studio XE, Intel® C++ Studio XE, Intel® Fortran Studio XE, and Intel® Cluster Studio XE product suites.