Intel® VTune™ Amplifier XE 2013
Intel® VTune™ Amplifier XE is an easy-to-use performance and thread profiler for C, C++, C#, Fortran, Java and MPI developers. No special recompiles are needed; just start profiling. Hotspots are highlighted on the source. A powerful timeline makes it easy to tune your application and scale performance on multicore processors.
New for Update 9!
- Support for Hotspots, General Exploration and Bandwidth analysis types on the Intel® Xeon Phi™ coprocessor (except for the user API analysis) (on Windows* only)
- Advanced Hotspots analysis (formerly, Lightweight Hotspots) introducing several collection levels
- GPU analysis for Intel Processor Graphics based on hardware metrics such as Execution Units (EU) Array Active/EU Array Stalled/EU Array Idle, GPU Memory Bandwidth, GPU L3 Cache Misses, and others (on Windows* only)
- GPU analysis based on DirectX* pipeline events and used to correlate CPU/GPU usage and identify whether an application is CPU or GPU bound (on Windows* only)
- Top-Down performance analysis methodology in General Exploration analysis type for the 4th generation Intel® Core™ processors based on the Intel microarchitecture code name Haswell
- Overhead and Spin time classification for GCC* and Microsoft OpenMP* runtimes
- Source and assembly data available in the command line reports
- Total metric for flat groupings in the Source/Assembly panes
- Bug fixes
Installer for Intel® VTune™ Amplifier XE 2013 Update 9 for Linux*
Installer for Intel® VTune™ Amplifier XE 2013 Update 9 for Windows*
* Other names and brands may be claimed as the property of others.
Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.
With VTune Amplifier you can now tune on the Intel® Xeon Phi™ coprocessor from a Windows* host. Choose one of the predefined analysis types (Hotspots, General Exploration, or Bandwidth) or create a custom one. Follow the Finding Hotspots on the Intel® Xeon Phi™ Coprocessor tutorial, and refer to the document “Optimization – Part 2: Hardware Events” for guidance on optimizing applications on the Intel Xeon Phi coprocessor using VTune™ Amplifier XE 2013 for Windows. For more information about the Windows* early enabling program for the Intel® Xeon Phi™ coprocessor, visit http://software.intel.com/en-us/mic-developer and http://software.intel.com/en-us/articles/windows-early-enabling-for-intelr-xeon-phitm-coprocessor.
NOTE: User API analysis is not yet supported by VTune Amplifier from a Windows* host and will be enabled in a future update.
The former “Hotspots” and “Lightweight Hotspots” analysis types have been renamed in the GUI to “Basic Hotspots” and “Advanced Hotspots”, respectively. “Basic Hotspots” provides a general user-level performance profile. “Advanced Hotspots” performs hardware event-based sampling using PMU counters and introduces several collection levels that trade detail for overhead:
- “Hotspots”: no stacks, context switches, or call counts; low overhead
- “Hotspots, stacks and context switches”: medium overhead
- “Hotspots, call counts, stacks and context switches”: the highest level of detail, at the cost of more overhead
For more information on the interface changes please refer to the Intel® IDZ KB article
NOTE: The command line interface still supports the former analysis format in deprecated mode to allow gradual migration to the new one.
GPU analysis for Intel Processor Graphics based on hardware metrics such as Execution Units (EU) Array Active/EU Array Stalled/EU Array Idle, GPU Memory Bandwidth, GPU L3 Cache Misses, and others (Windows* only)
For applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computations, VTune Amplifier can monitor, analyze, and correlate activities on both the CPU and GPU (Windows* only). To enable the GPU analysis, configure your predefined or custom analysis to analyze Processor Graphics and DirectX* pipeline events. GPU analysis for Intel Processor Graphics is based on hardware metrics such as Execution Units (EU) Array Active/EU Array Stalled/EU Array Idle, GPU Memory Bandwidth, GPU L3 Cache Misses, and others. It helps you estimate how effectively the Intel Integrated Graphics is used. Analysis of DirectX* pipeline events is used to correlate CPU/GPU usage and helps identify whether an application is CPU or GPU bound. For more information, refer to the “GPU Analysis” and “GPU Metrics” topics in the product help.
Explore the Summary pane for the GPU Usage and DirectX* frame rate histogram:
Switch to the “Graphics” tab to see the distribution of the GPU metrics over time.
Update 9 introduces the Top-Down performance analysis methodology for 4th generation Intel® Core™ processors based on the Intel microarchitecture code name Haswell, integrated into the General Exploration analysis type. The hierarchical data display corresponds to how the available execution slots in each core’s pipeline are utilized. Expand a column to see a breakdown of issues pertaining to its category of pipeline utilization: Retiring, Bad Speculation, Back-end Bound, or Front-end Bound slots. For more details refer to the Haswell tuning guide at
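To make the four top-level categories concrete, here is a minimal Python sketch of how the share of pipeline slots in each category can be derived from raw counter values. It is not VTune Amplifier's implementation: the event names and formulas follow the publicly described Top-Down method for this microarchitecture and are assumptions here, not something stated on this page. The `top_down_level1` helper and the sample counts are hypothetical.

```python
# Illustrative top-level Top-Down breakdown.
# Assumptions: event names and formulas come from the publicly described
# Top-Down method, not from this page; the counter values are made up.

PIPELINE_WIDTH = 4  # assumed issue slots per core per cycle


def top_down_level1(counts):
    """Return the fraction of pipeline slots in each top-level category."""
    slots = PIPELINE_WIDTH * counts["CPU_CLK_UNHALTED.THREAD"]
    front_end = counts["IDQ_UOPS_NOT_DELIVERED.CORE"] / slots
    retiring = counts["UOPS_RETIRED.RETIRE_SLOTS"] / slots
    bad_speculation = (
        counts["UOPS_ISSUED.ANY"]
        - counts["UOPS_RETIRED.RETIRE_SLOTS"]
        + PIPELINE_WIDTH * counts["INT_MISC.RECOVERY_CYCLES"]
    ) / slots
    # Slots not accounted for above are attributed to the back end.
    back_end = 1.0 - (front_end + retiring + bad_speculation)
    return {
        "Retiring": retiring,
        "Bad Speculation": bad_speculation,
        "Front-end Bound": front_end,
        "Back-end Bound": back_end,
    }


# Hypothetical counter values for one core.
sample = {
    "CPU_CLK_UNHALTED.THREAD": 1_000_000,
    "IDQ_UOPS_NOT_DELIVERED.CORE": 800_000,
    "UOPS_ISSUED.ANY": 2_200_000,
    "UOPS_RETIRED.RETIRE_SLOTS": 2_000_000,
    "INT_MISC.RECOVERY_CYCLES": 50_000,
}

breakdown = top_down_level1(sample)
for category, share in breakdown.items():
    print(f"{category:16s} {share:6.1%}")
```

The four shares sum to one by construction, which is why a column in the hierarchical view can be read as a fraction of all available slots.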
VTune Amplifier can now classify Overhead and Spin time for GCC* and Microsoft* OpenMP* runtimes and show the metrics in the
Overhead and Spin time for GCC* OpenMP*:
Overhead and Spin time for Microsoft* OpenMP*:
Source and assembly data are now available in all command line reports. Use the “-source-object” option to switch a report to source or assembly view mode, including the associated performance data. Specify “-group-by address” to see the disassembly view. For more information, refer to the “Source-object” topic in the product help.
Example 1: $ amplxe-cl -report hotspots -source-object function=foo
Example 2: $ amplxe-cl -report hotspots -source-object function=foo -group-by basic-block,address
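When these reports are generated for many functions from a script, it can help to assemble the argument list programmatically. The sketch below only builds and prints the two commands shown above, using just the options that appear in those examples. The `report_command` helper is hypothetical and not part of the product.

```python
import shlex


def report_command(function, group_by=None, report="hotspots"):
    """Build an amplxe-cl argument list for a source/assembly report.

    Hypothetical helper: it uses only the options from the examples above.
    """
    argv = ["amplxe-cl", "-report", report,
            "-source-object", f"function={function}"]
    if group_by:
        # Grouping levels are passed as one comma-separated value.
        argv += ["-group-by", ",".join(group_by)]
    return argv


source_view = report_command("foo")
assembly_view = report_command("foo", group_by=["basic-block", "address"])

print(shlex.join(source_view))
print(shlex.join(assembly_view))
```

The resulting lists can be passed to `subprocess.run` on a machine where `amplxe-cl` is installed and a result has been collected.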
Analyze the collected data in the Source/Assembly pane per code line using the Self and Total types of performance metrics. For example, for the Basic Hotspots analysis, the CPU Time: Self column shows the amount of processor time (in seconds) taken to execute a code line, while the CPU Time: Total column shows the processor time spent on executing the code line plus any calls made from that line.
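The relation between the two metrics can be sketched with a toy call tree: Total is Self plus the Total of everything called from that point. The example below is illustrative only. The `Node` type, the function names, and the timings are hypothetical, and each node stands in for a code line or function.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    self_time: float                      # seconds spent in this code itself
    callees: list = field(default_factory=list)


def total_time(node):
    """Total = Self time plus the Total time of everything called from here."""
    return node.self_time + sum(total_time(c) for c in node.callees)


# Hypothetical call tree: main() calls foo() and bar(); foo() calls baz().
baz = Node("baz", 0.5)
foo = Node("foo", 1.0, [baz])
bar = Node("bar", 0.25)
main = Node("main", 0.25, [foo, bar])

for n in (main, foo, bar, baz):
    print(f"{n.name:5s} self={n.self_time:.2f}s total={total_time(n):.2f}s")
```

A line with a small Self time but a large Total time is mostly a call site, so the cost lies in what it calls rather than in the line itself.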