What's new? Update 9 - Intel® VTune™ Amplifier XE 2013

Intel® VTune™ Amplifier XE 2013

Intel® VTune™ Amplifier XE is an easy-to-use performance and thread profiler for C, C++, C#, Fortran, Java and MPI developers. No special recompiles are needed; just start profiling. Hotspots are highlighted on the source. A powerful timeline makes it easy to tune your application and scale performance on multicore processors.

New for Update 9!  

Contents

 

File: vtune_amplifier_xe_2013_update9.tar.gz

Installer for Intel® VTune™ Amplifier XE 2013 Update 9 for Linux*

File: VTune_Amplifier_XE_2013_update9_setup.exe

Installer for Intel® VTune™ Amplifier XE 2013 Update 9 for Windows*

* Other names and brands may be claimed as the property of others.

Microsoft, Windows, Visual Studio, Visual C++, and the Windows logo are trademarks, or registered trademarks of Microsoft Corporation in the United States and/or other countries.

Support for Hotspots, General Exploration and Bandwidth analysis types on the Intel® Xeon Phi™ coprocessor (except for user API analysis) from a Windows* host

With VTune Amplifier, you can now tune applications on the Intel® Xeon Phi™ coprocessor from a Windows* host. Choose one of the predefined analysis types (Hotspots, General Exploration, or Bandwidth) or create a custom one. Follow the Finding Hotspots on the Intel® Xeon Phi™ Coprocessor tutorial, and refer to the document “Optimization – Part 2: Hardware Events” for guidance on optimizing applications on the Intel Xeon Phi coprocessor using VTune™ Amplifier XE 2013 for Windows. For more information about the Windows* early enabling program for the Intel® Xeon Phi™ coprocessor, visit http://software.intel.com/en-us/mic-developer and http://software.intel.com/en-us/articles/windows-early-enabling-for-intelr-xeon-phitm-coprocessor.

NOTE: User API analysis is not yet supported by VTune Amplifier from a Windows* host and will be enabled in a future update.

Advanced Hotspots analysis (formerly Lightweight Hotspots) with several collection levels

The former “Hotspots” and “Lightweight Hotspots” analysis types have been renamed in the GUI to “Basic Hotspots” and “Advanced Hotspots” respectively. “Basic Hotspots” provides a general user-level performance profile. “Advanced Hotspots” performs Hardware Event-Based Sampling analysis using PMU counters and lets you choose among collection levels that differ in detail and overhead:

- “Hotspots” - no stacks, context switches, or call counts; low overhead

- “Hotspots, stacks and context switches” - medium overhead

- “Hotspots, call counts, stacks and context switches” - the highest level of detail, at the cost of more overhead

For more information on the interface changes, please refer to the Intel® IDZ KB article.

 

NOTE: The command line interface still supports the former analysis type names in deprecated mode to allow gradual migration to the new ones.

  

GPU analysis for Intel Processor Graphics based on hardware metrics such as Execution Units (EU) Array Active/EU Array Stalled/EU Array Idle, GPU Memory Bandwidth, GPU L3 Cache Misses, and others (Windows* only) 

For applications that use a Graphics Processing Unit (GPU) for rendering, video processing, and computation, VTune Amplifier can monitor, analyze, and correlate activities on both the CPU and GPU (Windows* only). To enable GPU analysis, configure your predefined or custom analysis to analyze Processor Graphics and DirectX* pipeline events. GPU analysis for Intel Processor Graphics is based on hardware metrics such as Execution Units (EU) Array Active/EU Array Stalled/EU Array Idle, GPU Memory Bandwidth, GPU L3 Cache Misses, and others; it helps estimate how effectively Intel Integrated Graphics is used. Analysis of DirectX* pipeline events correlates CPU and GPU usage and helps identify whether an application is CPU bound or GPU bound. For more information, please refer to the GPU Analysis and GPU Metrics topics in the product help.

GPU analysis based on DirectX* pipeline events to correlate CPU/GPU usage and identify whether an application is CPU or GPU bound (Windows* only)

Explore the Summary pane for GPU Usage and the DirectX* frame rate histogram:

 

Switch to the “Graphics” tab to see the distribution of GPU metrics over time.

Top-Down performance analysis methodology in General Exploration analysis type for the 4th generation Intel® Core™ processors based on the Intel microarchitecture code name Haswell

Update 9 introduces the Top-Down performance analysis methodology for the 4th generation Intel® Core™ processors based on the Intel microarchitecture code name Haswell, integrated into the General Exploration analysis type. The hierarchical data display corresponds to how the available execution slots in each core’s pipeline are utilized. Expand a column to see a breakdown of issues pertaining to its category of pipeline utilization: Retiring, Bad Speculation, Back-end Bound, or Front-end Bound slots. For more details, refer to the Haswell tuning guide.
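The four top-level categories can be illustrated with a small calculation. The sketch below assumes the commonly published level-1 Top-Down formulas; the counter names in the comments, the pipeline width of 4 slots per cycle, and the function name `top_down_level1` are assumptions that do not appear in this article, and VTune Amplifier may compute its metrics differently.

```python
def top_down_level1(clk_unhalted, uops_not_delivered, uops_issued,
                    uops_retired_slots, recovery_cycles, width=4):
    """Split pipeline slots into the four top-level Top-Down categories.

    Arguments are raw event counts (names are assumptions for illustration):
      clk_unhalted        e.g. CPU_CLK_UNHALTED.THREAD
      uops_not_delivered  e.g. IDQ_UOPS_NOT_DELIVERED.CORE
      uops_issued         e.g. UOPS_ISSUED.ANY
      uops_retired_slots  e.g. UOPS_RETIRED.RETIRE_SLOTS
      recovery_cycles     e.g. INT_MISC.RECOVERY_CYCLES
    width is the assumed number of issue slots per cycle.
    """
    slots = width * clk_unhalted
    if slots <= 0:
        raise ValueError("clk_unhalted must be positive")

    front_end = uops_not_delivered / slots
    bad_speculation = (uops_issued - uops_retired_slots
                       + width * recovery_cycles) / slots
    retiring = uops_retired_slots / slots
    # Whatever is not accounted for above is attributed to the back end.
    back_end = 1.0 - (front_end + bad_speculation + retiring)

    return {
        "Front-end Bound": front_end,
        "Bad Speculation": bad_speculation,
        "Retiring": retiring,
        "Back-end Bound": back_end,
    }


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    for name, share in top_down_level1(1000, 800, 2200, 2000, 50).items():
        print(f"{name}: {share:.1%}")
```

The four fractions sum to 1 by construction, which matches the idea that every execution slot falls into exactly one category.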

Overhead and Spin time classification for GCC* and Microsoft* OpenMP* runtimes

VTune Amplifier can now classify Overhead and Spin time for the GCC* and Microsoft* OpenMP* runtimes and show the metrics in the grid and Timeline pane. This helps identify inefficient use of the threading runtimes: either a significant portion of time is spent inside the parallel runtime, wasting CPU time at high concurrency levels (overhead), or a significant portion of CPU time is spent on spin (active) waits. For more information, please refer to the “Overhead and Spin Time” topic in the product help.

Overhead and Spin time for GCC* OpenMP*: 

Overhead and Spin time for Microsoft* OpenMP*:

 

Source and assembly data available in the command line reports

Source and assembly data is now available in all command line reports. Use the “-source-object” option to switch a report to source or assembly view mode, including the associated performance data. Specify “-group-by address” to see the disassembly view. For more information, please refer to the “Source-object” topic in the product help.

Example 1: $ amplxe-cl -report hotspots -source-object function=foo

Example 2: $ amplxe-cl -report hotspots -source-object function=foo -group-by basic-block,address
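When such reports are generated from a script, the command line can be assembled programmatically. The following is a minimal sketch that uses only the options shown in the examples above; the helper name `build_report_command` is hypothetical, and joining multiple grouping levels with a comma and no space is an assumption about how the shell should pass them as one argument.

```python
import shlex


def build_report_command(report, function, group_by=None):
    """Return the amplxe-cl argument list for a source/assembly report.

    report   -- report type, e.g. "hotspots"
    function -- function name passed to -source-object
    group_by -- optional list of grouping levels, e.g. ["address"]
    """
    cmd = ["amplxe-cl", "-report", report,
           "-source-object", f"function={function}"]
    if group_by:
        # Assumption: grouping levels form one comma-separated argument.
        cmd += ["-group-by", ",".join(group_by)]
    return cmd


def as_shell_string(cmd):
    """Quote each argument so the command can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in cmd)


if __name__ == "__main__":
    print(as_shell_string(build_report_command("hotspots", "foo")))
    print(as_shell_string(
        build_report_command("hotspots", "foo", ["basic-block", "address"])))
```

The list returned by `build_report_command` can be passed directly to `subprocess.run` on a machine where `amplxe-cl` is installed and a result directory is available.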

Total metric in the Source/Assembly panes

Analyze collected data in the Source/Assembly panes per code line using the Self and Total types of performance metrics. For example, for the Basic Hotspots analysis, the CPU Time: Self column shows the amount of processor time (in seconds) taken to execute a code line, while the CPU Time: Total column shows the processor time spent on the code line itself plus the calls made from this line, if any.
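The relationship between the two metric types can be sketched with a toy call tree. This is an illustration of the Self/Total definition given above, not VTune Amplifier's implementation; the `Node` class, the function names, and the timing values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A code line (or function) with its own time and the calls it makes."""
    name: str
    self_time: float                      # seconds spent in this code only
    callees: List["Node"] = field(default_factory=list)


def total_time(node: Node) -> float:
    """Total = Self time plus the Total time of everything called from here."""
    return node.self_time + sum(total_time(c) for c in node.callees)


if __name__ == "__main__":
    # Hypothetical profile: a line in main() calls foo(), which calls bar().
    bar = Node("bar", self_time=2.0)
    foo = Node("foo", self_time=1.0, callees=[bar])
    call_site = Node("main: line calling foo()", self_time=0.5, callees=[foo])

    for n in (call_site, foo, bar):
        print(f"{n.name}: self={n.self_time}s total={total_time(n)}s")
```

A line with a small Self time but a large Total time is cheap in itself and expensive only because of what it calls, which is the distinction the two columns are meant to expose.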

Refer to our Optimization Notice for more information regarding performance and optimization choices in Intel software products.