Intel® VTune™ Amplifier XE 2013

Performance profiler for serial and parallel performance analysis.

  • Collect a rich set of data to tune CPU & GPU compute performance, multi-core scalability, bandwidth and more
  • Sort, filter and visualize results for quick insight into performance bottlenecks
  • Automate regression tests and collect data remotely using the powerful command line

From $899
Buy Now

Or Download a Free 30-Day Evaluation Version

Service Pack 1 Released - What’s New

Optimize Serial and Parallel Performance

Intel® VTune™ Amplifier XE 2013 is the premier performance profiler for C, C++, C#, Fortran, Assembly and Java*.


Functions using the most CPU time float to the top of the list. Click on a function to see the call stack. Double click to see the source with detailed profile data.

Easy
Performance optimization can be difficult, but the performance profiling tool you use shouldn’t be.

Versatile – Rich Set of Performance Profiles
Whether you are tuning for the first time or doing advanced performance optimization, VTune Amplifier XE 2013 provides the data needed to meet a wide variety of tuning needs.  Collect a rich set of performance data for hotspots, threading, locks & waits, DirectX*, bandwidth and more.

Productive – Sort, Filter and Visualize
Good data is not enough.  You need tools to mine the data and make it easy to understand.  Powerful analysis lets you sort, filter and visualize results on the timeline and on your source.

New for 2013! 
Caller/callee, OpenCL, OpenMP 4.0, hardware stack sampling, better bandwidth analysis, Java profiling, tune Intel® Xeon Phi™ products, user tasks, DirectX* frames, call counts and more.


Quotes

"We achieved a significant improvement (almost 2x) even on one core by optimizing the code based on the information provided by Intel® VTune™ Amplifier XE. Good scalability is a result of usage of combination of Intel® TBB and OpenMP parallelization techniques. We achieved over 8x the performance of the previous version on 8 cores and almost 11x the performance on 16 cores."
Alexey Andrianov, R&D Director Deputy, Mechanical Analysis Division, Mentor Graphics Corporation

"Intel® VTune™ Amplifier XE analyzes complex code and helps us identify bottlenecks rapidly. By using it and other Intel® Software Development Tools, we were able to improve PIPESIM performance up to 10 times compared with the previous software version."
Rodney Lessard, Senior Scientist, Schlumberger

“The new VTune™ Amplifier XE brings even more capability to an already indispensable tool. The sampling based call stack hotspots is excellent and alone is worthy of the upgrade. We have also been impressed by how the concurrency and Locks and Waits analysis can even provide useful data on complex applications such as Premiere Pro.”
Rich Gerber - Engineering Manager, MediaCore, Adobe Systems Inc.

“The new interface is a joy to use. Intel® VTune™ Amplifier XE gives us precise, down-to-the-metal performance data that’s invaluable for pinpointing hotspots and evaluating the effect of optimizations.”
Daniel Schwarz, Performance Engineer, Nik Software

“Intel® VTune™ Amplifier XE’s timeline is very information intensive.  It organizes the data I need to tune threaded applications.”
Sergey Zaritchny, Software Development Manager, Open Cascade SAS

“Last week, Intel® VTune™ Amplifier XE helped us find almost 3X performance improvement.  This week it helped us improve the performance another 3X.”
Claire Cates, Principal Developer, SAS Institute Inc.

“One of Intel® VTune™ Amplifier XE’s best features is that it is easy to use.  I did not need to read the documentation.”
Richard Shepherd, Software Engineer, ESRI (UK) Limited

Quickly Locate Code Taking A Lot of CPU Time (or GPU time)

Hotspots analysis gives you a sorted list of the functions using a lot of CPU time. This is where tuning will give you the biggest benefit. Click [+] for the call stacks. Double click to see the source.

New! On newer processors, optionally collect GPU data for tuning OpenCL applications. Correlate GPU and CPU activities. (Windows* only.)

See the Results on Your Source

A double click from the function list takes you to the hottest spot in the function.

Tune Threading with Locks and Waits Analysis

Quickly find a common cause of slow performance in parallel programs: waiting too long on a lock while the cores are underutilized during the wait. Profiles like "basic hotspots" and "locks & waits" use a software collector that works on both Intel and compatible processors. New! OpenMP 4.0 support.
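
To make the idea of "waiting too long on a lock" concrete, here is a minimal Python sketch that measures how long one thread waits for a lock held by another. It is only an illustration of the quantity a locks and waits profile reports; it is not how VTune Amplifier XE collects its data, and the names `worker` and `wait_seconds` are made up for this example.

```python
import threading
import time

lock = threading.Lock()
wait_seconds = []  # wait time observed by the worker thread

def worker():
    requested = time.perf_counter()
    with lock:  # blocks while another thread holds the lock
        acquired = time.perf_counter()
    wait_seconds.append(acquired - requested)

lock.acquire()                      # the main thread holds the lock...
thread = threading.Thread(target=worker)
thread.start()
time.sleep(0.05)                    # ...for roughly 50 ms of "work"
lock.release()
thread.join()

print(f"worker waited {wait_seconds[0] * 1000:.1f} ms for the lock")
```

During that wait the worker does no useful work, which is the pattern the analysis is meant to surface.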

Mine the Data with Timeline Filtering

Select a time range in the timeline to filter out data (e.g., application startup) that masks the information you need. When you select and filter in the timeline, the grid that lists functions using a lot of CPU time updates to show the list filtered for the selected time.
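
The effect of timeline filtering can be sketched in a few lines of Python. The sample records and function names below are hypothetical, and the code only mimics the idea of re-aggregating CPU time for a selected time range; it does not read VTune Amplifier XE results.

```python
from collections import defaultdict

# Hypothetical samples: (timestamp in seconds, function name, CPU seconds).
SAMPLES = [
    (0.2, "load_config", 0.010),
    (0.4, "load_config", 0.010),
    (1.5, "solve", 0.010),
    (1.6, "solve", 0.010),
    (1.7, "render", 0.010),
    (2.1, "solve", 0.010),
]

def hotspots(samples, start=None, end=None):
    """Sum CPU time per function for samples in [start, end], hottest first."""
    totals = defaultdict(float)
    for timestamp, function, cpu_seconds in samples:
        if start is not None and timestamp < start:
            continue
        if end is not None and timestamp > end:
            continue
        totals[function] += cpu_seconds
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)

whole_run = hotspots(SAMPLES)
steady_state = hotspots(SAMPLES, start=1.0)  # exclude application startup

print(whole_run)
print(steady_state)
```

With the startup interval excluded, `load_config` drops out of the list and only the steady-state functions remain.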

Visualize Thread Behavior

See when threads are running and waiting, and when transitions occur. Balance workloads. Find lock contention.

New! Profile Remote Systems

Configure your host system to collect data from a remote Linux target.

Low Overhead / High Resolution Hardware Profiling

In addition to "basic hotspots" analysis that works on both Intel and compatible processors, VTune Amplifier XE 2013 has "advanced hotspots" analysis that uses the Performance Monitoring Unit (PMU) on Intel processors to collect data with very low overhead. Increased resolution (~1 ms vs. ~10 ms) can find hot spots in small functions that run quickly. New! Now with optional stack collection to identify the calling sequence.
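
A rough back-of-the-envelope calculation shows why the finer resolution matters. The sketch below assumes a simple statistical sampling model (expected samples = CPU time / sampling interval) and a hypothetical function that uses 3 ms of CPU time in total; the intervals are the approximate figures quoted above.

```python
def expected_samples(cpu_time_ms, sampling_interval_ms):
    """Expected number of samples that land in a function (simple model)."""
    return cpu_time_ms / sampling_interval_ms

small_function_ms = 3.0  # hypothetical total CPU time of a short function

print(expected_samples(small_function_ms, 10.0))  # ~10 ms interval
print(expected_samples(small_function_ms, 1.0))   # ~1 ms interval
```

Under this model the function is expected to receive less than one sample at a ~10 ms interval, so it can be missed entirely, while a ~1 ms interval is expected to catch it several times.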

Advanced Analysis Like Bandwidth

Preset profiles provide an easy "point and shoot" set-up. Choose Basic Hotspots, Advanced Hotspots (formerly "Lightweight Hotspots"), Concurrency, Locks and Waits or more advanced analyses. No memorizing complex event names. Advanced profiles like memory bandwidth analysis, memory access and branch mispredictions find tuning opportunities. New! Advanced profiles can optionally collect stacks to identify the calling sequence. (Profiles vary by microarchitecture.)

Opportunities Highlighted

The cell is highlighted in pink when there is a potential tuning opportunity. Hover to get suggestions.

New! OpenMP Scalability Analysis

Visualize time regions from the fork point to the join point for each parallel region. See what is serial, what is balanced and what is imbalanced. Here we see 13.671 seconds in an imbalanced region, 3.652 seconds in a fairly well balanced region.
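
One simple way to quantify imbalance in a fork-join region is to compare each thread's busy time with the slowest thread. The Python sketch below uses that definition, which is an assumption made for illustration and not necessarily the metric the product computes; the per-thread times are hypothetical.

```python
def region_imbalance(thread_busy_seconds):
    """Return (elapsed, wasted, wasted_fraction) for one fork-join region.

    elapsed: the region lasts as long as its slowest thread.
    wasted:  total time the other threads spend idle at the join point.
    """
    elapsed = max(thread_busy_seconds)
    wasted = sum(elapsed - busy for busy in thread_busy_seconds)
    capacity = elapsed * len(thread_busy_seconds)
    return elapsed, wasted, wasted / capacity

balanced = region_imbalance([3.6, 3.6, 3.7, 3.6])
imbalanced = region_imbalance([13.6, 4.0, 4.1, 3.9])

print(f"balanced region wastes {balanced[2]:.0%} of its thread time")
print(f"imbalanced region wastes {imbalanced[2]:.0%} of its thread time")
```

In the imbalanced case one thread does most of the work while the others wait at the join point, so more than half of the available thread time is idle.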

New! Tune OpenCL

On newer processors, optionally collect GPU data for tuning OpenCL applications. Correlate GPU and CPU activities. (Windows* only.)

No special builds

Use a production build with symbols from your normal compiler.

Low overhead

Accurate results you can count on.

Command line

Automate regression analysis. Simple remote collection.
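
As a sketch of how a regression script might drive the command line, the Python below only builds an `amplxe-cl` argument list and applies a simple threshold check; it does not run the tool. The `-collect <analysis> -- <application>` form follows the usage shown in the forum post further down this page. The `hotspots` analysis name, the `-result-dir` option, the application path and the 10% tolerance are assumptions here and should be checked against the command-line help for your installed version.

```python
import shlex

def collect_command(analysis, result_dir, app_argv, amplxe_cl="amplxe-cl"):
    """Build the argv for one collection run; nothing is executed here."""
    # "-result-dir" is assumed; verify it against the CLI help.
    return [amplxe_cl, "-collect", analysis, "-result-dir", result_dir, "--", *app_argv]

def regressed(baseline_seconds, current_seconds, tolerance=0.10):
    """True if the current run is slower than the baseline by more than tolerance."""
    return current_seconds > baseline_seconds * (1.0 + tolerance)

# Hypothetical application and result directory names.
cmd = collect_command("hotspots", "r001hs", ["./my_app", "--input", "data.bin"])
print(shlex.join(cmd))
```

A nightly job could pass `cmd` to `subprocess.run` and feed the measured times into `regressed` to flag slowdowns.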

System Wide Analysis

Tune drivers, kernel modules and multi-process apps.

New! Tune Inlining with Call Counts

When a function is called frequently it may make sense to "inline" the code and eliminate the overhead of the function call. VTune Amplifier XE 2013 now provides statistical call count data to help you make better inlining decisions. It also displays profile results on the source code, even if the code is inlined, making it easier to interpret profile results.
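
How call counts can inform an inlining decision is sketched below in Python. The heuristic (many calls, very little time per call) and its thresholds are illustrative assumptions, not a rule taken from the product, and the profile data is hypothetical.

```python
def inlining_candidates(profile, min_calls=100_000, max_seconds_per_call=1e-6):
    """profile maps function name -> (call_count, total_cpu_seconds).

    Returns (name, call_count) pairs for frequently called functions whose
    average time per call is tiny, ordered by call count.
    """
    candidates = []
    for name, (calls, cpu_seconds) in profile.items():
        if calls >= min_calls and cpu_seconds / calls <= max_seconds_per_call:
            candidates.append((name, calls))
    return sorted(candidates, key=lambda item: item[1], reverse=True)

profile = {
    "dot3": (5_000_000, 0.9),     # about 0.18 microseconds per call
    "clamp": (800_000, 0.2),      # about 0.25 microseconds per call
    "solve_frame": (600, 4.2),    # few calls, long body
}

print(inlining_candidates(profile))
```

Functions such as `solve_frame`, which are called rarely and run for a long time per call, gain little from inlining because call overhead is a negligible share of their cost.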

New! Auto Detect Microsoft DirectX* Frames

Got a slow spot in your game play? You don't just want to know where you are spending a lot of time; you want to know where you are spending a lot of time while the frame rate is slow. VTune Amplifier XE 2013 can now automatically detect Microsoft DirectX* frames and filter results to show you what is happening in slow frames. Not using DirectX*? Just define the critical region using the API and frame analysis becomes a powerful tool for analyzing latency.
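
The notion of a "slow frame" can be illustrated with a short Python sketch. The frame times and the 30 frames-per-second target are hypothetical, and the code only shows the filtering idea; it does not use the product's frame API.

```python
def slow_frames(frame_times_ms, target_fps=30.0):
    """Return the indices of frames that exceed the per-frame time budget."""
    budget_ms = 1000.0 / target_fps
    return [index for index, frame_ms in enumerate(frame_times_ms) if frame_ms > budget_ms]

frame_times_ms = [16.0, 17.1, 48.3, 16.4, 35.0]  # hypothetical frame durations

print(slow_frames(frame_times_ms))
```

Restricting the profile to the time ranges of the frames returned here is the kind of filtering that separates "expensive everywhere" from "expensive when the frame rate drops".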

New! Better Memory Bandwidth Analysis

VTune Amplifier XE 2013 performs a more accurate memory bandwidth analysis for both reads & writes to cache and memory. It also adds bandwidth analysis for additional processor types.

Intel® Threading Building Blocks, OpenMP 4.0, Intel® Cilk™ Plus

Built-in understanding of parallel programming models means profiling data is described using familiar terms from the source, not with cryptic internal runtime labels.

New! Low Overhead Java* Profiling

Analyze Java or mixed Java and native code.  Results are mapped to the original Java source.  Unlike some Java profilers that instrument the code, VTune Amplifier XE 2013 uses low overhead statistical sampling with either a hardware or software collector.  Hardware collection has extremely low overhead because it uses the on-chip performance monitoring hardware.

New! Analyze User Tasks

The task annotation API is used to annotate your source so VTune Amplifier XE 2013 can display which tasks are executing. For example if you label the stages of your pipeline, they will be marked in the timeline and hovering will reveal details. This makes profiling data much easier to understand.

New! Tune for Intel® Xeon Phi™ Products

Hardware profiling is supported for Intel® Xeon Phi™ products and can be launched from the graphical user interface. It can collect advanced hotspots and advanced event data and has time markers for correlation of data across multiple cards. Software collection (e.g., locks and waits analysis) is not supported on Intel® Xeon Phi™ products.

New! "Hot keys" Start and Stop Analysis

Add a shortcut to quickly launch performance analysis whenever you see your app running slowly.  Program hot keys to start and stop the collection of performance data.

New! Tune MPI Applications

Analyze hybrid applications using MPI and OpenMP. Install on a cluster.

New! Support for New Processors

VTune Amplifier XE 2013 is constantly adding support for the latest processors. Updates are released shortly after new processors begin shipping.

Technical Specifications

For additional information and details on new features, please see the "What's new?" articles and release notes.

 


 

What’s New in 2013 SP1?

We continuously release new features in regular updates available to all customers with a current service agreement (one year included with purchase). Just download and install to get the latest features. Here is a partial list of new features released since our first release of Intel VTune Amplifier XE 2013. (For more details, see our What’s New? summary for each update.)

More Profiling Data

  • Intel® Xeon Phi™ – memory and vectorization profiling
  • GPU for compute – Tune OpenCL. Correlate GPU and CPU activities. (Newer processors, Windows* only.)

Better Data Mining – Find Answers Faster

  • Search added to all grids
  • Timeline sorting, band height, time scale configuration
  • Loop hierarchy, overhead and spin time metrics
  • OpenMP* 4.0 scalability analysis

Easier to Use

  • Attach to a running Java process
  • Contextual help for hardware events and performance metrics
  • Easier generation of command line options from the user interface

New OS & Processor Support

  • Intel® Xeon Phi™ coprocessor, Haswell – Windows* & Linux*
  • Windows 8 desktop and Visual Studio* 2012 & 2013
  • Latest Linux distributions

What’s New in 2013?

Here are highlights of improvements made to the 2011 product:

More Profiling Data

  • Statistical Call Counts – Make better inlining decisions
  • Hardware Events with Stacks – Lower overhead, Higher resolution
  • Uncore Event Counting – More accurate bandwidth analysis
  • Intel® Xeon Phi™ Coprocessor – Hardware event profiling

Better Data Mining – Find Answers Faster

  • Low Overhead Java* Profiling – Results map to the Java source
  • Source View for Inlined Code – (For Intel® and GCC compilers)
  • Task Annotation API – Label and visualize tasks.

Easier to Use

  • User Defined Metrics – Create meaningful metrics from events
  • Programmable Hot Keys – Quick spontaneous profiles
  • More/Better Advanced Profiles – (e.g., Bandwidth)

Videos to help you get started.

Register for future Webinars


Previously recorded Webinars:

  • Analyzing OpenCL applications with Intel® VTune™ Amplifier XE
  • Secrets of Performance Profiling – An Introduction to Intel® VTune™ Amplifier XE
  • Advanced Profiling with Intel® VTune Amplifier XE
      Part 1: Find the bottleneck
      Part 2: Tune for Haswell (Sandy Bridge and Ivy Bridge)

  • Accelerating financial services applications using Intel® Parallel Studio XE with the Intel® Xeon Phi™ coprocessor
  • Find 3 performance scaling barriers using Intel® VTune™ Amplifier XE
  • Performance analysis on Intel® Xeon® Phi™ Coprocessor

  • How Intel® Parallel Studio XE is used to improve the HMMER application


More Tech Articles

Using Intel® VTune™ Performance Analyzer and Intel® Integrated Performance Primitives for Real-time Media Optimization
By admin, posted 02/09/2012 (4)
This article provides a brief overview of digital media concepts and software tools (such as the Intel® VTune™ Amplifier XE) used for encoding video and audio in compressed formats for playback and transport.

Designing Artificial Intelligence for Games (Part 4)
By admin, posted 02/09/2012 (1)
The gaming industry has seen great strides in game complexity recently. Game developers are challenged to create increasingly compelling games. This series explores important Artificial Intelligence (AI) concepts and how to optimize them for multi-core.

The Serial On-Ramp to the Multicore Highway: Preparing to Parallelize Code
By binstock, posted 02/09/2012 (7)
This article discusses how coding and optimization on-the-fly are opposed and how performance experts approach performance improvement. It explains how they systematically prepare their code for optimization and how the optimization process is done.

Achieving Performance: An Approach to Optimizing a Game Engine
By William Damon (Intel), posted 02/09/2012 (0)
By Will Damon. Introduction: Wanting to try out something new for GDC 2002, a few game developers got together and invented the Dogma 2001 Challenge*. Chris Hecker and Sean Barrett, two of the original organizers, were brainstorming ideas for games and started to think about how many sprites could ...


You can reply to any of the forum topics below by clicking on the title. Please do not include private information such as your email address or product serial number in your posts. If you need to share private information with an Intel employee, they can start a private thread for you.



Hardware event-based sampling on i7u4500 Haswell
By Nick G. (2)
Just installed Intel VTune Amplifier XE 2013 on a new notebook with second generation core i7u4500 CPU under Windows 8 Pro 64bit, but none of the hardware event-based sampling (EBS) is available. Selecting any of the analysers displays "Unsupported Architecture Type".

VTune Amplifier XE breaks "Find all references" in VS2008
By Min Xu (3)
Once running VTune Amplifier XE on a C# project, a folder is automatically added to the solution named "Amplifier XE Results". As long as this folder is in the solution (even if projects/results are removed), "Find all references" in VS2008 will return 0 results. Once I remove "Amplifier XE Results" from the solution, "Find all references" will work properly again.

Beginner question - how to analyze a DLL, and a few seconds execution
By ingvarai (3)
Hi, I have downloaded and installed v-tune for the very first time. I have followed the tutors to some extent, but have no clue on how to do this:
  1) Analyze a DLL I currently program in Visual Studio 2010
  2) Analyze just a few seconds of execution of it
I will explain 2): When the main program (Maxon Cinema 4D) runs, it loads my DLL and uses this DLL as a plugin. When the animation starts (Cinema 4D does 3D animations), my DLL is CPU intensive the first frame of the animation. I am not interested in this part at all. What interests me is the part when the animation is up and running, and before it stops again. This part is interesting, and only this, because it is here where performance is crucial. So, can someone help me in the right direction on question 1) and 2)? Thanks in advance! -Ingvar

VTune hangs when launched as a specific user
By Mateusz W. (5)
Hi, we're experiencing an issue where VTune hangs if run as a specific user. Please note: this user is capable of running VTune, i.e. by attaching it to a running process, but NOT if it is run as follows:

    /usr/local/intel/vtune_amplifier_xe_2013/bin64/amplxe-cl -collect snb-general-exploration -- ls &

This results in a hang:

    [1] 12246
    % ps aux | grep <username>
    username  12246  0.8  0.0 230216 33656 pts/1    SNl  19:11   0:00 /usr/local/intel/vtune_amplifier_xe_2013/bin64/amplxe-cl -collect snb-general-exploration -- ls
    username  12248  0.0  0.0 230216 12760 pts/1    SN   19:11   0:00 /usr/local/intel/vtune_amplifier_xe_2013/bin64/amplxe-cl -collect snb-general-exploration -- ls
    % gdb -p 12248
    (gdb) bt
    #0  pthread_cond_wait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:162
    #1  0x00007ffff51e9267 in void boost::condition_variable_any::wait<boost::unique_lock<boost::mutex> >(boost::unique_lock<boost::mutex>&)...

VTune for 64-bit application
By Uday Krishna G. (2)
Hi, can we use the same VTune Analyzer for 32-bit and 64-bit applications? Or is there a different version of VTune for 64-bit applications?

Level 1/Level 2 cache misses and branch predictions for Ivy Bridge?
By T C (1)
Hi, I am limited to doing a general exploration for Ivy Bridge using VTune. Once I have performed this, how can I see:
  - Level 1 cache misses
  - Level 2 cache misses
  - Branch prediction metrics?
  - TLB (is this the "DTLB overhead" value in the "summary" tab?)

Zero CPU time when collecting stacks and context switches
By Øyvind J. (4)
I have problems with the Advanced Hotspots analysis using the "Hotspots, stacks and context switches" mode: The application runs fine, and completes in about 6 seconds, but VTune reports the CPU time as zero, both in the summary and for all functions in the top-down and bottom-up views.

"This analysis type is only defined for processors...... Ivy Bridge"
By T C (11)
I have an i7 4820K (overclocked), Win 7 64 Professional, Visual Studio 2012 version 11.0.61030 and the Intel Parallel Studio XE 2013 package. If I open up Visual Studio with Admin Privileges, open a project and then go to begin analysing, I cannot choose any of the Ivy Bridge specific analyses. I am being told that it is only for CPUs which are Ivy Bridge (even though my CPU is Ivy Bridge). Could someone please help?

  • What is the difference between "Basic Hotspots" and "Advanced Hotspots"?
  • Basic Hotspots (formerly "Hotspots") | Advanced Hotspots (formerly "Lightweight Hotspots")
    Uses the software collector | Uses the hardware collector and the on-chip Performance Monitoring Unit (PMU)
    No driver required | Requires a driver
    Runs on Intel® and compatible processors | Requires a genuine Intel® processor for collection
    ~10 ms resolution | ~1 ms resolution (finds smaller functions)
    Collects call stacks to show calling sequences | New! Optional call stack collection
    Works in virtual environments | Works in a virtual environment only when supported by the VM vendor (e.g., vSphere* 5.1)

  • Can I install and use Intel VTune Amplifier XE on a system with a compatible processor not manufactured by Intel® Corporation?
  • Yes. Intel VTune Amplifier XE will operate on both Intel® processors and compatible processors when analyzing applications containing Intel® instructions. Profiling features that use the software collectors (e.g., "Basic Hotspots" and "Locks & Waits") work on both Intel processors and compatible processors. Profiling features that use the hardware collectors and the on-chip performance monitoring unit (e.g., "Advanced Hotspots" and "Advanced Analysis") require a genuine Intel processor for data collection, but after collection the results can be analyzed on a compatible processor.

  • Do I need to recompile?
  • No, you do not need to recompile in order to profile with Intel® VTune Amplifier XE. However, it is recommended that you have debug and symbol information available for your optimized application in order to get the most complete and useful results. Thus, your release build process may need to be modified to add symbol information to the optimized build.

  • Do I need to use the Intel compiler to use Intel® VTune Amplifier XE?
  • No, you do not need the Intel compiler to analyze applications. However, if you are using OpenMP, it is recommended that you use the Intel runtime if possible to get the best results.

  • Can I run a performance analysis on a remote system?
  • Yes.

  • Do I need multiple licenses to do remote data collections?
  • No. Once you have the product, the CLI installer (command line installer) permits the installation of collectors on other systems running the same OS. You can collect the data on the remote system, but you will need a license to view the data. Copy the results directory to a system with the full product installed for viewing. For more details see “Remote Tuning Workflow” in the documentation. For installation details see “Installing Collectors on Remote Systems” in the release notes.

  • Why can’t I see my source code?
  • There are several possible reasons why VTune Amplifier XE may be unable to see your source.

    In order for source code to be visible you need to compile your code so that debug information is available. For example, on Linux*, verify you are compiling with the “-g” flag.

    You also need to let VTune Amplifier XE know where your source files, binary files and symbol files are located. To do this, open or create a Project and click on the “Project Properties” button. In the Project properties dialog, click on the “Search Directories” tab. In the pull down menu, click on “All files” and then specify the directory where your files exist. If you have any subdirectories remember to check the “Search subdirectories” box.

  • Do I need to be root to run the hardware collector used with "Advanced Hotspots" and "Advanced Analysis"?
  • No. On Linux*, you need to be root to install the driver for the hardware collector, but once it is installed root access is not required. On Linux*, depending upon the install options selected, you may need to be a member of the driver access group (“vtune” by default) to use the hardware collector. The hardware collector is used for advanced hotspots analysis and advanced analysis. For more information see “Installing the Sampling Driver” in the documentation.

  • What file and directory permissions are required to use VTune™ Amplifier XE?
  • Because the hardware-based ("advanced") sampling analysis types require communication with the Performance Monitoring Unit (PMU) of the central processor, the installer attempts to install a device driver. For Windows*, the driver is signed and the person installing must be part of the Administrators group. On Linux*, the person installing the software must be root or have sudo access to install the driver. However, a Linux user can install the software locally without the device driver and still use the user-mode sampling analysis types: Basic Hotspots, Concurrency, and Locks-and-Waits. If the user is able to install the software as ‘root’, any user that desires to collect hardware-based samples may (depending upon the options selected during install) need to be part of the user group defined during the install. By default, this is the ‘vtune’ group, but it can be changed or omitted by accessing the Advanced options of the installer (install.sh).

  • Why can’t I import results?
  • In order to import results into VTune Amplifier XE, you must first create a project to contain the imported results. In the VTune Amplifier XE, click on the File->New->Project menu. This will bring up a dialog asking you to select a project name. Enter a name and press “OK”. VTune Amplifier XE will display the “Project Properties” dialog. If you are only importing results into the project then you will not need to specify an application name. However, if you want to view source of the imported results, you need to specify the search directories where your source and binaries are located. In the Project properties dialog, click on the “Search Directories” tab. In the pull down menu, click on “All files” and then specify the directory where your files exist. If you have any subdirectories remember to check the “Search subdirectories” box.

  • I added a path to the Search Directories, but nothing changed?
  • The Search Directories are used during finalization, which normally occurs after data collection completes. In order for new “Search directory” paths to take effect, VTune Amplifier XE must re-resolve your results with the new information provided. Click on the “Analysis Type” tab and then press the “Re-resolve” button on the far right, located directly below the “Start” and “Project Properties” buttons.

  • Why do the sample counts look wrong?
  • Sometimes, the sample counts may be displayed on source lines that are not normally associated with executable code, for example, the closing brace of a ‘for’ or ‘while’ loop. This may appear to be an error but is a result of the instructions generated by the compiler. Viewing the assembly code can reveal that the debug information tags the assembly instructions to which the samples are attributed as belonging to that source line, i.e., the closing brace.

    Other times, viewing the assembly instructions may show that certain hardware events were collected on instructions that could not possibly generate that event, e.g., a memory event on a jump instruction or an arithmetic event on a memory instruction. This is known as “event skid” and is a result of the processor being unable to stop the execution of some micro-ops before sampling the instruction pointer (IP). Thus, the IP is pointing at a subsequent instruction by the time the sample is taken. Typically, you can determine which instruction was responsible for the event by examining the instruction flow.

  • How can I use Intel® VTune Amplifier XE to see how much time is spent doing Disk I/O?
  • If your application is doing blocking I/O, the function call attributed to the file accesses should appear in Basic Hotspots analysis. Additionally, if you have multiple threads waiting to access a single file, the synchronization object protecting the file (e.g., a Critical Section) should show up in the Locks and Waits analysis.


Getting Started?

Click the Learn tab for guides and links that will quickly get you started.

Get Help or Advice

Search Support Articles
Forums - The best place for timely answers from our technical experts and your peers. Use it even for bug reports.
Support - For secure, web-based, engineer-to-engineer support, visit our Intel® Premier Support web site. Intel Premier Support registration is required.
Download, Registration and Licensing Help - Specific help for download, registration, and licensing questions.

Resources

Release Notes - View Release Notes online!
Documentation:
Windows* | Linux*
Documentation for other software products
