Intel® Performance Tuning Utility 4.0 Update 5

We have retired PTU as our experimental performance analysis tool. Intel® VTune™ Amplifier XE now contains the best PTU capabilities as well as the latest experimental performance analysis features.

All New Intel® VTune™ Amplifier XE now includes many Intel® PTU features
Heard good things about Intel PTU? Try the all-new VTune Amplifier XE first. It includes many of Intel PTU’s experimental features, plus a threading timeline, the ability to attach to a running process, analysis of 32-bit applications on 64-bit operating systems, and more.

Intel® VTune™ Amplifier XE

Intel® PTU is an experimental performance analysis tool to test new technology before it becomes a product. It requires a license to VTune Amplifier XE. Many of the features from Intel PTU are now in our fully supported product, VTune Amplifier XE, so give it a try first.

Product Overview

The Intel® Performance Tuning Utility (Intel® PTU) is a cross-platform performance analysis tool set. Alongside traditional features such as identifying the hottest modules and functions of an application, tracking call sequences, and identifying performance-critical source code, Intel PTU adds more powerful capabilities for data collection, analysis, and visualization. For experienced users, Intel PTU exposes the processor hardware event counters for in-depth analysis of memory system performance, architectural tuning, and more. It associates performance issues with the source code. If you do not have symbol sources for an analyzed application, Intel PTU presents data at basic block granularity and provides a graph of the function execution flow (control flow graph) to navigate the disassembly. The Intel Performance Tuning Utility is available for both Windows* and Linux* operating systems.

The Intel® Performance Tuning Utility offers:

  • Event Based Sampling - Uses the processor’s onboard performance monitoring hardware to get a detailed look into performance issues
  • Basic Block Analysis - Displays hotspots with basic block granularity and generates a control flow graph for advanced analysis of the application, even without the source code
  • Events over IP graph - Generates a histogram of performance events distributed over application code
  • Loop Analysis - Identifies loops and recursion in your application to aid optimization
  • Result difference - Compares the results of multiple runs to measure changes in performance
  • Data Access Profiling - Identifies memory hotspots and relates them to code hotspots
  • Statistical Call Graph - Profiles with low overhead to detect where time is spent in your application
  • Heap Profiler - Identifies dynamic memory usage by the application and can help identify memory leaks
  • Instrumentation-based Call Graph, Call Count - Provides exact call graph and call count information for your application
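
As an illustration of the "Events over IP graph" idea from the list above, the following is a minimal Python sketch that buckets sampled instruction pointers into a histogram over the code address range. It is not Intel PTU code or its API: the function name `events_over_ip`, the bucket size, and the sample addresses are all assumptions made for this example.

```python
from collections import Counter

def events_over_ip(sample_ips, base, bucket_size):
    """Count sampled instruction pointers per fixed-size address bucket.

    Hypothetical helper for illustration only; it does not reflect how
    Intel PTU stores or aggregates its samples.
    """
    hist = Counter()
    for ip in sample_ips:
        # Round each address down to the start of its bucket.
        bucket_start = base + ((ip - base) // bucket_size) * bucket_size
        hist[bucket_start] += 1
    return dict(sorted(hist.items()))

# Made-up sample addresses, not real collector output.
samples = [0x401000, 0x401004, 0x401010, 0x401012, 0x401013, 0x401020]
histogram = events_over_ip(samples, base=0x401000, bucket_size=0x10)

# Print a simple text histogram, one row per address bucket.
for start, count in histogram.items():
    print(f"{start:#x}: {'#' * count}")
```

Each row corresponds to a range of code addresses, so tall bars point at the regions where the sampled event occurred most often.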

This version of the Intel Performance Tuning Utility introduces the following new features and enhancements:

  • Intel® Microarchitecture Code Name Sandy Bridge support including AVX-instructions support in Data Profiling
  • Updated events for Intel® Atom™ processor
  • New predefined Profile configurations
  • Results difference functionality enabling flexible match of modules
  • Enhanced Project and Hotspot analysis configuration
    • Start paused, specifying delay
    • Enable and control trigger-based event multiplexing
  • Enhanced Ratio definition formulas: using ratios as operands, Min/Max operations
  • Ability to integrate with Intel® Performance Bottleneck Analyzer
  • Several bug fixes in collection and analysis
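
The ratio enhancement listed above (ratios as operands, Min/Max operations) can be pictured with a small Python sketch. This does not use Intel PTU's actual formula syntax, which is not shown on this page; the counter names `cycles`, `instructions`, and `stall_cycles`, the helper `ratio`, and all values are hypothetical.

```python
def ratio(numerator, denominator):
    """Divide two event counts, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0

# Made-up event counts for one profiled function.
counts = {"cycles": 3_000_000, "instructions": 2_000_000, "stall_cycles": 750_000}

# A basic ratio: cycles per instruction.
cpi = ratio(counts["cycles"], counts["instructions"])

# A Min operation: clamp the stall fraction so it never exceeds 1.0.
stall_fraction = min(1.0, ratio(counts["stall_cycles"], counts["cycles"]))

# Ratios used as operands of another formula: stall cycles per instruction.
stalls_per_instruction = cpi * stall_fraction

print(cpi, stall_fraction, stalls_per_instruction)
```

Building derived metrics from existing ratios avoids repeating the underlying event arithmetic in every formula.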

The performance analysis capabilities of this utility are in many ways similar to those of the Intel® VTune™ Performance Analyzer. However, this technology includes features that may be of more value to users who are more experienced with performance tuning. This utility explores some new approaches to data collection and user interface techniques. One of our goals in releasing this utility to the public, in addition to providing our customers with powerful performance analysis tools, is to get feedback on what you do or don’t like. We appreciate your help, which will make future versions of this utility, or other Intel software products, even better.

As the capabilities in this utility are experimental, we cannot guarantee any level of support for them. Some of the features and interface designs may find their way into released and supported products, some may not.

The current version of this utility is built as a plug-in for the Eclipse environment. The Intel® PTU distribution package includes an Eclipse environment with the plug-in already integrated.

The paper “Parallelization Made Easier with Intel® Performance Tuning Utility” was published in the Intel Technology Journal. It explores how the Intel® Performance Tuning Utility significantly improves on the available data collection and display features and adds capabilities needed for enabling and analyzing parallel execution.

Technical Requirements

  1. You must have a license for the Intel VTune™ Performance Analyzer product or Intel® VTune™ Amplifier XE 2011 product on your system. If you do not, you can acquire the commercial product or try an evaluation copy.
  2. Please see the release notes for more details on technical requirements, including the list of supported processors and operating systems.


Figure 1

The Advanced Profile view (at the bottom) shows details of the highlighted issue for the selected hot basic block in the equake application.

Figure 2

The heaviest source line (#597) is disassembled on the right and grouped into basic blocks. Analyze the execution flow from the Flow Graph at the bottom.

Figure 3

The Overtime view shows changes in the utilization of CPU resources while the “openmp_triad” sample application was running. For processors with Intel® Turbo Boost technology enabled, the histogram on the left shows the CPU frequency deviations from the reference frequency during the application run.

Figure 4

The performance of the ‘calc_noise’ function is improved. The difference between two collected experiments is 39.51 msec.

Figure 5

The ‘outer_loop’ function in different call branches can be the first target for parallelization. The heavy loop is detected inside the function and its caller function.

Frequently Asked Questions

Q - How do I get started using the Intel® Performance Tuning Utility?

A - We recommend two things before you start using this tool. First, make sure that you have the Intel® VTune™ Performance Analyzer product or Intel® VTune™ Amplifier XE 2011 (with an unexpired support period) on your system. If you do not, or are not sure, you can acquire the commercial product or try an evaluation copy. Next, review the installation and usage guide. This guide walks through, screenshot by screenshot, the user interface interactions needed to invoke and effectively use Intel® PTU. This guide, along with the User Guide, helps you ensure that the installation of Intel® PTU does not inadvertently impact your usage of the VTune Performance Analyzer. Intel® PTU and Intel® VTune™ Amplifier XE can be installed and used on the same system without conflict.

Q - Where can I get support for the use of this utility?

A - We encourage you to visit our Intel® Performance Tuning Utility forum for support.

Q - What are the licensing terms that spell out how exactly I can use this utility?

A - The licensing terms are listed on the download page.

Q - Can you tell me a little more about the results difference feature of Intel® PTU?

A - The results difference feature lets you see the performance difference produced either by changing compiler switches with the same compiler or by changing the compiler used to build the application. Fans of this capability have seen significant productivity gains from being able to see the performance impact of compile-time changes as soon as a new build is ready, which is all the more important during regression testing.
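
To make the idea of comparing two runs concrete, here is a minimal Python sketch that computes per-function time deltas between a baseline and a rebuilt binary. It is only an illustration, not how Intel PTU implements the feature: the function `diff_results`, the dictionary layout, and every timing value are assumptions (the name `calc_noise` is borrowed from the Figure 4 caption, but the numbers are made up).

```python
def diff_results(baseline, candidate):
    """Return {function: candidate_ms - baseline_ms}, biggest improvement first.

    Functions missing from one run are treated as taking 0.0 ms in that run.
    """
    names = set(baseline) | set(candidate)
    deltas = {n: candidate.get(n, 0.0) - baseline.get(n, 0.0) for n in names}
    # Negative deltas (speedups) sort first, regressions last.
    return dict(sorted(deltas.items(), key=lambda kv: kv[1]))

# Hypothetical per-function times in milliseconds for two builds.
baseline = {"calc_noise": 120.0, "main": 10.0}
candidate = {"calc_noise": 80.5, "main": 10.0, "helper": 2.0}

deltas = diff_results(baseline, candidate)
for name, delta in deltas.items():
    print(f"{name}: {delta:+.2f} ms")
```

Sorting by delta puts the largest speedups at the top and any regressions at the bottom, which is convenient when scanning a new build during regression testing.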

Q - Is this version backward compatible with v2.0 or v3.x? Can I view results collected with Intel® PTU v2.0 or v3.x in v4.0?

A - Unfortunately, no. You will probably need to run your analysis again with Intel® PTU v4.0, although re-converting the results may sometimes help.

Q - What is new with Intel® Performance Tuning Utility 4.0 Update 5?

A - See the appropriate section of the product release notes.

Q - Where can I read about the new Intel® Core™ i7 Processor PMU features such as PEBS improvements, Last Branch Record (LBR) collection, and the matrix event Offcore_Response_0?

A - See Intel® Core™ i7 Processor Features in Intel® Performance Tuning Utility 3.2 for detailed information.

Primary Technology Contacts

Dmitry Ryabtsev, Sr. Software Engineer

Konstantin Lupach, Sr. Software Engineer

Julia Fedorova, Sr. Software Engineer

We encourage you to visit the Intel® Performance Tuning Utility discussion forum. You can ask all your questions there. Please do not post them as comments to this page since we do not monitor them regularly.
