Optimize Multithreaded Performance


In modern multicore systems, threaded performance is critical for exploiting the full potential of the processor. Intel® VTune™ Profiler helps you tune your software to make effective use of all cores.

Find Common Causes of Slow Threaded Code

The Locks and Waits analysis helps you focus your tuning efforts and estimate potential improvements. Use it to identify the synchronization objects (locks) that prevent effective processor utilization, and to estimate how much wait time each lock causes and how that affects application performance.

See a prioritized list of synchronization objects that negatively impact performance.
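To make "wait time per lock" concrete, the following C++17 sketch wraps std::mutex in a hypothetical TimedMutex class that records how long threads block before acquiring it. It only illustrates the metric, assuming a per-lock wait counter is a useful mental model; it is not how VTune Profiler collects its data, and the names TimedMutex and run_workload are made up for this example.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical wrapper (not part of VTune): accumulates the time threads
// spend blocked in lock() and counts successful acquisitions.
class TimedMutex {
public:
    explicit TimedMutex(std::string name) : name_(std::move(name)) {}

    void lock() {
        auto start = std::chrono::steady_clock::now();
        m_.lock();  // blocks while another thread holds the lock
        auto waited = std::chrono::steady_clock::now() - start;
        wait_ns_ += std::chrono::duration_cast<std::chrono::nanoseconds>(waited).count();
        ++acquisitions_;
    }
    void unlock() { m_.unlock(); }

    const std::string& name() const { return name_; }
    std::int64_t wait_ns() const { return wait_ns_.load(); }
    std::int64_t acquisitions() const { return acquisitions_.load(); }

private:
    std::mutex m_;
    std::string name_;
    std::atomic<std::int64_t> wait_ns_{0};
    std::atomic<std::int64_t> acquisitions_{0};
};

// Hypothetical workload: `threads` workers each take the lock `iters` times
// and hold it for `hold` to simulate work inside the critical section.
long run_workload(TimedMutex& mtx, int threads, int iters,
                  std::chrono::microseconds hold) {
    long counter = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t) {
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<TimedMutex> guard(mtx);  // TimedMutex is BasicLockable
                ++counter;
                std::this_thread::sleep_for(hold);
            }
        });
    }
    for (auto& th : pool) th.join();
    return counter;
}
```

A lock whose wait_ns() is large relative to the program's run time is the kind of object you would expect near the top of a prioritized list like the one described above.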

Tune Parallel Performance

Intel VTune Profiler recognizes parallel programming models, including OpenMP* 4.0 and Intel® Threading Building Blocks, which makes it easy to visualize and understand multithreading concepts such as a task beginning and ending, synchronizing, and waiting. Get the data you need to tune performance and see which parallel regions are inefficient and why (for example, imbalance, lock contention, or communication overhead).

Detailed data for each OpenMP region highlights tuning opportunities.
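As an illustration of one of the inefficiencies named above, imbalance, here is a hedged C++ sketch of an OpenMP loop whose iterations do unequal amounts of work. The function names are hypothetical. Build with an OpenMP-enabled compiler (for example, GCC or Clang with -fopenmp); without it the pragmas are ignored and the loops run serially with the same result.

```cpp
#include <cstdint>

// Work per iteration grows with i (a triangular loop). With schedule(static),
// each thread gets one contiguous block of iterations, so the thread that
// owns the last block does far more work than the others: load imbalance.
std::int64_t triangular_sum_static(int n) {
    std::int64_t total = 0;
#pragma omp parallel for schedule(static) reduction(+ : total)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j <= i; ++j)
            total += j;
    return total;
}

// Same computation with dynamic scheduling: a thread that finishes early
// grabs the next chunk of 16 iterations, which usually evens out the work.
std::int64_t triangular_sum_dynamic(int n) {
    std::int64_t total = 0;
#pragma omp parallel for schedule(dynamic, 16) reduction(+ : total)
    for (int i = 0; i < n; ++i)
        for (int j = 0; j <= i; ++j)
            total += j;
    return total;
}
```

Both functions return the same value. With the static version, the lightly loaded threads wait at the implicit barrier at the end of the loop. Whether the dynamic version is actually faster depends on the loop and the machine, which is exactly what a profiler measurement should confirm.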

Visually Spot Inefficient Threading

Use the timeline to spot patterns of inefficient threading, such as coarse-grained locks. In the example image, there are four threads, but only one (dark green) runs at any given time, so thread concurrency is very low: because of data sharing issues, no work is done in parallel.
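The serialized pattern described above can be reproduced with a coarse-grained lock. In this C++ sketch, the function names are hypothetical and expensive() is a made-up stand-in for real per-element work. sum_coarse holds one mutex for each thread's entire loop, so the threads effectively run one after another; sum_fine does the private computation outside the lock and locks only to merge its partial sum.

```cpp
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical per-element work that touches only thread-private data.
static std::uint64_t expensive(std::uint64_t x) {
    std::uint64_t h = x;
    for (int k = 0; k < 1000; ++k)
        h = h * 6364136223846793005ULL + 1442695040888963407ULL;
    return h & 0xFF;
}

// Coarse-grained: the lock covers the whole loop, so threads run one at a time.
std::uint64_t sum_coarse(int threads, int per_thread) {
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            std::lock_guard<std::mutex> g(m);  // held for the entire loop
            for (int i = 0; i < per_thread; ++i)
                total += expensive(static_cast<std::uint64_t>(t) * per_thread + i);
        });
    for (auto& th : pool) th.join();
    return total;
}

// Finer-grained: compute privately, then lock briefly to merge the result.
std::uint64_t sum_fine(int threads, int per_thread) {
    std::mutex m;
    std::uint64_t total = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&, t] {
            std::uint64_t local = 0;
            for (int i = 0; i < per_thread; ++i)
                local += expensive(static_cast<std::uint64_t>(t) * per_thread + i);
            std::lock_guard<std::mutex> g(m);  // short critical section
            total += local;
        });
    for (auto& th : pool) th.join();
    return total;
}
```

Both functions compute the same total; they differ only in how much work happens while the lock is held, which is what determines whether the threads can overlap.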

See Lock Contention

Another common threading performance issue occurs when multiple threads contend for the same lock. Contention becomes obvious when yellow transition lines dominate the timeline; a high density of transitions may indicate lock contention and poor parallel performance.
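A minimal C++ sketch of the contention case, assuming a shared counter as the contended resource (both function names are hypothetical): count_with_shared_lock acquires the same mutex on every single increment, causing the kind of frequent lock hand-offs between threads that could appear as dense transitions. count_with_local_then_atomic batches the updates in a thread-local variable and publishes each thread's total with one atomic add.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Contended: every increment takes the same mutex, so threads repeatedly
// block and wake each other for a tiny amount of work.
std::int64_t count_with_shared_lock(int threads, int iters) {
    std::mutex m;
    std::int64_t count = 0;
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < iters; ++i) {
                std::lock_guard<std::mutex> g(m);
                ++count;
            }
        });
    for (auto& th : pool) th.join();
    return count;
}

// Reduced contention: count privately, then publish once per thread.
std::int64_t count_with_local_then_atomic(int threads, int iters) {
    std::atomic<std::int64_t> count{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            std::int64_t local = 0;
            for (int i = 0; i < iters; ++i) ++local;
            // join() below orders this update before the final load
            count.fetch_add(local, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return count.load();
}
```

The design choice here is to reduce how often the shared resource is touched rather than to make each access cheaper; both versions return threads * iters.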

More Effective OpenMP* Tuning

The summary report quickly delivers the top four answers you need to improve OpenMP performance effectively. For additional details on a region, select its link under OpenMP Region.

Getting the right data makes tuning OpenMP much more effective.

Additional Capabilities

Single Thread

Optimize single-threaded performance.

System

See a system-level view of application performance.

Media & OpenCL™ Applications

Deliver high-performance image and video processing pipelines.

HPC & Cloud

Access specialized, in-depth analyses for HPC and cloud computing.

Memory & Storage Management

Diagnose memory, storage, and data plane bottlenecks.

Analyze & Filter Data

Mine data for answers.

Environment

Fits your environment and workflow.


Product and Performance Information


Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804