Address Unique Needs in Cloud & HPC Profiling


Achieving the best performance for a high-performance computing (HPC) application requires a careful balance of Message Passing Interface (MPI) parallelism, threading, vectorization, memory access, and more. Intel® VTune™ Profiler provides specialized HPC analyses that let developers start with a quick snapshot and then, if needed, drill into the details. Software architects tuning the performance of cloud applications can profile a running Java* process in a container.

Get a Quick Performance Snapshot

Analyze MPI and non-MPI applications (Linux* only).

The Application Performance Snapshot features:

  • Lightweight, low-overhead profiling
  • Scalable profiling that detects performance variation across a large number of ranks
  • Key metrics, such as MPI and OpenMP* imbalance, low floating-point utilization, communication patterns, and memory stalls

Determine whether this workload will benefit from tuning by viewing all the data in one place.
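To make the idea of an imbalance metric concrete, here is a minimal sketch of one common textbook definition of load imbalance: the gap between the slowest rank and the average rank, relative to the slowest. This is an illustration only and is not claimed to be the formula Intel VTune Profiler uses. The function name and the sample timings are hypothetical.

```python
from statistics import mean


def load_imbalance(per_rank_seconds):
    """Return (max - mean) / max for a list of per-rank busy times.

    0.0 means every rank did the same amount of work. Larger values mean
    the slowest rank did much more work than the average rank, so the
    other ranks spent more of the run waiting.
    NOTE: a generic definition, assumed here for illustration only.
    """
    if not per_rank_seconds:
        raise ValueError("need at least one rank")
    slowest = max(per_rank_seconds)
    if slowest == 0:
        return 0.0
    return (slowest - mean(per_rank_seconds)) / slowest


# Hypothetical compute times (seconds) for four MPI ranks.
balanced = [10.0, 10.0, 10.0, 10.0]
skewed = [10.0, 10.0, 10.0, 20.0]

print(f"balanced: {load_imbalance(balanced):.3f}")
print(f"skewed:   {load_imbalance(skewed):.3f}")
```

A value near zero suggests the ranks are evenly loaded. A noticeably larger value suggests the workload may benefit from rebalancing before other tuning.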

Deeper Analysis with Actionable Detail

See a summary of key HPC performance attributes: MPI efficiency, threading efficiency, memory access efficiency, and floating-point utilization. Then dive into the details and optimize the highest impact items first.

Use the HPC analysis to get a fast overview of critical metrics for modern hardware performance or get a more in-depth analysis for each one.

The summary now includes improved vectorization metrics, process and thread affinity, and a preview of Lustre* parallel file I/O metrics.

Easier Multirank Analysis of MPI and OpenMP*

For hybrid MPI and OpenMP applications, it is important to explore OpenMP inefficiency together with MPI communication between ranks. The lower a rank's communication spin time, the more of its time is spent executing application code, and the more impact OpenMP tuning has on that rank.
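The reasoning about spin time can be sketched in a few lines: given per-rank elapsed time and MPI spin time, the rank with the lowest spin time spends the largest share of its run executing, so it is a natural first candidate for OpenMP tuning. The data class, function name, and numbers below are hypothetical. They do not represent a VTune Profiler API or real profiling output.

```python
from dataclasses import dataclass


@dataclass
class RankSample:
    """Hypothetical per-rank timing summary, in seconds."""
    rank: int
    elapsed_s: float
    mpi_spin_s: float

    @property
    def executing_fraction(self):
        # Share of the run not spent spinning in MPI communication.
        return (self.elapsed_s - self.mpi_spin_s) / self.elapsed_s


def pick_rank_for_openmp_tuning(samples):
    """Return the sample with the lowest MPI communication spin time."""
    return min(samples, key=lambda s: s.mpi_spin_s)


# Made-up numbers for three ranks of one job.
samples = [
    RankSample(rank=0, elapsed_s=100.0, mpi_spin_s=5.0),
    RankSample(rank=1, elapsed_s=100.0, mpi_spin_s=30.0),
    RankSample(rank=2, elapsed_s=100.0, mpi_spin_s=12.0),
]

best = pick_rank_for_openmp_tuning(samples)
print(f"rank {best.rank}: {best.executing_fraction:.0%} of time executing")
```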

Intel VTune Profiler can be installed on a cluster. For further tuning of MPI, use Intel® Trace Analyzer and Collector.

The list shows OpenMP regions where performance tuning can significantly reduce execution time, with the highest impact regions shown first.

Optimize Private Cloud-Based Applications

Profile enterprise applications written in Java* or in native languages such as C, C++, and Fortran. Profile running Java services (such as mail services and daemons) without restarting the application. Popular container technologies, including Docker*, Mesos*, and LXC*, are supported.

Intel VTune Profiler can easily attach to an application running in a container to collect profiling data.
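As a rough sketch of what attaching to a running process can look like from a script, the helper below assembles a `vtune` command line that uses the `-collect`, `-target-pid`, and `-result-dir` options. It only builds and prints the command and does not run it. The helper name, the PID, and the result directory are hypothetical, and the sketch assumes the `vtune` command-line tool is installed, is on the PATH, and is run with enough privileges to attach to the target process.

```python
import shlex


def build_attach_command(pid, result_dir, analysis="hotspots"):
    """Build (but do not run) a vtune command that attaches to a live process.

    pid        -- process ID of the running service (hypothetical here)
    result_dir -- directory where the collected result should be stored
    analysis   -- analysis type passed to -collect
    """
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return [
        "vtune",
        "-collect", analysis,
        "-target-pid", str(pid),
        "-result-dir", result_dir,
    ]


# Example: a Java service whose PID is assumed to be 12345.
cmd = build_attach_command(12345, "r001_java_service")
print(shlex.join(cmd))
```

To start the collection, the resulting list could be passed to `subprocess.run(cmd, check=True)`. It is kept separate here so the command can be reviewed first.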

Additional Capabilities

Single Thread

Optimize single-threaded performance.

Multithreaded

Effectively use all available cores.

System

See a system-level view of application performance.

Media & OpenCL™ Applications

Deliver high-performance image and video processing pipelines.

Memory & Storage Management

Diagnose memory, storage, and data plane bottlenecks.

Analyze & Filter Data

Mine data for answers.

Environment

Fits your environment and workflow.

Product and Performance Information

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804