Diagnose Memory, Storage, & Data Plane Bottlenecks

Not all workloads are compute-bound. Intel® VTune™ Amplifier has specialized analyses for optimizing the use of memory and I/O bandwidth.

screenshot of a memory access graph

Figure 1

Optimize Bandwidth-Limited Software

Use the timeline to see the spikes in bandwidth used for DRAM and Intel® QuickPath Interconnect. To see which functions are consuming bandwidth at a specific time, select a spike in the timeline and filter on the selection. This lets you isolate the individual contributors to bandwidth consumption and tune effectively.

Functions that are significantly memory bound are highlighted in pink (see Fig. 1).
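The select-a-spike-and-filter workflow described above can be illustrated outside the tool with a small sketch. It assumes you already have per-sample bandwidth data as tuples; the sample values, the function names in the data, and the helpers `find_spikes` and `contributors` are all hypothetical and are not part of Intel VTune Amplifier.

```python
from collections import defaultdict

# Hypothetical samples: (timestamp in seconds, function name, DRAM bytes moved).
SAMPLES = [
    (0.0, "parse_input", 1_000_000),
    (0.1, "parse_input", 1_200_000),
    (0.2, "stencil_update", 9_000_000),
    (0.2, "reduce_rows", 3_000_000),
    (0.3, "stencil_update", 8_500_000),
    (0.4, "write_output", 900_000),
]


def find_spikes(samples, threshold_bytes):
    """Return the timestamps whose total traffic reaches the threshold."""
    totals = defaultdict(int)
    for timestamp, _function, nbytes in samples:
        totals[timestamp] += nbytes
    return sorted(t for t, total in totals.items() if total >= threshold_bytes)


def contributors(samples, start_s, end_s):
    """Sum traffic per function inside the selected time window, largest first."""
    per_function = defaultdict(int)
    for timestamp, function, nbytes in samples:
        if start_s <= timestamp <= end_s:
            per_function[function] += nbytes
    return sorted(per_function.items(), key=lambda item: item[1], reverse=True)


# Select the spike region, then filter the samples down to that selection.
spikes = find_spikes(SAMPLES, threshold_bytes=8_000_000)
top = contributors(SAMPLES, spikes[0], spikes[-1])
for function, nbytes in top:
    print(f"{function}: {nbytes} bytes")
```

Filtering to the spike window first means that functions active only during quiet periods do not dilute the ranking.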

Identify Which Memory Objects Are Bottlenecks

A typical hotspot analysis shows the code that takes the most time. The Memory Access analysis offers a different perspective: it shows which memory objects consume bandwidth, independent of where they are accessed. This can yield new insight into how to improve performance. Memory object analysis is available for Linux* targets only.

Memory Access analysis lets you attribute performance events to memory objects, so you can see the data structures that are contributing to memory issues (see Fig. 2).

interface for the Memory Access analysis

Figure 2
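The difference between a function-centric view and a memory-object-centric view can be sketched with a simple aggregation. This is only an illustration of the idea, assuming each performance event has already been attributed to both a function and a memory object; the event data and the helper `totals_by` are hypothetical and do not reflect the tool's data format.

```python
from collections import Counter

# Hypothetical attributed events: (function name, memory object, event count).
EVENTS = [
    ("init_grid", "grid[]", 120),
    ("stencil_update", "grid[]", 900),
    ("stencil_update", "coeffs[]", 40),
    ("reduce_rows", "grid[]", 300),
    ("reduce_rows", "row_sums[]", 60),
]


def totals_by(events, key_index):
    """Sum event counts grouped by the chosen column, largest first."""
    totals = Counter()
    for event in events:
        totals[event[key_index]] += event[2]
    return totals.most_common()


by_function = totals_by(EVENTS, 0)  # hotspot-style view
by_object = totals_by(EVENTS, 1)    # memory-object view

print("By function:", by_function)
print("By object:  ", by_object)
```

In this made-up data no single function accounts for most events, yet one object (`grid[]`) does, which is the kind of pattern a per-object view can surface.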

interface for Message Passing Interface (MPI) and OpenMP multirank analysis

Figure 3

Tune Non-Uniform Memory Access (NUMA)

Some memory accesses are slower than others. For example, on a two-socket system, latency is higher when a core in socket 0 accesses memory that is attached to socket 1. Memory Access analysis in Intel VTune Amplifier lets you identify frequently accessed data that is stored remotely, so you can reconsider how you allocate memory. The analysis shows both local memory accesses (which are fast) and remote memory accesses (which are slower). Changing your memory allocation to increase the share of local accesses may improve performance (see Fig. 3).
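To see why the local-to-remote ratio matters, a back-of-the-envelope model helps. The sketch below computes an average access latency as a weighted mix of local and remote latency. The latency figures and access fractions are hypothetical placeholders, not measurements; substitute values observed on your own system.

```python
def average_latency_ns(local_fraction, local_ns, remote_ns):
    """Weighted average latency for a mix of local and remote accesses."""
    if not 0.0 <= local_fraction <= 1.0:
        raise ValueError("local_fraction must be between 0 and 1")
    return local_fraction * local_ns + (1.0 - local_fraction) * remote_ns


# Hypothetical latencies, for illustration only.
LOCAL_NS = 80.0
REMOTE_NS = 140.0

# Hypothetical scenario: half of the accesses are remote before tuning,
# and one in ten is remote after changing where memory is allocated.
before = average_latency_ns(0.5, LOCAL_NS, REMOTE_NS)
after = average_latency_ns(0.9, LOCAL_NS, REMOTE_NS)

print(f"before: {before:.1f} ns, after: {after:.1f} ns")
```

The model ignores caching, prefetching, and bandwidth contention, so treat it as a rough guide to the direction of the effect rather than a prediction.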

Uncover I/O Bottlenecks

Determine whether your application is I/O-bound or CPU-bound by exploring the imbalance between I/O operations (synchronous and asynchronous) and compute. See when the CPU is waiting for I/O, and see storage accesses mapped to the source code.

Sliders on the histogram control the display of data in the grid and on the timeline, making data analysis easier (see Fig. 4).

histogram for I/O

Figure 4
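As a rough first check of the I/O-bound versus CPU-bound question discussed above, you can compare wall-clock time with CPU time for a piece of work. The sketch below uses only the Python standard library; the helpers `measure` and `classify` and the 0.5 threshold are assumptions made for illustration, and they are far coarser than the tool's analysis.

```python
import time


def measure(work):
    """Run work() and return (wall-clock seconds, process CPU seconds)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    cpu_s = time.process_time() - cpu_start
    wall_s = time.perf_counter() - wall_start
    return wall_s, cpu_s


def classify(wall_s, cpu_s, cpu_bound_threshold=0.5):
    """Label a run by the share of wall-clock time spent on the CPU.

    The threshold is an arbitrary, hypothetical cut-off.
    """
    if wall_s <= 0:
        raise ValueError("wall_s must be positive")
    # process_time() sums all threads, so clamp the ratio at 1.0.
    cpu_fraction = min(cpu_s / wall_s, 1.0)
    if cpu_fraction >= cpu_bound_threshold:
        return "CPU-bound"
    return "wait-dominated (possibly I/O-bound)"


# time.sleep() stands in for a blocking I/O call in this illustration.
wall_s, cpu_s = measure(lambda: time.sleep(0.05))
print(classify(wall_s, cpu_s))
```

A low CPU share only shows that the process was waiting. The wait could be on storage, the network, locks, or sleeps, which is why mapping the waits back to I/O operations and source code is the more reliable next step.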

Determine Which Systems Benefit from Faster Storage

Storage Performance Snapshot shows system storage bottlenecks for servers and workstations with directly attached storage. Easy to install, this tool helps you determine which workloads need further analysis and where faster storage improves performance. This snapshot comes with Intel VTune Amplifier and is also available separately to facilitate a quick system check.

Get a quick view of:

  • I/O boundedness
  • Storage and network saturation
  • CPU utilization
  • Memory capacity saturation

Get system data while running workloads to see how migration to Serial ATA and PCIe* SSDs can offer better solutions, user experiences, and performance density.

Storage Performance Snapshot

Collect data on Windows* or Linux systems and view the results in a web browser (see Fig. 6).

a dashboard showing a snapshot of storage performance

Figure 6

Additional Capabilities