Intel® VTune™ Profiler is a performance profiling tool that provides software and hardware performance analysis through graphical and command-line interfaces. There are three general types of data it collects:
Amazon Web Services* (AWS*) offers a wide range of instance types and sizes in its Elastic Compute Cloud* (EC2*) service. Some VTune Profiler collection types are unavailable on certain instances because the hypervisor does not expose the necessary hardware counters.
|Instance||VTune Profiler Collections Supported||Application Performance Snapshot Supported?|
The instances tested include C5, R5, and M5 instances of various sizes. These all use Intel® Xeon® Scalable Processors (codenamed Skylake and Cascade Lake). The C5 instances are compute optimized, intended for cost-effective performance on compute-bound workloads. The R5 instances are memory optimized, intended for workloads that hold large amounts of data in memory. The M5 instances are general purpose, balancing compute, memory, and network resources.
The Performance Monitoring Unit (PMU) is on-chip hardware that counts microarchitectural events such as cache misses, cache hits, and elapsed cycles. These counts show how the operating system or an application performs on the processor. Profiling works with two main types of events: hardware events, which the PMU counts and which include instructions, CPU cycles, and cache references, and software events, which the operating system kernel counts and which include context switches and page faults.
VTune Profiler has two ways of collecting these events in Linux*:
VTune Profiler analysis types such as the Additional Insights on Hotspot Analysis, Microarchitecture Exploration, and HPC Performance Characterization require access to PMU events to provide hardware data such as instructions retired and cycle counts. Which PMU events are accessible on AWS* instances depends largely on the instance size. The instances tested run on two-socket Intel Xeon Scalable Processor hosts. Only instance sizes that occupy one or both complete sockets allow PMU access, presumably because partial use of a socket means that CPU resources are shared. Of the larger instances tested, the m5.16xlarge and r5.16xlarge instances do not support PMU events because they occupy one complete socket plus a portion of the second, so the hardware analyses cannot run on them.
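The sizing rule described above can be written down as a small check. This is only a sketch of the observation in this article, not an AWS or Intel rule. The function name `uses_complete_sockets` is hypothetical, and the figure of 48 vCPUs per socket, along with the 48, 64, and 96 vCPU counts used for the 12xlarge, 16xlarge, and 24xlarge sizes, are assumptions taken from published instance specifications rather than from the tests reported here.

```python
def uses_complete_sockets(vcpus, vcpus_per_socket, sockets=2):
    """Return True if an instance size maps onto one or more whole sockets.

    Heuristic only: encodes the observation that PMU events were available
    when an instance occupied one or both complete sockets of the host.
    vcpus_per_socket is an assumption the caller must supply for the host.
    """
    if vcpus <= 0 or vcpus_per_socket <= 0:
        raise ValueError("vcpus and vcpus_per_socket must be positive")
    whole_sockets, leftover = divmod(vcpus, vcpus_per_socket)
    # A leftover means the instance spills onto part of another socket.
    return leftover == 0 and 1 <= whole_sockets <= sockets


# Assumed host layout: 24 cores x 2 hardware threads = 48 vCPUs per socket.
ASSUMED_VCPUS_PER_SOCKET = 48

for name, vcpus in [("12xlarge", 48), ("16xlarge", 64), ("24xlarge", 96)]:
    ok = uses_complete_sockets(vcpus, ASSUMED_VCPUS_PER_SOCKET)
    print(f"{name}: {vcpus} vCPUs -> complete sockets: {ok}")
```

Under these assumptions the 16xlarge size is the only one of the three that fails the check, which matches the behavior reported for m5.16xlarge and r5.16xlarge. The result should be treated as a hint to verify on the instance itself, since the host layout differs between processor generations.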
Application Performance Snapshot (APS) is a utility packaged with VTune Profiler for Linux*. It gives a quick view of MPI and OpenMP imbalances, memory access efficiency, floating point unit (FPU) usage, I/O, and memory data in your application. After analyzing this data, it suggests additional analyses to run with VTune Profiler.
APS has the same limitations as the VTune Profiler hardware analysis types. It can only run when PMU events are accessible.
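Before launching a hardware collection, it can help to check whether the guest kernel sees a core PMU at all. The sketch below is a heuristic, not part of VTune Profiler or APS: it assumes that the Linux perf subsystem registers a `cpu` entry under `/sys/bus/event_source/devices` when a hardware PMU is available, and it reports the `perf_event_paranoid` setting that controls unprivileged access to perf events. The function name `pmu_hints` is hypothetical.

```python
from pathlib import Path


def pmu_hints(sysfs_root="/sys/bus/event_source/devices",
              paranoid_path="/proc/sys/kernel/perf_event_paranoid"):
    """Collect two hints about PMU availability on a Linux system.

    Heuristic: a missing 'cpu' event source suggests that the hypervisor
    does not expose a core PMU to this guest.
    """
    core_pmu_exposed = (Path(sysfs_root) / "cpu").exists()

    paranoid = None
    try:
        paranoid = int(Path(paranoid_path).read_text().strip())
    except (OSError, ValueError):
        # File missing, unreadable, or not an integer: leave as None.
        pass

    return {"core_pmu_exposed": core_pmu_exposed,
            "perf_event_paranoid": paranoid}


if __name__ == "__main__":
    print(pmu_hints())
```

If `core_pmu_exposed` is false, hardware event-based analyses and APS are unlikely to work on that instance, whatever the paranoid setting is.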
The VTune Profiler Platform Profiler utility is also packaged with VTune Profiler. It profiles at the system level to help identify hardware configuration issues in areas such as storage layout, memory and disk I/O, CPU frequency, cycles per instruction (CPI), and power consumption.
Platform Profiler is limited to use on metal instances only.
Some instance types have a metal offering that is the same size as the largest non-metal instance. For example, c5.24xlarge has the same number of vCPUs as c5.metal and appears to use the same hardware. The main difference is that the 24xlarge instance still runs under a hypervisor, which prevents full access to the PMU, including the uncore events used in memory access analysis. As a result, VTune Profiler is still limited on the largest non-metal instance and fully functional on the metal equivalent.
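One way to tell from inside an instance which of the two cases applies is to look for the `hypervisor` CPU flag, which Linux on x86 lists in `/proc/cpuinfo` when the processor reports that it is running as a guest. The sketch below parses that text. The function name `runs_under_hypervisor` is hypothetical, and the check is an assumption about how the guest reports itself rather than something verified in the tests described here.

```python
def runs_under_hypervisor(cpuinfo_text):
    """Return True if the first 'flags' line lists the 'hypervisor' flag.

    cpuinfo_text is the content of /proc/cpuinfo on an x86 Linux system.
    Returns False when no 'flags' line is present.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # Flags are whitespace separated; match the whole token only.
            return "hypervisor" in value.split()
    return False
```

On an instance this could be called as `runs_under_hypervisor(open("/proc/cpuinfo").read())`. A result of `False` would be expected on a metal instance and `True` on a virtualized size such as c5.24xlarge.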