Profiling Hardware Without Intel Sampling Drivers
- Intel sampling drivers cannot be installed (for example, if VTune Profiler is installed without root privileges).
- Collection with stacks is selected with a non-zero stack size and the requirements for driverless collection are satisfied.
- The option to use driverless collection is enabled and the requirements for driverless collection are satisfied.
- To enable the Perf driverless collection to match all the hardware profiling functionality provided with Intel drivers, you will need administrative privileges to configure system options as described below.
- To check which collector type, Perf or Intel sampling driver (SEP), was used for your analysis, see the Collection and Platform Info section of the Summary window.
- INGREDIENTS: Intel VTune Profiler (or its previous version, Intel VTune Amplifier 2019) can use the driverless mode if the following requirements are satisfied:
- Core and uncore events. All hardware event-based collections in VTune Profiler use core PMU events. Some of them, such as the Memory Access and IO analysis types, require access to uncore events that enable collecting metrics like DRAM bandwidth, QPI/UPI bandwidth, PCI bandwidth, and others.
- Perf for Linux kernel 2.6.32 and higher. PMU events are exposed by the Linux kernel through the /sys/bus/event_source/devices/cpu and /sys/bus/event_source/devices/uncore_* directories. Empty directory content may indicate that the system configuration does not support PMU event collection. In this case, either update the OS or install the Intel sampling driver.
- The /proc/sys/kernel/perf_event_paranoid value is equal to or less than 1.
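The requirements above can be checked from a shell before starting a collection. The following is a minimal sketch, not part of VTune Profiler: the function names and the SYSFS_ROOT and PARANOID_FILE override variables are hypothetical and exist only so the check can be pointed at a different location.

```shell
#!/bin/sh
# Sketch: report whether the driverless (Perf) prerequisites appear to be met.
# SYSFS_ROOT and PARANOID_FILE are illustrative overrides, not VTune settings.
SYSFS_ROOT="${SYSFS_ROOT:-/sys/bus/event_source/devices}"
PARANOID_FILE="${PARANOID_FILE:-/proc/sys/kernel/perf_event_paranoid}"

# Succeeds if the directory exists and contains at least one entry.
dir_has_entries() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Prints one line per requirement; returns non-zero if a core requirement fails.
check_driverless_requirements() {
    rc=0
    if dir_has_entries "$SYSFS_ROOT/cpu"; then
        echo "core PMU events: found"
    else
        echo "core PMU events: missing"
        rc=1
    fi

    # Uncore events are needed only by some analysis types, so this is informational.
    uncore=missing
    for d in "$SYSFS_ROOT"/uncore_*; do
        if dir_has_entries "$d"; then
            uncore=found
            break
        fi
    done
    echo "uncore PMU events: $uncore"

    value=$(cat "$PARANOID_FILE" 2>/dev/null)
    if [ -n "$value" ] && [ "$value" -le 1 ] 2>/dev/null; then
        echo "perf_event_paranoid=$value: ok"
    else
        echo "perf_event_paranoid=${value:-unreadable}: too restrictive"
        rc=1
    fi
    return $rc
}
```

To use it, call check_driverless_requirements after the definitions. A non-zero return status means either the core PMU directory is empty or the perf_event_paranoid value is greater than 1.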
- RECIPES FOR LIMITATIONS:
- Configure a VTune Profiler project and, from the WHAT pane, select either the Profile System target or the Launch Application target with the Analyze system-wide option enabled.
- Check the /proc/sys/kernel/perf_event_paranoid file value with the following command:
  cat /proc/sys/kernel/perf_event_paranoid
  If the value is less than 1, VTune Profiler can proceed with the system-wide collection.
- If the perf_event_paranoid value is equal to 1 (which limits the collection to user processes only) or more than 1 (which prevents VTune Profiler from using the Perf driverless mode), set the perf_event_paranoid value to 0 for the system-wide collection:
  echo 0 > /proc/sys/kernel/perf_event_paranoid
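The three cases described for perf_event_paranoid can be summarized in a small helper. This is only a sketch of the rules as stated in this recipe; the function name paranoid_scope and the labels it prints are hypothetical.

```shell
#!/bin/sh
# Sketch: map a perf_event_paranoid value to the collection scope
# described in this recipe. Labels are illustrative, not VTune output.
paranoid_scope() {
    digits=${1#-}                      # allow a leading minus sign, e.g. -1
    case "$digits" in
        ''|*[!0-9]*) echo "invalid"; return 2 ;;
    esac
    if [ "$1" -lt 1 ]; then
        echo "system-wide"             # value below 1: system-wide collection can proceed
    elif [ "$1" -eq 1 ]; then
        echo "user-processes-only"     # value 1: collection limited to user processes
    else
        echo "driverless-unavailable"  # value above 1: Perf driverless mode is not used
    fi
}
```

For example, paranoid_scope "$(cat /proc/sys/kernel/perf_event_paranoid)" prints the scope that applies to the current system setting.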
- Memory Access analysis requires access to uncore events and will not run without the ability to collect them. Other analysis types, like HPC Performance Characterization, will run but miss metrics based on uncore events, such as DRAM Bandwidth, OPA Interconnect Bandwidth, and Packet Rate.
- Uncore collection in the driverless mode is not supported on Intel Atom® processors.
- Check the limit of opened files:
  ulimit -n
- If required, increase the limit in the /etc/security/limits.conf file. To do this, you must have administrator privileges. Increase the limit by adding or changing these lines (the particular numbers are chosen as examples):
  * soft nofile 65535
  * hard nofile 65535
- If you increased the limit in the previous step, log out of the shell, or close it and reopen a secure shell connection. Log back in. With administrator privileges, you can also set the limit for a specific user. The change becomes visible when the user logs in again.
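The limit check above can be scripted. The sketch below compares the current open-file limit with a threshold; REQUIRED_NOFILE and both function names are hypothetical, and 65535 is only the example number used in this recipe, not a value prescribed by VTune Profiler.

```shell
#!/bin/sh
# Sketch: compare the open-file limit with an example threshold.
# REQUIRED_NOFILE is an illustrative value taken from the example above.
REQUIRED_NOFILE="${REQUIRED_NOFILE:-65535}"

# Succeeds when the limit (a number or "unlimited") meets the threshold.
nofile_limit_ok() {
    limit=$1
    required=$2
    if [ "$limit" = "unlimited" ]; then
        return 0
    fi
    [ "$limit" -ge "$required" ]
}

# Reads the soft limit of the current shell and prints a one-line verdict.
report_nofile_limit() {
    current=$(ulimit -n)
    if nofile_limit_ok "$current" "$REQUIRED_NOFILE"; then
        echo "open-file limit $current is sufficient"
    else
        echo "open-file limit $current is below $REQUIRED_NOFILE; raise it in /etc/security/limits.conf"
    fi
}
```

Calling report_nofile_limit in a fresh login shell is one way to confirm that a change to limits.conf has taken effect.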
- The default 1024-byte stack size may not be enough for full stack unwinding if a function intensively allocates data on the stack. This may lead to [Skipped stack frame(s)] displayed in the collected data.
- Linux kernel versions older than 3.7 support only frame-pointer (FP) based stack unwinding. This means that VTune Profiler can provide no stacks for binaries built without frame pointers (the -fomit-frame-pointer compiler option), as well as no Glibc stacks, since Glibc is built without frame pointers.
vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -knob stack-size=2048 <application>
vtune -collect-with runsa -knob enable-driverless-collection=false -knob event-config=<event-list> <application>
echo 0 > /proc/sys/kernel/kptr_restrict
echo 0 > /proc/sys/kernel/nmi_watchdog
echo 1 > /proc/sys/kernel/nmi_watchdog
- Set a more aggressive affinity optimization as follows:
  vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -run-pass-thru=--perf-affinity=cpu <application>
- To reduce the trace size for stack sampling collections, VTune Profiler uses Linux Perf trace compression, which may introduce additional overhead. To avoid this, disable the trace compression with the -run-pass-thru option:
  vtune -collect hotspots -knob sampling-mode=hw -knob enable-stack-collection=true -run-pass-thru=--perf-compression=0 -run-pass-thru=--perf-aio=0 <application>
  This can reduce collector overhead in rare cases, but the trace size increases dramatically.
- Set the limit of CPU time consumption by the Linux Perf collector. For example, for a 10% limit, use the following command (with administrative privileges):
  echo 10 > /proc/sys/kernel/perf_cpu_time_max_percent
  This can drop the sampling frequency and statistical accuracy to reach the limit.
vtune -collect-with runsa -knob enable-driverless-collection=true -knob event-config=<event-list> <application>
- For Linux kernel versions older than 5.8, use CAP_SYS_ADMIN. Run the vtune-set-perf-caps.sh script to set up this configuration.
- For Linux kernel versions 5.8 and newer, use CAP_PERFMON.
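The choice between the two capabilities depends only on the kernel version, so it can be derived from the output of uname -r. The following sketch uses hypothetical function names and treats kernel 5.8 itself as supporting CAP_PERFMON, on the assumption that the capability first appeared in that release.

```shell
#!/bin/sh
# Sketch: pick the capability named in this recipe from a kernel version string.
# Assumption: 5.8 itself counts as supporting CAP_PERFMON.

# Succeeds if version $1 (for example "5.15.0-91-generic") is >= $2.$3.
kernel_at_least() {
    ver=${1%%-*}        # drop a suffix such as "-91-generic"
    major=${ver%%.*}    # text before the first dot
    rest=${ver#*.}      # text after the first dot
    minor=${rest%%.*}   # second numeric component
    [ "$major" -gt "$2" ] || { [ "$major" -eq "$2" ] && [ "$minor" -ge "$3" ]; }
}

# Prints the capability to assign for the given kernel version.
perf_capability_for() {
    if kernel_at_least "$1" 5 8; then
        echo "CAP_PERFMON"
    else
        echo "CAP_SYS_ADMIN"
    fi
}
```

For the running system, perf_capability_for "$(uname -r)" prints the capability to use.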
- Create a vtune group for privileged amplxe-perf users.
- Assign the vtune group to the Perf tool executable.
- Restrict access to the executable to only those users who are in the vtune group.
  # cp amplxe-perf amplxe-perf-priv
  # groupadd vtune
  # chgrp vtune amplxe-perf-priv
  # chmod o-rwx amplxe-perf-priv
- Assign the required capabilities to the Perf tool executable.
  # setcap "cap_perfmon,cap_sys_ptrace,cap_syslog=ep" amplxe-perf-priv
  # getcap amplxe-perf-priv
  amplxe-perf-priv = cap_sys_ptrace,cap_syslog,cap_perfmon+ep
If the installed libcap does not support cap_perfmon, use 38 instead:
  # setcap "38,cap_sys_ptrace,cap_syslog=ep" amplxe-perf-priv
  # getcap amplxe-perf-priv
  amplxe-perf-priv = cap_sys_ptrace,cap_syslog,38+ep
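After assigning capabilities, the getcap output can be checked automatically instead of by eye. The sketch below is an illustration only: has_perf_caps is a hypothetical helper that accepts either cap_perfmon or its numeric form 38, and it checks the capability names but not the +ep flags.

```shell
#!/bin/sh
# Sketch: verify that a getcap output line lists the capabilities used in
# this recipe. Only the capability names are checked, not the flag suffix.
has_perf_caps() {
    caps=${1##* }           # last field, e.g. "cap_sys_ptrace,cap_syslog,38+ep"
    caps=${caps%%[+=]*}     # strip the flag suffix such as "+ep" or "=ep"

    # cap_perfmon may be reported by name or by number (38).
    case ",$caps," in
        *,cap_perfmon,*|*,38,*) ;;
        *) return 1 ;;
    esac

    for needed in cap_sys_ptrace cap_syslog; do
        case ",$caps," in
            *",$needed,"*) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

A typical call is has_perf_caps "$(getcap amplxe-perf-priv)", which returns zero only when all three capabilities are listed.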