To enable hardware event-based sampling analysis on your platform, the Intel® VTune™ Amplifier can use either sampling drivers that require root privileges for installation on the Linux* and Android* systems or the Perf* utility, which is part of the default VTune Amplifier installation package.
VTune Amplifier runs the hardware event-based sampling analysis in the driverless mode with the Perf utility if:
- The sampling drivers cannot be installed (for example, if VTune Amplifier was installed without root privileges)
- Collection with stacks is selected with a non-zero stack size and the prerequisites for driverless collection are satisfied
- The option to use driverless collection is selected in the VTune Amplifier user interface and the prerequisites for driverless collection are satisfied
VTune Amplifier is installed to your default account. For non-root users, the installer displays a notification stating that the sampling driver cannot be installed, so some product features may be limited or unavailable. To have the sampling driver installed, rerun the installation under the root account or contact your administrator.
On Linux, driverless collection is the default mode for analyses based on hardware event-based sampling with stacks (for example, Hotspots or Threading).
Prerequisites for Driverless Collection
VTune Amplifier can use the driverless Perf-based collection if the following requirements are satisfied:
- Your system is based on kernel 2.6.32 or higher, which exports CPU PMU programming details through the /sys/bus/event_source/devices/cpu/format file system.
- Perf-based collection is enabled in the kernel with a /proc/sys/kernel/perf_event_paranoid value equal to or less than 1.
- For uncore event analysis, uncore_* devices are available in the /sys/bus/event_source/devices folder.
- Context switch data cannot be collected using Perf-based driverless collection if the kernel version is lower than 4.3.
- The types of context switches (preemption or synchronization) may not be identified on kernels older than 4.17.
- Hardware event-based sampling analysis is configured to collect stacks.
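The requirements above can be checked by hand before starting a collection. The following is a minimal sketch, not part of VTune Amplifier: the function names version_ge, paranoid_ok, and check_driverless_prereqs are hypothetical helpers introduced for illustration, and only the paths and thresholds come from the list above.

```shell
#!/bin/sh
# Sketch: report whether this host meets the driverless-collection prerequisites.
# All function names are hypothetical; paths and thresholds are from the list above.

# version_ge A B: succeed when dotted version A is >= B.
# Only the leading numeric major.minor.patch fields are compared,
# so a release string such as "5.15.0-91-generic" is read as 5.15.0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        sub(/[^0-9.].*$/, "", a); sub(/[^0-9.].*$/, "", b)
        split(a, x, "."); split(b, y, ".")
        for (i = 1; i <= 3; i++) {
            if (x[i] + 0 > y[i] + 0) exit 0
            if (x[i] + 0 < y[i] + 0) exit 1
        }
        exit 0
    }'
}

# paranoid_ok VALUE LIMIT: succeed when VALUE is an integer <= LIMIT.
paranoid_ok() {
    case $1 in
        ''|*[!0-9-]*) return 1 ;;
    esac
    [ "$1" -le "$2" ] 2>/dev/null
}

check_driverless_prereqs() {
    kernel=$(uname -r)
    paranoid=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null)
    sysfs=/sys/bus/event_source/devices

    if version_ge "$kernel" 2.6.32; then
        echo "kernel $kernel: OK (>= 2.6.32)"
    else
        echo "kernel $kernel: older than 2.6.32"
    fi
    if [ -d "$sysfs/cpu/format" ]; then
        echo "PMU format directory: present"
    else
        echo "PMU format directory: missing"
    fi
    if paranoid_ok "$paranoid" 1; then
        echo "perf_event_paranoid=$paranoid: OK (<= 1)"
    else
        echo "perf_event_paranoid=${paranoid:-unknown}: must be <= 1"
    fi
    if ls -d "$sysfs"/uncore_* >/dev/null 2>&1; then
        echo "uncore devices: present"
    else
        echo "uncore devices: not found (uncore event analysis unavailable)"
    fi
    version_ge "$kernel" 4.3 || echo "context switch data: unavailable (kernel < 4.3)"
    version_ge "$kernel" 4.17 || echo "context switch types: may not be identified (kernel < 4.17)"
}

check_driverless_prereqs
```

The script only reads system state and does not need root privileges. It does not verify the last requirement (stack collection), which is an analysis setting rather than a system property.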
Driverless Collection Modes
VTune Amplifier supports the following Perf-based collection types:
Driverless Perf per-process sampling collects samples for a single process and/or its children and can be done simultaneously from multiple monitoring processes. Since it requires performance counters virtualization per process, it can bring more overhead in comparison with system-wide collection. Typically, a system has this type of collection enabled by default.
Driverless Perf system-wide sampling is performed by one monitoring process for the whole system. It usually has less overhead since it does not require per-process counter virtualization. This collection type can collect uncore counters and requires kernel configuration.
Driverless Perf per-process counting provides event counting statistics over an interval for a single process or its children. Event counting can be done simultaneously from multiple monitoring processes. Since it requires performance counters virtualization per process, it can bring more overhead in comparison with system-wide collection. Typically, a system has this type of collection enabled by default.
Driverless Perf system-wide counting provides event counting statistics performed by one monitoring process for the whole system over an interval. It usually has less overhead since it does not require per-process counter virtualization. This collection type can collect uncore counters and requires kernel configuration.
To configure system-wide driverless collection:
Set the /proc/sys/kernel/perf_event_paranoid value to 0 or less. Root privileges are required.
For the kernel modules resolution, make sure you have enough permissions to read kernel symbols information from the /proc/kallsyms file.
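As an illustration of the two steps above, the sketch below maps a perf_event_paranoid value to the collection scope it permits according to this topic (1 or less for Perf-based collection, 0 or less for system-wide collection) and checks whether kernel symbol addresses are visible to the current user. The function names driverless_scope and kallsyms_has_addresses are hypothetical. The check assumes that a reader without sufficient permissions sees every address in /proc/kallsyms as zeros.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, thresholds are from this topic.

# To change the setting until the next reboot (root required):
#   sudo sysctl -w kernel.perf_event_paranoid=0

# driverless_scope VALUE: print the collection scope allowed by VALUE.
driverless_scope() {
    if [ "$1" -le 0 ] 2>/dev/null; then
        echo "system-wide and per-process"
    elif [ "$1" -le 1 ] 2>/dev/null; then
        echo "per-process only"
    else
        echo "unavailable"
    fi
}

# kallsyms_has_addresses FILE: succeed when FILE lists at least one symbol
# with a non-zero address (assumption: restricted readers see only zeros).
kallsyms_has_addresses() {
    [ -r "$1" ] && grep -Eqv '^0+ ' "$1"
}

paranoid=$(cat /proc/sys/kernel/perf_event_paranoid 2>/dev/null)
echo "perf_event_paranoid=${paranoid:-unknown}: $(driverless_scope "$paranoid")"

if kallsyms_has_addresses /proc/kallsyms; then
    echo "kernel symbols: addresses readable"
else
    echo "kernel symbols: addresses hidden or file unreadable"
fi
```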
To check the data collection type used for your analysis:
Scroll down to the Collection and Platform Info section in the Summary window and check the Collector Type value.
To use only driverless collection, where possible:
Create a custom analysis and select the Enable driverless collection option.
From the command line, use the -knob enable-driverless-collection=true option. For example:
amplxe-cl -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=2000000,INST_RETIRED.ANY:sa=2000000 -knob enable-driverless-collection=true
To disable the driverless collection for your analysis:
Create a custom analysis and set the Stack size option to 0 (unlimited).
From the command line, use the -knob stack-size=0 option. For example:
amplxe-cl -collect-with runsa -knob enable-stack-collection=true -knob stack-size=0 -knob enable-user-tasks=true -knob event-config=CPU_CLK_UNHALTED.THREAD:sa=2000000,INST_RETIRED.ANY:sa=2000000
This option disables the Perf driverless collection for stacks and enables the VTune Amplifier driver-based collection instead.
Perf-based driverless collection is applicable to all hardware event-based sampling analysis types, such as Hotspots (hardware event-based sampling mode), Microarchitecture Exploration, and Custom event-based sampling analysis types on Linux and Android OS. If the uncore events support is available on the system, the VTune Amplifier also uses the Perf collection for Memory Access, HPC Performance Characterization, and Microarchitecture Exploration analysis types with the Analyze memory bandwidth option enabled.
The following limitations may also apply to driverless collection:
Since the driverless collection is based on the Linux Perf functionality, all Perf limitations fully apply to the VTune Amplifier sampling analysis as well. For example, your operating system limits on the maximum number of files opened by a process and on the maximum memory mapped to a process address space still apply and may affect Perf-based profiling. For more information, see the Tutorial: Troubleshooting and Tips topic at https://perf.wiki.kernel.org/index.php/Main_Page.
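To see the limits that the previous paragraph refers to, you can query them from a shell. The sketch below is illustrative only: read_limit and estimate_perf_fds are hypothetical helpers, and the descriptor estimate assumes one file descriptor per event per CPU, which is how the perf_event_open interface hands out counters for system-wide monitoring. The number of descriptors VTune Amplifier actually opens may differ.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical.

# read_limit FILE: print the content of a /proc limit file, or "unavailable".
read_limit() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "unavailable"
    fi
}

# estimate_perf_fds EVENTS CPUS: rough descriptor count for system-wide
# collection, assuming one descriptor per event per CPU.
estimate_perf_fds() {
    echo $(( $1 * $2 ))
}

cpus=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Soft limit on open file descriptors for processes started from this shell.
echo "open-file limit (ulimit -n): $(ulimit -n)"
echo "estimated descriptors for 2 events on $cpus CPUs: $(estimate_perf_fds 2 "$cpus")"

# Maximum number of memory map areas a process may have.
echo "vm.max_map_count: $(read_limit /proc/sys/vm/max_map_count)"

# Memory (in KiB) an unprivileged user may lock for perf buffers.
echo "perf_event_mlock_kb: $(read_limit /proc/sys/kernel/perf_event_mlock_kb)"
```

If the estimate approaches the open-file limit, raising the limit before starting the collection (for example, with ulimit -n in the launching shell, within the hard limit set by your administrator) is one option to consider.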
Local and remote Launch Application, Attach to Process, and Profile System target types are supported, but this support fully depends on the Linux Perf profiling credentials specified in the /proc/sys/kernel/perf_event_paranoid file and managed by the administrator of your system using root credentials. For more information, see Perf Events and tool security at https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html and the perf_event related configuration files topic at http://man7.org/linux/man-pages/man2/perf_event_open.2.html. By default, only profiling of user processes, in both user and kernel space, is permitted, so to use the Profile System target type you need to grant wider profiling credentials through the perf_event_paranoid file.
Memory bandwidth analysis is not supported on Intel Atom® processors.
Preemption and synchronization context switches may not be differentiated on kernels older than 4.17. To identify context switch types, make sure the VTune Amplifier sampling driver is loaded and the Stack size option is set to 0.
Run the <install-dir>/bin64/amplxe-self-checker.sh script to explore the analysis type collection abilities of your system. The script output helps you identify limitations and provides advice on fixing them.