Python* Code Analysis
Explore the performance analysis options that Intel® VTune™ Profiler provides for Python* applications to identify the most time-consuming code sections and critical call paths.
VTune Profiler supports the Hotspots, Threading, and Memory Consumption analyses for Python* applications via the Launch Application and Attach to Process modes. For example, when your application performs intensive numerical modeling, you need to know how effectively it uses the available CPU resources. A good sign of effective CPU usage is when the computation spends most of its time executing native extension code rather than interpreting Python glue code.
To get the maximum performance out of your Python application, consider using native extensions, such as NumPy, or writing and compiling performance-critical modules of your Python project in native languages, such as C or even assembly. This helps your application take advantage of vectorization and make full use of the available CPU resources.
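As a rough illustration of the difference between interpreted glue code and a native extension, the sketch below computes the same dot product twice: once in a pure Python loop and once through NumPy, which hands the work to compiled code. The function names and sample data are hypothetical and chosen only for this example.

```python
import numpy as np


def dot_python(xs, ys):
    """Dot product computed step by step in the Python interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


def dot_numpy(xs, ys):
    """Same dot product, delegated to NumPy's compiled implementation."""
    a = np.asarray(xs, dtype=np.float64)
    b = np.asarray(ys, dtype=np.float64)
    return float(np.dot(a, b))


if __name__ == "__main__":
    # Illustrative data only.
    xs = [0.5 * i for i in range(1000)]
    ys = [2.0] * 1000
    print(dot_python(xs, ys), dot_numpy(xs, ys))
```

In a profile of the first function, most of the time would be expected in the interpreter; in the second, most of it would be expected inside the native NumPy call.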
To analyze Python code performance with VTune Profiler and interpret the data:
Configure Python Data Collection
You can use either the GUI or the command-line interface (vtune) to configure VTune Profiler for analyzing the performance of your Python code.
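For the command-line route, a minimal sketch of how a launch command might be assembled is shown here. It assumes the vtune executable is on your PATH and uses its -collect and -result-dir options; the helper name, the script name, and the result directory are hypothetical placeholders. The script path is made absolute because a relative path can leave names inside the main script unresolved.

```python
import os
import sys


def build_vtune_command(script_path, analysis="hotspots", result_dir=None):
    """Return an argv list that launches a Python script under the vtune CLI.

    The analysis value is passed through unchanged, for example "hotspots",
    "threading", or "memory-consumption".
    """
    command = ["vtune", "-collect", analysis]
    if result_dir is not None:
        command += ["-result-dir", result_dir]
    # Everything after "--" is the target: the interpreter, then the script.
    command += ["--", sys.executable, os.path.abspath(script_path)]
    return command


if __name__ == "__main__":
    # "my_script.py" and "r001hs" are placeholders for this sketch.
    print(" ".join(build_vtune_command("my_script.py", result_dir="r001hs")))
```

The returned list can be passed to subprocess.run on a system where VTune Profiler is installed; the sketch itself only prints the command.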
To configure and run Python code profiling from the GUI, do the following:
- Click the Configure Analysis button on the toolbar. The Configure Analysis window opens.
- Choose a target system and target type, for example, Local Host and Launch Application. Only Windows* and Linux* target systems are supported.
- In the Launch Application configuration pane, specify a path to the installed Python interpreter in the Application field and a path to your Python script in the Application parameters field.
  If you specify a relative path to your Python script in the Application parameters field, VTune Profiler properly resolves full function or method names only for the imported modules and does not resolve the names inside the main script. Consider specifying the absolute path to the script.
  In addition, you may select the Auto managed code profiling mode, in which VTune Profiler automatically detects the type of the target executable, managed or native, and switches to the corresponding mode. Optionally, select the Analyze child processes option to collect data on processes launched by the target process.
  If your Python application needs to run before profiling starts or cannot be launched at the start of the analysis, you can attach VTune Profiler to the Python process. To do this, select the Attach to Process target type and specify the Python process name or PID.
  When you attach VTune Profiler to the Python process, make sure you initialize the Global Interpreter Lock (GIL) inside your script before you start the analysis. If the GIL is not initialized, the VTune Profiler collector initializes it only when a new Python function is called.
- From the HOW configuration pane on the right, select the Hotspots, Threading, or Memory Consumption analysis type.
- Configure the following options, if required, or use the defaults:
  - User-Mode Sampling mode: Select to enable the user-mode sampling and tracing collection for hotspots and call stack analysis (formerly known as Basic Hotspots). This collection mode uses a fixed sampling interval of 10 ms. If you need to change the interval, click the Copy button and create a custom analysis configuration.
  - Hardware Event-Based Sampling mode: Select to enable hardware event-based sampling collection for hotspots analysis (formerly known as Advanced Hotspots). You can configure the following options for this collection mode:
    - CPU sampling interval, ms: specifies an interval (in milliseconds) between CPU samples. Possible values for the hardware event-based sampling mode are 0.01-1000. 1 ms is used by default.
    - Collect stacks: enables advanced collection of call stacks and thread context switches.
    When changing collection options, pay attention to the Overhead diagram on the right. It dynamically changes to reflect the collection overhead incurred by the selected options.
  - Show additional performance insights check box: Get additional performance insights, such as vectorization, and learn next steps. This option collects additional CPU events, which may enable the multiplexing mode. The option is enabled by default.
  - Details button: Expand or collapse a section listing the default non-editable settings used for this analysis type. If you want to modify or enable additional settings for the analysis, you need to create a custom configuration by copying an existing predefined configuration. VTune Profiler creates an editable copy of this analysis type configuration.
- Click the Start button to run the analysis.
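For the Attach to Process case, one way to prepare a script is sketched below. It rests on an assumption about CPython rather than on anything VTune-specific: on older interpreters the GIL is created lazily when the first thread starts, while on CPython 3.7 and later it already exists at startup, so the extra thread is harmless there. The function name is hypothetical. The script also prints its PID so that it can be entered as the attach target.

```python
import os
import threading


def ensure_gil_initialized():
    """Start and join a no-op thread early in the script.

    Assumption: on older CPython versions, starting the first thread is what
    creates the GIL. On CPython 3.7+ the GIL exists from interpreter startup.
    """
    worker = threading.Thread(target=lambda: None)
    worker.start()
    worker.join()
    return not worker.is_alive()


if __name__ == "__main__":
    ensure_gil_initialized()
    # Print the PID so it can be specified in the Attach to Process settings.
    print("PID:", os.getpid())
    # The long-running workload of the real application would start here.
```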
Identifying Hotspots
Hotspots analysis in the user-mode sampling mode helps identify sections of your Python code that take a long time to execute (hotspots), along with their timing metrics and call stacks. It also displays the workload distribution over threads in the Timeline pane.
By default, VTune Profiler uses the Auto managed code profiling mode, which enables you to view and analyze mixed stacks for Python/C++ applications. For example, a native Intel® oneAPI Math Kernel Library (oneMKL) function may appear as a hotspot in the left pane, while the mixed call stack in the right pane reveals the Python black_scholes function that actually calls the hotspot function.

Double-click the black_scholes function in the Call Stack pane to open the source view at the call site, line 66 in this example.

To view call stacks only inside your Python code, filter out Python core and system functions by selecting the Only user functions option for the Call Stack Mode on the filter bar.
Limitations
VTune Profiler supports Python code profiling with some limitations:
- Only Python distributions 2.6 and later are supported.
- If you use Python extensions that compile Python code to a native language (JIT, C/C++), VTune Profiler may show incorrect analysis results. Consider using the JIT Profiling API to solve this problem.
- Python code profiling is supported for Windows and Linux target systems only.
- In some cases, VTune Profiler may not resolve full names of Python functions and modules on Windows OS. It displays correct source information, so you can view the source directly from the VTune Profiler viewpoints.
- Proper thread names are not always displayed in the Timeline pane.
- If your application has a very low stack depth, which includes called functions and imported modules, VTune Profiler does not collect Python data. Consider using deeper calls to enable the profiling.
- When collecting data remotely, VTune Profiler may not resolve full function or method names or display the source code of your Python script. To solve this problem for Linux targets, copy the source files to a directory on your host system with a path identical to the path on your target system before running the analysis.
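For the low stack depth limitation, the sketch below shows one possible way to read "deeper calls": instead of running the work as flat module-level statements, the script routes it through a small chain of functions. The limitation does not state how deep the stack must be, so the three-level structure and all function names here are assumptions made for illustration.

```python
def price_step(value, rate):
    """Innermost call: apply one growth step."""
    return value * (1.0 + rate)


def simulate(start, rate, steps):
    """Middle call: repeat the growth step a number of times."""
    value = start
    for _ in range(steps):
        value = price_step(value, rate)
    return value


def main():
    """Entry point, so the work no longer runs as flat module-level code."""
    result = simulate(100.0, 0.01, 3)
    print(result)
    return result


if __name__ == "__main__":
    main()
```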