Profiling a DPC++ Application Running on a GPU
- Tools: Intel® VTune™ Profiler (GPU Offload and GPU Compute/Media Hotspots analyses).
- Starting with the 2020 release, Intel® VTune™ Amplifier has been renamed to Intel® VTune™ Profiler.
- Most recipes in the Intel® VTune™ Profiler Performance Analysis Cookbook are scalable. You can apply them to different versions of Intel® VTune™ Profiler. In some cases, minor adjustments may be required.
- Hardware:
- Intel Processor Graphics Gen8, Gen9, or Gen11.
- Intel microarchitectures code named Kaby Lake, Coffee Lake, or Ice Lake.
- Operating system:
- Linux* OS, kernel version 4.14 or newer.
- Windows* 10 OS.
- Graphical User Interface for Linux:
- GTK+ (2.10 or higher; ideally, use 2.18 or higher)
- Pango (1.14 or higher)
- X.Org (1.0 or higher; ideally, use 1.7 or higher)
Build and Compile a DPC++ Application
- On Linux OS:
- Go to the sample directory:
    cd <sample_dir>/VtuneProfiler/matrix_multiply_vtune
- The multiply.cpp file in the src directory contains several DPC++ versions of matrix multiplication. Select a version by editing the corresponding #define MULTIPLY line in multiply.hpp.
- Compile your sample DPC++ application:
    cmake .
    make
  This generates a matrix.dpcpp executable.
  To delete the program, type:
    make clean
  This removes the executable and object files that were created by the make command.
- On Windows OS:
- Open the sample directory:
    <sample_dir>\VtuneProfiler\matrix_multiply_vtune
- In this directory, open the Visual Studio* solution file named matrix_multiply.sln.
- The multiply.cpp file contains several DPC++ versions of matrix multiplication. Select a version by editing the corresponding #define MULTIPLY line in multiply.hpp.
- Build the entire project with a Release configuration. This generates an executable called matrix_multiply.exe.
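On both Linux and Windows, the variant that gets built is chosen by the #define MULTIPLY line in multiply.hpp. The sketch below illustrates that compile-time selection pattern in isolation. It is not the sample's code: the function names multiply_naive and multiply_ikj and the run_selected wrapper are hypothetical, and both variants are plain host-side C++ loops standing in for the sample's DPC++ kernels.

```cpp
// Hypothetical sketch of selecting one of several matrix-multiplication variants
// with a single #define. Names are illustrative, not taken from multiply.hpp, and
// the bodies are plain C++ loops rather than DPC++ kernels.
#include <cassert>
#include <cstddef>
#include <vector>

using Matrix = std::vector<double>;  // row-major n x n storage

// Variant 1: textbook i-j-k loop, c[i][j] = sum over k of a[i][k] * b[k][j].
void multiply_naive(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (std::size_t k = 0; k < n; ++k) sum += a[i * n + k] * b[k * n + j];
            c[i * n + j] = sum;
        }
}

// Variant 2: i-k-j loop order, so the inner loop walks b and c with unit stride.
void multiply_ikj(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    assert(a.size() == n * n && b.size() == n * n && c.size() == n * n);
    for (std::size_t idx = 0; idx < n * n; ++idx) c[idx] = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t k = 0; k < n; ++k) {
            const double aik = a[i * n + k];
            for (std::size_t j = 0; j < n; ++j) c[i * n + j] += aik * b[k * n + j];
        }
}

// Edit this one line to switch variants, then rebuild.
#define MULTIPLY multiply_ikj

// Every caller goes through the macro, so only the selected variant runs.
void run_selected(const Matrix& a, const Matrix& b, Matrix& c, std::size_t n) {
    MULTIPLY(a, b, c, n);
}
```

Because the choice is made by the preprocessor, switching variants requires a rebuild, which is why each version is profiled as a separate build of the sample.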
Run GPU Offload Analysis on a DPC++ Application
- Launch VTune Profiler and click New Project from the Welcome page. The Create a Project dialog box opens.
- Specify a project name and a location for your project and click Create Project. The Configure Analysis window opens.
- Make sure Local Host is selected in the WHERE pane.
- In the WHAT pane, make sure the Launch Application target is selected and specify the sample binary (matrix.dpcpp on Linux, matrix_multiply.exe on Windows) as the Application to profile.
- In the HOW pane, select the GPU Offload analysis type from the Accelerators group. This is the least intrusive analysis for applications running on platforms with Intel Graphics as well as on other third-party GPUs supported by VTune Profiler.
- Click theStartbutton to launch the analysis.
To run the analysis from the command line instead:
- On Linux OS:
- Set VTune Profiler environment variables by sourcing the script:
    source <install_dir>/env/vars.sh
- Run the analysis command:
    vtune -collect gpu-offload -- ./matrix.dpcpp
- On Windows OS:
- Set VTune Profiler environment variables by running the batch file:
    <install_dir>\env\vars.bat
- Run the analysis command:
    vtune.exe -collect gpu-offload -- matrix_multiply.exe
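The GPU Offload result lets you see when the GPU was busy and where it sat idle during collection. As a rough illustration of two quantities worth reading off that timeline, the busy share of the collection window and the longest idle gap, the sketch below computes them from a list of busy intervals. The Interval and OffloadSummary types and the summarize function are hypothetical; they do not correspond to any VTune Profiler export format or API.

```cpp
// Hypothetical sketch: summarize GPU busy intervals over a collection window.
// Input intervals are assumed to be non-overlapping and to lie inside the window.
#include <algorithm>
#include <cassert>
#include <vector>

struct Interval {
    double start_ms;  // when the GPU became busy
    double end_ms;    // when it went idle again
};

struct OffloadSummary {
    double busy_fraction;   // share of the window the GPU was busy (0..1)
    double largest_gap_ms;  // longest idle stretch, including leading/trailing idle time
};

OffloadSummary summarize(std::vector<Interval> busy, double window_start, double window_end) {
    assert(window_end > window_start);
    // Sort by start time so idle gaps can be measured between neighbors.
    std::sort(busy.begin(), busy.end(),
              [](const Interval& x, const Interval& y) { return x.start_ms < y.start_ms; });
    double busy_ms = 0.0;
    double largest_gap = 0.0;
    double cursor = window_start;  // end of the most recent busy interval
    for (const Interval& iv : busy) {
        assert(iv.end_ms >= iv.start_ms);
        largest_gap = std::max(largest_gap, iv.start_ms - cursor);  // idle before this interval
        busy_ms += iv.end_ms - iv.start_ms;
        cursor = iv.end_ms;
    }
    largest_gap = std::max(largest_gap, window_end - cursor);  // idle after the last interval
    return {busy_ms / (window_end - window_start), largest_gap};
}
```

For example, busy intervals of 0-5 ms, 10-40 ms, and 45-90 ms inside a 100 ms window give a busy fraction of 0.8 and a largest gap of 10 ms (the idle tail after 90 ms).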
Analyze Collected Data
Use the GPU Offload results to decide whether the application is GPU bound or CPU bound. Typical signs:
- GPU bound applications:
  - The GPU is busy for a majority of the profiling time.
  - There are small idle gaps between GPU busy intervals.
  - The GPU software queue is rarely reduced to zero.
- CPU bound applications:
  - The CPU is busy for a majority of the profiling time.
  - There are large idle gaps between GPU busy intervals.
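The GPU-bound and CPU-bound signs listed above can be read as a simple rule of thumb. The sketch below encodes them as a classifier so the criteria are explicit. The OffloadObservation fields, the 0.5 and 0.1 fractions, and the 1 ms default for a "small" gap are all illustrative assumptions, not thresholds defined by VTune Profiler; real results often show mixed evidence, which the sketch reports as "unclear".

```cpp
// Hypothetical rule-of-thumb classifier for GPU Offload observations.
// All thresholds are illustrative assumptions, not VTune Profiler values.
#include <cassert>
#include <string>

struct OffloadObservation {
    double gpu_busy_fraction;     // 0..1, share of profiling time the GPU was busy
    double cpu_busy_fraction;     // 0..1, share of profiling time the CPU was busy
    double largest_gap_ms;        // longest idle gap between GPU busy intervals
    double queue_empty_fraction;  // 0..1, share of time the GPU software queue was empty
};

std::string classify(const OffloadObservation& o, double small_gap_ms = 1.0) {
    assert(o.gpu_busy_fraction >= 0.0 && o.gpu_busy_fraction <= 1.0);
    assert(o.cpu_busy_fraction >= 0.0 && o.cpu_busy_fraction <= 1.0);
    assert(o.queue_empty_fraction >= 0.0 && o.queue_empty_fraction <= 1.0);

    // GPU bound: GPU busy most of the time, only small gaps, queue rarely empty.
    const bool gpu_signs = o.gpu_busy_fraction > 0.5 &&
                           o.largest_gap_ms <= small_gap_ms &&
                           o.queue_empty_fraction < 0.1;
    // CPU bound: CPU busy most of the time, large gaps between GPU work.
    const bool cpu_signs = o.cpu_busy_fraction > 0.5 &&
                           o.largest_gap_ms > small_gap_ms;

    if (gpu_signs) return "GPU bound";
    if (cpu_signs) return "CPU bound";
    return "unclear";  // mixed evidence: inspect the timeline in detail
}
```

The two rule sets are mutually exclusive here because one requires small gaps and the other large gaps; anything that matches neither is left for manual inspection rather than forced into a category.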
Run GPU Compute/Media Hotspots Analysis
- In the Accelerators group, select the GPU Compute/Media Hotspots analysis type.
- Configure analysis options as described in the previous section.
- Click theStartbutton to run the analysis.
To run the analysis from the command line instead:
- On Linux OS:
    vtune -collect gpu-hotspots -- ./matrix.dpcpp
- On Windows OS:
    vtune.exe -collect gpu-hotspots -- matrix_multiply.exe
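The GPU Compute/Media Hotspots analysis is about finding which kernels account for the most GPU time. The sketch below shows that idea in its simplest form: sum the duration of every run of each kernel and rank kernels by total time. The KernelRun record, the rank_hotspots function, and the kernel names in the example are hypothetical; this is not how VTune Profiler computes or exports its data.

```cpp
// Hypothetical sketch: rank kernels by total GPU time across all their runs.
#include <algorithm>
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

struct KernelRun {
    std::string kernel;  // kernel name (illustrative)
    double duration_ms;  // time this single run took on the GPU
};

std::vector<std::pair<std::string, double>> rank_hotspots(const std::vector<KernelRun>& runs) {
    // Accumulate total time per kernel name.
    std::map<std::string, double> totals;
    for (const KernelRun& r : runs) {
        assert(r.duration_ms >= 0.0);
        totals[r.kernel] += r.duration_ms;
    }
    // Copy into a vector and order from most to least total time.
    std::vector<std::pair<std::string, double>> ranked(totals.begin(), totals.end());
    std::stable_sort(ranked.begin(), ranked.end(),
                     [](const auto& x, const auto& y) { return x.second > y.second; });
    return ranked;
}
```

With runs of a hypothetical matrix_mul kernel taking 12 ms and 13 ms, plus reduce (4 ms) and init (1.5 ms), matrix_mul comes first with 25 ms, which is the kernel to examine first.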