Developer Guide

How to Use the Intercept Layer for OpenCL™ Applications

The Intercept Layer for OpenCL Applications is a tool that can intercept and modify OpenCL calls for debugging and performance analysis. Using the Intercept Layer for OpenCL Applications requires no application or driver modifications.
To operate, the Intercept Layer for OpenCL Applications masquerades as the OpenCL ICD loader (usually) or as an OpenCL implementation (rarely), and it is loaded when the application intends to load the real OpenCL ICD loader. As part of its initialization, the Intercept Layer for OpenCL Applications loads the real OpenCL ICD loader and gets function pointers to the real OpenCL entry points. Then, whenever the application makes an OpenCL call, the call is intercepted and can be passed through to the real OpenCL implementation with or without changes.
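On Linux, whether the Intercept Layer is loaded in place of the real ICD loader depends on which libOpenCL.so.1 the dynamic loader finds first. The sketch below is a simplified model of that lookup, assuming the library is located by name through LD_LIBRARY_PATH alone; the real dynamic loader also consults RPATH/RUNPATH entries, the ld.so cache, and the default system directories, which the sketch ignores. The function name first_libopencl is hypothetical.

```shell
# Hypothetical helper (simplified model, not the real ld.so algorithm):
# print the first libOpenCL.so.1 found in a colon-separated directory list,
# searching left to right the way LD_LIBRARY_PATH is searched.
first_libopencl() {
    local search_path="$1"
    local lib="libOpenCL.so.1"
    local -a dirs
    local dir
    # Split on ':' into an array without glob expansion.
    IFS=':' read -r -a dirs <<< "$search_path"
    for dir in "${dirs[@]}"; do
        # Empty entries are skipped in this sketch.
        [ -n "$dir" ] || continue
        if [ -e "$dir/$lib" ]; then
            printf '%s\n' "$dir/$lib"
            return 0
        fi
    done
    return 1    # no libOpenCL.so.1 in the given directories
}
```

For example, running first_libopencl "$LD_LIBRARY_PATH" after the setup below should print the Intercept Layer's copy, provided its build directory contains a libOpenCL.so.1.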
To get the source for the Intercept Layer for OpenCL Applications, clone its Git repository:
git clone https://github.com/intel/opencl-intercept-layer
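The setup below expects the Intercept Layer to have been built into a build/intercept directory inside the clone. A minimal build sketch, assuming the repository's CMake build with default options, CMake 3.13 or newer (for the -S and -B flags), and an installed C++ compiler; build_intercept is a hypothetical helper name.

```shell
# Hypothetical helper: configure and build the Intercept Layer with CMake.
# Assumes CMake >= 3.13 and a C++ compiler; project options stay at defaults.
build_intercept() {
    local src_dir="${1:-opencl-intercept-layer}"
    local build_dir="$src_dir/build"
    # Configure into <clone>/build, then build with the default job count.
    cmake -S "$src_dir" -B "$build_dir" -DCMAKE_BUILD_TYPE=Release &&
        cmake --build "$build_dir" --parallel
}
```

Usage: after the git clone, run build_intercept opencl-intercept-layer from the same directory.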
See the Intercept Layer for OpenCL Applications documentation for information about the available controls.
To run an application with the Intercept Layer, use the following setup:
export CLI_OpenCLFileName=/opt/intel/inteloneapi/compiler/latest/linux/lib/libOpenCL.so.1
export LD_LIBRARY_PATH=/home/opencl-intercept-layer/build/intercept:$LD_LIBRARY_PATH
export SYCL_BE=PI_OPENCL
CLI_ReportToStderr=0 CLI_ReportToFile=1 CLI_HostPerformanceTiming=1 CLI_DevicePerformanceTiming=1 CLI_DumpDir=. ./matrix.dpcpp
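The same settings can be wrapped in a small function so they apply to a single run instead of being exported into the current shell. This is a sketch, assuming bash; run_with_intercept is a hypothetical name, and the two paths are whatever fits your installation.

```shell
# Hypothetical wrapper around the setup above. Settings are passed with `env`,
# so they affect only the launched application.
#   $1    path to the real OpenCL ICD loader
#   $2    directory containing the Intercept Layer's libOpenCL.so
#   $3... the application and its arguments
run_with_intercept() {
    local real_loader="$1"
    local intercept_dir="$2"
    shift 2
    # ${LD_LIBRARY_PATH:+...} avoids a trailing ':' when LD_LIBRARY_PATH is empty.
    env \
        CLI_OpenCLFileName="$real_loader" \
        LD_LIBRARY_PATH="$intercept_dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" \
        SYCL_BE=PI_OPENCL \
        CLI_ReportToStderr=0 \
        CLI_ReportToFile=1 \
        CLI_HostPerformanceTiming=1 \
        CLI_DevicePerformanceTiming=1 \
        CLI_DumpDir=. \
        "$@"
}
```

Usage: run_with_intercept /opt/intel/inteloneapi/compiler/latest/linux/lib/libOpenCL.so.1 /home/opencl-intercept-layer/build/intercept ./matrix.dpcpp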
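CLI_OpenCLFileName tells the Intercept Layer where the real OpenCL ICD loader is, and prepending the Intercept Layer's build directory to LD_LIBRARY_PATH makes the application load the Intercept Layer instead. CLI_ReportToFile=1 and CLI_ReportToStderr=0 send the report to a file rather than to stderr, CLI_HostPerformanceTiming and CLI_DevicePerformanceTiming enable the host and device timing tables, and CLI_DumpDir sets the directory the Intercept Layer writes its output under.
This generates a report file named clintercept_report.txt, which includes the following data and tables: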
  • Total Enqueues: 2
  • Total Time (ns): 1604325652
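Total Time (ns) is the sum of the Time (ns) column of the host table (the entries add up to exactly 1604325652 ns in this run), and each host Time (%) entry is that function's time divided by this total. A small sketch to recompute an entry, assuming awk is available; host_time_pct is a hypothetical name.

```shell
# Hypothetical helper: recompute a host Time (%) entry as
# 100 * (function time) / (Total Time), rounded to two decimals.
host_time_pct() {
    local time_ns="$1" total_ns="$2"
    awk -v t="$time_ns" -v total="$total_ns" \
        'BEGIN { printf "%.2f%%\n", 100 * t / total }'
}

host_time_pct 337069812 1604325652    # clBuildProgram -> 21.01%
```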
Host Performance Timing Results

| Function Name | Calls | Time (ns) | Time (%) | Average (ns) | Min (ns) | Max (ns) |
|---|---:|---:|---:|---:|---:|---:|
| clBuildProgram | 1 | 337069812 | 21.01% | 337069812 | 337069812 | 337069812 |
| clCreateBuffer | 3 | 3393909 | 0.21% | 1131303 | 140325 | 2036170 |
| clCreateCommandQueueWithProperties | 1 | 5221 | 0.00% | 5221 | 5221 | 5221 |
| clCreateContext | 1 | 33639 | 0.00% | 33639 | 33639 | 33639 |
| clCreateKernel | 1 | 11713 | 0.00% | 11713 | 11713 | 11713 |
| clCreateProgramWithIL | 1 | 153337 | 0.01% | 153337 | 153337 | 153337 |
| clEnqueueNDRangeKernel( _ZTS9Matrix1_2IfE ) | 1 | 3102488 | 0.19% | 3102488 | 3102488 | 3102488 |
| clEnqueueReadBufferRect | 1 | 1099684 | 0.07% | 1099684 | 1099684 | 1099684 |
| clGetContextInfo | 8 | 4720 | 0.00% | 590 | 160 | 1997 |
| clGetDeviceIDs | 12 | 53004 | 0.00% | 4417 | 504 | 14853 |
| clGetDeviceInfo | 30 | 85695 | 0.01% | 2856 | 133 | 19920 |
| clGetExtensionFunctionAddressForPlatform | 3 | 6446 | 0.00% | 2148 | 1317 | 3687 |
| clGetKernelInfo | 2 | 716 | 0.00% | 358 | 169 | 547 |
| clGetPlatformIDs | 2 | 1198290216 | 74.69% | 599145108 | 715 | 1198289501 |
| clGetPlatformInfo | 12 | 22538 | 0.00% | 1878 | 404 | 7326 |
| clReleaseCommandQueue | 1 | 1744 | 0.00% | 1744 | 1744 | 1744 |
| clReleaseContext | 1 | 331 | 0.00% | 331 | 331 | 331 |
| clReleaseDevice | 6 | 6365 | 0.00% | 1060 | 491 | 1352 |
| clReleaseEvent | 2 | 2398 | 0.00% | 1199 | 992 | 1406 |
| clReleaseKernel | 1 | 2733 | 0.00% | 2733 | 2733 | 2733 |
| clReleaseMemObject | 3 | 45464 | 0.00% | 15154 | 10828 | 22428 |
| clReleaseProgram | 1 | 51380 | 0.00% | 51380 | 51380 | 51380 |
| clRetainDevice | 6 | 8680 | 0.00% | 1446 | 832 | 2131 |
| clSetKernelArg | 20 | 6976 | 0.00% | 348 | 180 | 1484 |
| clSetKernelExecInfo | 3 | 1588 | 0.00% | 529 | 183 | 1149 |
| clWaitForEvents | 6 | 60864855 | 3.79% | 10144142 | 928 | 60855555 |
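A common first step with the host table is to rank functions by total time. The sketch below assumes the rows have been copied into comma-separated name,calls,time_ns lines; that input format is an assumption for illustration, not the exact layout of the report file, and top_host_functions is a hypothetical name.

```shell
# Hypothetical helper: print the names of the N host API functions with the
# largest total time. Input: "name,calls,time_ns" rows on stdin (assumed format).
top_host_functions() {
    local n="${1:-5}"
    # Sort numerically on the time column (field 3), largest first,
    # keep the first N rows, and print only the function names.
    sort -t, -k3,3nr | head -n "$n" | cut -d, -f1
}

printf '%s\n' \
    'clBuildProgram,1,337069812' \
    'clGetPlatformIDs,2,1198290216' \
    'clWaitForEvents,6,60864855' |
    top_host_functions 2
```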
Device Performance Timing Results for Intel(R) Gen9 HD Graphics NEO (24CUs, 1200MHz)

| Function Name | Calls | Time (ns) | Time (%) | Average (ns) | Min (ns) | Max (ns) |
|---|---:|---:|---:|---:|---:|---:|
| _ZTS9Matrix1_2IfE | 1 | 58691515 | 99.98% | 58691515 | 58691515 | 58691515 |
| clEnqueueReadBufferRect | 1 | 13390 | 0.02% | 13390 | 13390 | 13390 |
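The device Time (%) column uses a different denominator from the host table: each entry is its share of the summed device time (58691515 + 13390 = 58704905 ns), not of the host Total Time. A sketch of that calculation, assuming whitespace-separated "name time_ns" pairs on stdin (an input format chosen for illustration); device_time_shares is a hypothetical name.

```shell
# Hypothetical helper: print each device entry's share of the total device time.
# Input: "name time_ns" pairs on stdin (assumed format).
device_time_shares() {
    awk '
        { name[NR] = $1; t[NR] = $2; total += $2 }
        END {
            if (total <= 0) exit 1    # nothing to divide by
            for (i = 1; i <= NR; i++)
                printf "%s %.2f%%\n", name[i], 100 * t[i] / total
        }'
}

printf '%s\n' '_ZTS9Matrix1_2IfE 58691515' 'clEnqueueReadBufferRect 13390' |
    device_time_shares
```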
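The report includes detailed timing data for both the host and the device. In this run, most host time is spent in clGetPlatformIDs (74.69%) and clBuildProgram (21.01%), while on the device the _ZTS9Matrix1_2IfE kernel accounts for 99.98% of the measured time.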

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.