How to run openvino GPU inference on E3950?

Hello,

I can run OpenVINO CPU inference on the E3950, but GPU inference is not working (Ubuntu 16.04).

The OpenCL runtime only supports OpenCL 1.2, but OpenVINO seems to need 2.1.

 

[ INFO ] InferenceEngine:
    API version ............ 2.1
    Build .................. custom_releases/2019/R3_ac8584cb714a697a12f1f30b7a3b78a5b9ac5e05
    Description ....... API
[ INFO ] Files were added: 1
[ INFO ]     /opt/intel/openvino/deployment_tools/demo/car_1.bmp
[ INFO ] Loading device GPU
[ ERROR ] Failed to create plugin /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNNPlugin.so for device GPU
Please, check your environment
Cannot load library '/opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNNPlugin.so': /opt/intel/openvino_2019.3.376/deployment_tools/inference_engine/lib/intel64/libclDNN64.so: symbol clCreateCommandQueueWithProperties, version OPENCL_2.0 not defined in file libOpenCL.so.1 with link time reference

 

clinfo: /usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so.1: no version information available (required by clinfo)
Number of platforms                               1
  Platform Name                                   Intel(R) OpenCL HD Graphics
  Platform Vendor                                 Intel(R) Corporation
  Platform Version                                OpenCL 1.2
  Platform Profile                                FULL_PROFILE
  Platform Extensions                             cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing
  Platform Extensions function suffix             INTEL

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 NEO
  Driver Version                                  19.04.12237
  Device OpenCL C Version                         OpenCL C 1.2
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE
  Max compute units                               18
  Max clock frequency                             650MHz
  Device Partition                                (core)
    Max number of sub-devices                     0
    Supported partition types                     None
  Max work item dimensions                        3
  Max work item sizes                             256x256x256
  Max work group size                             256
  Preferred work group size multiple              32
  Preferred / native vector sizes                 
    char                                                16 / 16      
    short                                                8 / 8       
    int                                                  4 / 4       
    long                                                 1 / 1       
    half                                                 8 / 8        (cl_khr_fp16)
    float                                                1 / 1       
    double                                               1 / 1        (cl_khr_fp64)
  Half-precision Floating-point support           (cl_khr_fp16)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Single-precision Floating-point support         (core)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  Yes
  Double-precision Floating-point support         (cl_khr_fp64)
    Denormals                                     Yes
    Infinity and NANs                             Yes
    Round to nearest                              Yes
    Round to zero                                 Yes
    Round to infinity                             Yes
    IEEE754-2008 fused multiply-add               Yes
    Support is emulated in software               No
    Correctly-rounded divide and sqrt operations  No
  Address bits                                    32, Little-Endian
  Global memory size                              3178455040 (2.96GiB)
  Error Correction support                        No
  Max memory allocation                           1589227520 (1.48GiB)
  Unified memory for Host and Device              Yes
  Minimum alignment for any data type             128 bytes
  Alignment of base address                       1024 bits (128 bytes)
  Global Memory cache type                        Read/Write
  Global Memory cache size                        131072
  Global Memory cache line                        64 bytes
  Image support                                   Yes
    Max number of samplers per kernel             16
    Max size for 1D images from buffer            99326720 pixels
    Max 1D or 2D image array size                 2048 images
    Base address alignment for 2D image buffers   4 bytes
    Pitch alignment for 2D image buffers          4 bytes
    Max 2D image size                             16384x16384 pixels
    Max 3D image size                             16384x16384x2048 pixels
    Max number of read image args                 128
    Max number of write image args                128
  Local memory type                               Local
  Local memory size                               65536 (64KiB)
  Max constant buffer size                        1589227520 (1.48GiB)
  Max number of constant args                     8
  Max size of kernel argument                     1024
  Queue properties                                
    Out-of-order execution                        Yes
    Profiling                                     Yes
  Prefer user sync for interop                    Yes
  Profiling timer resolution                      52ns
  Execution capabilities                          
    Run OpenCL kernels                            Yes
    Run native kernels                            No
    SPIR versions                                 1.2
  printf() buffer size                            4194304 (4MiB)
  Built-in kernels                                block_motion_estimate_intel;block_advanced_motion_estimate_check_intel;block_advanced_motion_estimate_bidirectional_check_intel;
  Motion Estimation accelerator version    (Intel)   2
  Device Available                                Yes
  Compiler Available                              Yes
  Linker Available                                Yes
  Device Extensions                               cl_khr_3d_image_writes cl_khr_byte_addressable_store cl_khr_fp16 cl_khr_depth_images cl_khr_global_int32_base_atomics cl_khr_global_int32_extended_atomics cl_khr_icd cl_khr_image2d_from_buffer cl_khr_local_int32_base_atomics cl_khr_local_int32_extended_atomics cl_intel_subgroups cl_intel_required_subgroup_size cl_intel_subgroups_short cl_khr_spir cl_intel_accelerator cl_intel_media_block_io cl_intel_driver_diagnostics cl_intel_device_side_avc_motion_estimation cl_khr_priority_hints cl_khr_throttle_hints cl_khr_create_command_queue cl_khr_fp64 cl_intel_planar_yuv cl_intel_packed_yuv cl_intel_motion_estimation cl_intel_advanced_motion_estimation cl_intel_va_api_media_sharing

NULL platform behavior
  clGetPlatformInfo(NULL, CL_PLATFORM_NAME, ...)  No platform
  clGetDeviceIDs(NULL, CL_DEVICE_TYPE_ALL, ...)   No platform
  clCreateContext(NULL, ...) [default]            No platform
  clCreateContext(NULL, ...) [other]              Success [INTEL]
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_GPU)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ACCELERATOR)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_CUSTOM)  No platform
  clCreateContextFromType(NULL, CL_DEVICE_TYPE_ALL)  No platform

 

Is there another way to enable GPU inference?

Thanks.


Looks like the issue may be caused by the CUDA 8.0 installation:

/usr/local/cuda-8.0/targets/x86_64-linux/lib/libOpenCL.so

Do you also have an NVIDIA GPU?

How many copies of libOpenCL.so are on the system?

Cheers,

nikos
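To answer the "how many libOpenCL.so" question, a short Python sketch can walk a few library directories and list every libOpenCL.so* file it finds, resolving symlinks so that several names pointing at one file are easy to spot. The default search roots are an assumption (/usr/local covers the /usr/local/cuda-8.0 copy from the clinfo warning, /opt covers /opt/intel); pass your own directories as arguments if your layout differs.

```python
import os
import sys

# Assumed search roots: /usr/local covers the /usr/local/cuda-8.0 copy seen in the
# clinfo warning, /opt covers /opt/intel; adjust for your system.
DEFAULT_ROOTS = ["/usr/lib", "/usr/local", "/opt"]


def find_opencl_libs(roots):
    """Return sorted (path, real_path) pairs for every libOpenCL.so* file under roots."""
    found = set()
    for root in roots:
        if not os.path.isdir(root):
            continue  # skip roots that do not exist on this machine
        # os.walk does not descend into symlinked directories, but symlinked
        # files (e.g. libOpenCL.so -> libOpenCL.so.1) still appear in filenames.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                if name.startswith("libOpenCL.so"):
                    path = os.path.join(dirpath, name)
                    found.add((path, os.path.realpath(path)))
    return sorted(found)


if __name__ == "__main__":
    libs = find_opencl_libs(sys.argv[1:] or DEFAULT_ROOTS)
    for path, real in libs:
        print(f"{path} -> {real}")
    distinct = {real for _path, real in libs}
    print(f"{len(libs)} names, {len(distinct)} distinct files")
```

More than one distinct file usually means more than one OpenCL loader is installed, and whichever one the dynamic loader finds first wins.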

 

 


FWIW, the E3950 has an HD Graphics 505 GPU, which worked fine when I ran OpenVINO on the GPU a while ago.

The HD505 also reports "Out-of-order execution: Yes", so it should be fine once you fix the OpenCL environment.


Thanks for your reply.

Yes, I also have an NVIDIA GPU, but I think the key problem may be that the OpenCL version on the E3950 is only 1.2.

How can I update OpenCL to 2.0 or 2.1?


I need to try. I have an E3950 system I can test on, but it will probably be over the weekend as I am busy with other projects.

Will update here as soon as I set it up.

Cheers,

nikos 


FWIW, based on https://github.com/intel/clDNN, clDNN supports Intel® HD Graphics and Intel® Iris® Graphics and is optimized for Intel® HD Graphics 505.

---

Codename Skylake:

Intel® HD Graphics 510 (GT1, client market)

Intel® HD Graphics 515 (GT2, client market)

Intel® HD Graphics 520 (GT2, client market)

Intel® HD Graphics 530 (GT2, client market)

Intel® Iris® Graphics 540 (GT3e, client market)

Intel® Iris® Graphics 550 (GT3e, client market)

Intel® Iris® Pro Graphics 580 (GT4e, client market)

Intel® HD Graphics P530 (GT2, server market)

Intel® Iris® Pro Graphics P555 (GT3e, server market)

Intel® Iris® Pro Graphics P580 (GT4e, server market)

Codename Apollolake:

Intel® HD Graphics 500

Intel® HD Graphics 505

 



Hi, just to update here: I ran the latest OpenVINO release on my E3950, benchmarked an SSD FP16 model, and saw no issues on the GPU device.

The CPU device ran much slower than -d GPU (as expected on this slow Atom).

My clinfo shows OpenCL 1.2, as in your case, so that is not the issue.

I believe you have an issue with a multi-GPU OpenCL installation. Please make sure the Intel OpenCL .so comes first in your library path and try again.

Cheers,

nikos

  Platform Name                                   Intel(R) OpenCL HD Graphics
Number of devices                                 1
  Device Name                                     Intel(R) Gen9 HD Graphics NEO
  Device Vendor                                   Intel(R) Corporation
  Device Vendor ID                                0x8086
  Device Version                                  OpenCL 1.2 NEO 
  Driver Version                                  19.13.12717
  Device OpenCL C Version                         OpenCL C 1.2 
  Device Type                                     GPU
  Device Profile                                  FULL_PROFILE

 

[ INFO ] Loading Inference Engine
[ INFO ] Device info:
        GPU
        clDNNPlugin version ......... 2.1
        Build ........... 32974

Count:      3368 iterations
Duration:   60080.1 ms
Latency:    68.2936 ms
Throughput: 56.0585 FPS
	Device: GPU
	Metrics: 
		AVAILABLE_DEVICES : [  ]
		SUPPORTED_METRICS : [ AVAILABLE_DEVICES SUPPORTED_METRICS FULL_DEVICE_NAME OPTIMIZATION_CAPABILITIES SUPPORTED_CONFIG_KEYS NUMBER_OF_WAITING_INFER_REQUESTS NUMBER_OF_EXEC_INFER_REQUESTS RANGE_FOR_ASYNC_INFER_REQUESTS RANGE_FOR_STREAMS ]
		FULL_DEVICE_NAME : Intel(R) Gen9 HD Graphics
		OPTIMIZATION_CAPABILITIES : [ FP32 BIN FP16 ]
		SUPPORTED_CONFIG_KEYS : [ CLDNN_INT8_ENABLED CLDNN_MEM_POOL CLDNN_PLUGIN_PRIORITY CLDNN_PLUGIN_THROTTLE DUMP_KERNELS DYN_BATCH_ENABLED EXCLUSIVE_ASYNC_REQUESTS GPU_THROUGHPUT_STREAMS PERF_COUNT TUNING_MODE ]
		NUMBER_OF_WAITING_INFER_REQUESTS : 0
		NUMBER_OF_EXEC_INFER_REQUESTS : 0
		RANGE_FOR_ASYNC_INFER_REQUESTS : { 1, 2, 1 }
		RANGE_FOR_STREAMS : { 1, 2 }
	Default values for device configuration keys: 
		CLDNN_INT8_ENABLED : NO
		CLDNN_MEM_POOL : YES
		CLDNN_PLUGIN_PRIORITY : 0
		CLDNN_PLUGIN_THROTTLE : 0
		DUMP_KERNELS : NO
		DYN_BATCH_ENABLED : NO
		EXCLUSIVE_ASYNC_REQUESTS : NO
		GPU_THROUGHPUT_STREAMS : 1
		PERF_COUNT : NO
		TUNING_MODE : TUNING_DISABLED
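The summary numbers are self-consistent: throughput is simply iterations divided by duration, and multiplying throughput by latency (Little's law) gives a rough estimate of how many inference requests were in flight at once. A small Python sketch of that arithmetic follows; `summarize` is a hypothetical helper, and the in-flight figure is only an estimate, since benchmark_app may report a median rather than a mean latency.

```python
def summarize(count, duration_ms, latency_ms):
    """Return (throughput in FPS, estimated requests in flight) for a benchmark run."""
    duration_s = duration_ms / 1000.0
    throughput = count / duration_s                  # frames per second
    in_flight = throughput * (latency_ms / 1000.0)   # Little's law: L = lambda * W
    return throughput, in_flight


# Numbers copied from the benchmark_app output above.
fps, in_flight = summarize(3368, 60080.1, 68.2936)
print(f"throughput: {fps:.4f} FPS")
print(f"estimated requests in flight: {in_flight:.1f}")
```

This reproduces the reported 56.0585 FPS and gives roughly 3.8 requests in flight, i.e. the GPU was kept busy with several overlapping requests rather than one at a time.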

 


Just to add some more information, here is the GPU load while running inference on the E3950's HD505 GPU (measured with intel_gpu_top).

Let us know if you need any more help to resolve the OpenCL environment issue.

                   render busy:  94%: ██████████████████▉                    render space: 122/16384
                          task  percent busy
                            CS:  94%: ██████████████████▉     vert fetch: 0 (0/sec)
                           GAM:  91%: ██████████████████▎     prim fetch: 0 (0/sec)
                           TSG:  88%: █████████████████▋   VS invocations: 0 (0/sec)
                           VFE:  80%: ████████████████     GS invocations: 0 (0/sec)
                          GAFS:  10%: ██                        GS prims: 0 (0/sec)
                           TDG:   5%: █                    CL invocations: 0 (0/sec)
                            SF:   1%: ▎                         CL prims: 0 (0/sec)
                            VS:   1%: ▎                    PS invocations: 0 (0/sec)
                          URBM:   1%: ▎                    PS depth pass: 0 (0/sec)
                           SVG:   0%:                      
                            VF:   0%:                      
                            CL:   0%:                      
                           SDE:   0%:                      
                          GAFM:   0%:    

 

 

 
