100% CPU usage when using OpenVINO GPU inference

We are running YOLOv3 model inference using OpenVINO, and see 100% CPU usage when using `-d GPU`. This is reproducible with the official object_detection_demo_yolov3_async example code.

Checking the per-layer performance yields this:

performance counts:

detector/darknet-53/Conv/C... EXECUTED       layerType: Convolution        realTime: 1949       cpu: 7              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12948               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 4964       cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12923               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 1059       cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12921               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 4818       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12891               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add       EXECUTED       layerType: Eltwise            realTime: 7393       cpu: 6              execType: generic_eltwise_ref
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 4794       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12936               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_5... EXECUTED       layerType: Convolution        realTime: 620        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12915               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_6... EXECUTED       layerType: Convolution        realTime: 4848       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12951               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_1     EXECUTED       layerType: Eltwise            realTime: 905        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_7... EXECUTED       layerType: Convolution        realTime: 600        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12907               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_8... EXECUTED       layerType: Convolution        realTime: 4767       cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_                    NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_2     EXECUTED       layerType: Eltwise            realTime: 3557       cpu: 6              execType: generic_eltwise_ref
detector/darknet-53/Conv_9... EXECUTED       layerType: Convolution        realTime: 4914       cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12894               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 571        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12906               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 4776       cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12896               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_3     EXECUTED       layerType: Eltwise            realTime: 375        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 583        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12913               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 4788       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12918               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_4     EXECUTED       layerType: Eltwise            realTime: 337        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 575        cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12884               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 4857       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12933               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_5     EXECUTED       layerType: Eltwise            realTime: 341        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 584        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12909               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 4734       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12902               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_6     EXECUTED       layerType: Eltwise            realTime: 335        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 604        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12934               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_1... EXECUTED       layerType: Convolution        realTime: 4812       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12927               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_7     EXECUTED       layerType: Eltwise            realTime: 337        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 601        cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12901               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 4837       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12929               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_8     EXECUTED       layerType: Eltwise            realTime: 336        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 595        cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12881               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 4834       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12920               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_9     EXECUTED       layerType: Eltwise            realTime: 342        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 568        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12882               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 4752       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12945               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_10    EXECUTED       layerType: Eltwise            realTime: 1859       cpu: 5              execType: generic_eltwise_ref
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 5298       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12911               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 692        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12899               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 5227       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12903               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_11    EXECUTED       layerType: Eltwise            realTime: 125        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_2... EXECUTED       layerType: Convolution        realTime: 676        cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12912               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 5349       cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12939               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_12    EXECUTED       layerType: Eltwise            realTime: 128        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 678        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12885               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 5248       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12922               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_13    EXECUTED       layerType: Eltwise            realTime: 124        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 678        cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12950               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 5221       cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12914               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_14    EXECUTED       layerType: Eltwise            realTime: 123        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 672        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12916               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 5285       cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12910               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_15    EXECUTED       layerType: Eltwise            realTime: 139        cpu: 11             execType: eltwise_simple_vload8
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 675        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12942               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 5212       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12935               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_16    EXECUTED       layerType: Eltwise            realTime: 137        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_3... EXECUTED       layerType: Convolution        realTime: 674        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12898               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 5194       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12925               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_17    EXECUTED       layerType: Eltwise            realTime: 130        cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 674        cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12886               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 5285       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12892               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_18    EXECUTED       layerType: Eltwise            realTime: 928        cpu: 5              execType: generic_eltwise_ref
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 5513       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12941               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 772        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12897               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 5244       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12930               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_19    EXECUTED       layerType: Eltwise            realTime: 77         cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 769        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12887               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 5469       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12943               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_20    EXECUTED       layerType: Eltwise            realTime: 70         cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 791        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12928               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_4... EXECUTED       layerType: Convolution        realTime: 5227       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12931               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_21    EXECUTED       layerType: Eltwise            realTime: 71         cpu: 5              execType: eltwise_simple_vload8
detector/darknet-53/Conv_5... EXECUTED       layerType: Convolution        realTime: 767        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12890               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/Conv_5... EXECUTED       layerType: Convolution        realTime: 5211       cpu: 9              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12895               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/darknet-53/add_22    EXECUTED       layerType: Eltwise            realTime: 71         cpu: 5              execType: eltwise_simple_vload8
detector/yolo-v3/Conv/Conv2D  EXECUTED       layerType: Convolution        realTime: 835        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12905               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_1/Co... EXECUTED       layerType: Convolution        realTime: 5198       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12938               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_2/Co... EXECUTED       layerType: Convolution        realTime: 790        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12944               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_3/Co... EXECUTED       layerType: Convolution        realTime: 5229       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12888               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_4/Co... EXECUTED       layerType: Convolution        realTime: 770        cpu: 8              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12917               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_5/Co... EXECUTED       layerType: Convolution        realTime: 5323       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
detector/yolo-v3/Conv_7/Co... EXECUTED       layerType: Convolution        realTime: 254        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12900               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
LeakyReLU_12940               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_6/Co... EXECUTED       layerType: Convolution        realTime: 117        cpu: 7              execType: convolution_gpu_bfyx_os_iyx_osv16
detector/yolo-v3/ResizeNea... EXECUTED       layerType: Resample           realTime: 73         cpu: 5              execType: undef
detector/yolo-v3/concat_3     EXECUTED       layerType: Concat             realTime: 602        cpu: 5              execType: concatenation_gpu_ref
detector/yolo-v3/Conv_6/Bi... EXECUTED       layerType: RegionYolo         realTime: 89         cpu: 6              execType: region_yolo_gpu_ref
detector/yolo-v3/Conv_8/Co... EXECUTED       layerType: Convolution        realTime: 997        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12946               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_9/Co... EXECUTED       layerType: Convolution        realTime: 5297       cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12937               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_10/C... EXECUTED       layerType: Convolution        realTime: 678        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12919               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_11/C... EXECUTED       layerType: Convolution        realTime: 5213       cpu: 7              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12949               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_12/C... EXECUTED       layerType: Convolution        realTime: 706        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12883               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_13/C... EXECUTED       layerType: Convolution        realTime: 5305       cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
detector/yolo-v3/Conv_15/C... EXECUTED       layerType: Convolution        realTime: 223        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
LeakyReLU_12932               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
LeakyReLU_12947               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_14/C... EXECUTED       layerType: Convolution        realTime: 112        cpu: 6              execType: convolution_gpu_bfyx_os_iyx_osv16
detector/yolo-v3/ResizeNea... EXECUTED       layerType: Resample           realTime: 119        cpu: 10             execType: undef
detector/yolo-v3/concat_7     EXECUTED       layerType: Concat             realTime: 1164       cpu: 5              execType: concatenation_gpu_ref
detector/yolo-v3/Conv_14/B... EXECUTED       layerType: RegionYolo         realTime: 77         cpu: 5              execType: region_yolo_gpu_ref
detector/yolo-v3/Conv_16/C... EXECUTED       layerType: Convolution        realTime: 891        cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12908               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_17/C... EXECUTED       layerType: Convolution        realTime: 4747       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12926               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_18/C... EXECUTED       layerType: Convolution        realTime: 567        cpu: 6              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12924               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_19/C... EXECUTED       layerType: Convolution        realTime: 4743       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12889               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_20/C... EXECUTED       layerType: Convolution        realTime: 560        cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12893               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_21/C... EXECUTED       layerType: Convolution        realTime: 5016       cpu: 5              execType: convolution_gpu_bfyx_gemm_like
LeakyReLU_12904               NOT_RUN        layerType: ReLU               realTime: 0          cpu: 0              execType: undef
detector/yolo-v3/Conv_22/C... EXECUTED       layerType: Convolution        realTime: 231        cpu: 5              execType: convolution_gpu_bfyx_os_iyx_osv16
detector/yolo-v3/Conv_22/B... EXECUTED       layerType: RegionYolo         realTime: 97         cpu: 5              execType: region_yolo_gpu_ref
Total time: 233168   microseconds
[ INFO ] Execution successful
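
For anyone who wants to total the counters above rather than eyeball them, here is a minimal C++ sketch that sums realTime and cpu per layerType over the EXECUTED rows. The names fieldAfter, Totals and sumByLayerType are made up for illustration and are not part of OpenVINO. It assumes every row keeps the "key: value" layout shown above and that both counters are in microseconds, as the "Total time" line suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Accumulated counters for one layerType (units assumed to be microseconds).
struct Totals {
    long long realTimeUs = 0;
    long long cpuUs = 0;
    int layers = 0;
};

// Returns the whitespace-delimited token that follows `key` in `line`,
// or an empty string if `key` does not occur.
std::string fieldAfter(const std::string& line, const std::string& key) {
    assert(!key.empty());
    const std::size_t pos = line.find(key);
    if (pos == std::string::npos) return "";
    std::istringstream in(line.substr(pos + key.size()));
    std::string token;
    in >> token;
    return token;
}

// Sums realTime and cpu per layerType, counting only rows marked EXECUTED.
// NOT_RUN rows and lines that are not counter rows are skipped.
std::map<std::string, Totals> sumByLayerType(const std::vector<std::string>& lines) {
    std::map<std::string, Totals> totals;
    for (const std::string& line : lines) {
        if (line.find(" EXECUTED ") == std::string::npos) continue;
        const std::string type = fieldAfter(line, "layerType: ");
        const std::string real = fieldAfter(line, "realTime: ");
        const std::string cpu = fieldAfter(line, "cpu: ");
        if (type.empty() || real.empty() || cpu.empty()) continue;
        Totals& t = totals[type];
        t.realTimeUs += std::stoll(real);  // throws if the field is not numeric
        t.cpuUs += std::stoll(cpu);
        ++t.layers;
    }
    return totals;
}
```

Feeding it the first two Convolution rows above (realTime 1949 and 4964, cpu 7 and 6) gives a Convolution total of 6913 and 13 over 2 layers. The small per-layer cpu values would be consistent with the load coming from waiting on the GPU rather than from the layers themselves, though the counters alone do not prove that.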

We profiled the application, and a lot of the CPU time appears to be spent in the OpenCL library.

We then tried running on an NCS2 (-d MYRIAD), and it works great, with lower CPU usage (~20%). 

It almost seems that there is some sort of a busy-wait in the GPU case. Any suggestions would be highly appreciated.

OpenVINO toolkit version: 2019 R1. Also reproducible on R3.

Hi Mohammed,

Thanks for reaching out. Could you share a link to the model you are using and the Model Optimizer command used to convert it to IR format? I would like to run some tests on my end. Also, could you share some details of the hardware you are testing on?

You mentioned the issue is also seen on OpenVINO toolkit 2019 R3. Have you tested on 2019 R3.1?

Regards,

Jesus

IR and models sent via PM.

These were generated with instructions from here: https://docs.openvinotoolkit.org/latest/_docs_MO_DG_prepare_model_conver...

Exact commands:

# Convert Darknet weights to TensorFlow weights
python3 convert_weights_pb.py --class_names barcode.names --data_format NHWC --weights_file barcode.weights

# Convert frozen tensorflow weights to Inference Engine IR
mo_tf.py \
--input_model frozen_darknet_yolov3_model.pb \
--data_type FP16 \
--batch 1 \
--tensorflow_use_custom_operations_config yolo_v3.json

If you need inference test data for the above model, I can privately email/message it to you.

We tried running on a bunch of platforms, including i7-7600U and Xeon E3-1505M v5. Reproducible on both.

Thanks for the help! We currently are using an NCS 2 to avoid high CPU load, but would love to go back to the integrated graphics once this issue is solved. I'll give it a try on R3.1 now.

Regards,

Kabir

Hi Mohammed,

I was able to reproduce the issue with the information you provided. I am reaching out to the development team for additional input.

Regards,

Jesus

Hi Mohammed,

Thank you for your patience. I got some feedback from the development team.

This is a known issue with the Intel GPU. If CPU usage is a problem for your application and you don't mind accepting some extra wait time, it can be mitigated by setting the GPU plugin config key KEY_CLDNN_PLUGIN_THROTTLE to the lower value "1". This causes the driver's polling thread to sleep periodically and yield the CPU, which removes most of the overhead.

The plugin configuration parameters must be set before calling the Inference Engine LoadNetwork.

#include <cldnn/cldnn_config.hpp>

ie.SetConfig({ { CLDNNConfigParams::KEY_CLDNN_PLUGIN_THROTTLE, "1" } });
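
To illustrate the trade-off described above, here is a small standalone C++ sketch that compares a tight polling loop with one that sleeps between polls. It does not use OpenVINO and does not reproduce what the driver actually does; the function name cpuSecondsWhileWaiting and the fixed deadline standing in for "the GPU is done" are assumptions made for illustration only.

```cpp
#include <cassert>
#include <chrono>
#include <ctime>
#include <thread>

// Polls until `wait` has elapsed, either spinning (pollSleep == 0) or
// sleeping `pollSleep` between polls, and returns the CPU seconds the
// process consumed while waiting.
// Note: std::clock() reports processor time on POSIX systems; some other
// platforms report wall-clock time, which would hide the difference.
double cpuSecondsWhileWaiting(std::chrono::milliseconds wait,
                              std::chrono::microseconds pollSleep) {
    assert(wait.count() >= 0);
    using Clock = std::chrono::steady_clock;
    // Hypothetical stand-in for the moment the device finishes its work.
    const Clock::time_point ready = Clock::now() + wait;

    const std::clock_t start = std::clock();
    while (Clock::now() < ready) {
        if (pollSleep.count() > 0) {
            // Throttled polling: give up the core between checks.
            std::this_thread::sleep_for(pollSleep);
        }
        // Otherwise spin and re-check immediately (busy-wait).
    }
    return static_cast<double>(std::clock() - start) / CLOCKS_PER_SEC;
}
```

Calling it once with a zero poll interval and once with, say, a 1 ms interval for the same wait should show the spinning variant using close to the full wait as CPU time, while the sleeping variant uses very little, at the cost of noticing completion up to one poll interval late.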

Regards,

Jesus

Thanks, that resolved it!

It would be great if the documentation could be updated with far greater emphasis on this pitfall.

Regards,

Kabir

Hi Kabir,

Thank you for confirming, glad it's working for you!

Regards,

Jesus
