Enqueuing many kernels leads to a hang on HD 4600

The reproducer is attached.

In it, I enqueue 1000 instances of the same kernel, each instance depending on the previous one. Then I wait for the event associated with the last kernel enqueued. The kernel just zeroes out a buffer; the issue does not reproduce with a no-op kernel.

Expected result: application finishes successfully.

Actual result: application hangs.

I get the expected result on Intel CPU and NVIDIA GPU devices, but a hang on the Intel HD 4600 GPU.

N-Body Simulation Project at Cal Poly

The goal of the N-Body problem is to predict the motion of a set of n objects that interact with each other through some force, e.g. the gravitational force. N-Body simulations are used in particle simulations such as astrophysical and molecular dynamics simulations. There are a number of approaches to solving the N-Body problem, such as the Barnes-Hut algorithm, the Fast Multipole Method, the Parallel Multipole Tree Algorithm and the direct calculation method. In the direct calculation approach, the force between each particle and every other particle in the system is calculated, resulting in an O(n²) operation per time step, where n is the number of particles in the system.

Unhandled exception in IntelOpenCLProfiler.dll


I have just installed OpenCL Code Builder on top of Visual Studio 2013 and followed the steps in the user manual to run OpenCL kernel development and application analysis. However, I have trouble debugging with the OpenCL API Debugger: when I debug the template project in CPU mode, it stops at this line in the FindOpenCLPlatform function of OpenCLProjectCodeBuilder.cpp:

    err = clGetPlatformIDs(0, NULL, &numPlatforms);

with the following message:

Use VTune Amplifier System 2016 for HelloOpenCL GPU Application Analysis

Before reading this article, it is recommended that you learn how to perform profiling with VTune. If you do not know how to do that, refer to the tutorial documents first to understand the VTune basics. VTune Amplifier 2016 for Systems can also be used to analyze OpenCL™ programs. This article shows how to use this feature, and also how to create a simple OpenCL program, HelloOpenCL, with Microsoft Visual Studio and Intel OpenCL Code Builder.
  • Developers
  • Microsoft Windows* (XP, Vista, 7)
  • Microsoft Windows* 8.x
  • Windows*
  • C/C++
  • Intermediate
  • Intel® System Studio
  • OpenCL*
  • VTune 2016;OpenCL;ISS opencl;
  • Development Tools
  • Graphics
  • Optimization

OpenCL driver / INDE issue on Win10

I have an HP Z230 PC with an integrated Intel HD 4600 GPU.

On this PC with Windows 7 64-bit, my OpenCL program runs correctly. After upgrading to Windows 10 64-bit, the call to clSetKernelArg fails and returns CL_INVALID_ARG_SIZE. I made no code changes.

On Windows 10, the driver version is … . I tried to upgrade to the newest driver version (…). The installation seems to succeed, but it installs version … again.

MICRO-48 Tutorial on Intel® Processor Graphics: Architecture and Programming

In this tutorial, we will give an in-depth presentation of the architecture and microarchitecture of the media and graphics accelerator. We will explain the trade-off between general-purpose compute and fixed-function hardware, and discuss the advantages and disadvantages of on-die integration. We will present the supported programming models, along with examples of non-graphics workloads and how they are mapped to the hardware. The tutorial has four parts: part one focuses on the microarchitecture of Intel Processor Graphics, part two presents the system architecture, part three discusses how to program it, and part four presents some examples.

Multiple Map/Unmap buffer

According to the OpenCL 1.2 specification:

clEnqueueMapBuffer and clEnqueueMapImage increment the mapped count of the memory object. The initial mapped count value of a memory object is zero. Multiple calls to clEnqueueMapBuffer or clEnqueueMapImage on the same memory object will increment this mapped count by appropriate number of calls. clEnqueueUnmapMemObject decrements the mapped count of the memory object.
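The counting rule quoted above can be illustrated with a small C model. This is a hypothetical stand-in, not the OpenCL API: `mem_object_model`, `model_map`, `model_unmap` and `model_is_unmapped` are invented names. The model only tracks the count. In real code each map call returns its own host pointer, and each of those pointers must be passed to its own clEnqueueUnmapMemObject call.

```c
#include <stdbool.h>

/* Hypothetical model of the "mapped count" of an OpenCL memory object.
   NOT the OpenCL API: it only mirrors the counting rule from the spec. */
typedef struct {
    unsigned map_count; /* initial mapped count is zero */
} mem_object_model;

/* Models a successful clEnqueueMapBuffer / clEnqueueMapImage call. */
void model_map(mem_object_model *m)
{
    m->map_count++;
}

/* Models clEnqueueUnmapMemObject. Returns false when there is no
   outstanding map (the model's own convention, not an OpenCL error code). */
bool model_unmap(mem_object_model *m)
{
    if (m->map_count == 0)
        return false;
    m->map_count--;
    return true;
}

/* The object is fully unmapped only once every map has a matching unmap. */
bool model_is_unmapped(const mem_object_model *m)
{
    return m->map_count == 0;
}
```

For example, two maps followed by one unmap leave the count at 1, so the buffer is still mapped. A second unmap is needed before the count returns to zero.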
