The reproducer is attached.
In it, I am enqueuing 1000 instances of the same kernel, each subsequent instance being made dependent on the previous one. Then I just wait for the event associated with the last kernel enqueued. The kernel just zeroes out a buffer. This issue doesn't reproduce on a no-op kernel.
Expected result: application finishes successfully.
Actual result: application hangs.
I get the expected result on Intel CPU and NVidia GPU devices, but a hang on Intel HD 4600 GPU.