First of all, many thanks to the VTune team for implementing OpenCL GPU profiling in the tool.
I was trying out the tool on a command-line OpenCL application running on the HD 4000. I followed the documentation and was able to enable GPU profiling support in VTune. I profiled my application, and some metrics, such as the average kernel execution time and EU array busy and stalled, work fine. However, other metrics, such as memory bandwidth, still report 0.0. I have the latest HD 4000 driver installed, with OpenCL 1.2 support. My application does not use DirectX, only OpenCL.
