I'm using OpenCL with a Xeon Phi on Linux. The host code runs on the host operating system, and the kernel code runs on the Xeon Phi card. Is there any way to profile the kernel code on the Xeon Phi (cache misses, instructions, etc.)? I would expect something like
Can I do this with VTune™ Amplifier XE?
Thanks and regards,