When an OpenCL application is profiled with VTune Amplifier (excellent product, I'm in love with it), it shows parts of the Intel OpenCL internals. It seems that the function opencl_snprintf takes quite a lot of CPU time to run (no printf is used in our kernels).
As an example, our most used kernel takes 39 seconds of CPU time to run (a very large number of invocations within a program, with new arguments set for every call), and opencl_snprintf uses 28 seconds in a simple test run. I would like to know why snprintf is used when spawning tasks or in other internal operations. Is there internal logging going on, or what is its purpose? Or is VTune just misreporting the CPU time/instructions depending on the measurement mode? If snprintf actually takes this much time to run, can we expect that it will be optimized away eventually?


