I am profiling a highly optimized code using VTune XE.
To do this I run the hotspot and concurrency analysis types.
In the bottom-up view I can see that a substantial portion of CPU time (about 20%) is spent in libiomp5.so, reached through clone->start_thread within the libiomp5 module.
I also see libiomp5 in the inner functions called from the module that corresponds to my code.
My understanding is that the libiomp5.so entries under these inner functions represent time spent in the threads cloned for those functions. Is this correct?
More importantly, does the CPU time reported in the libiomp5 module correspond to threading overhead (thread synchronization, thread pool instantiation)?