Anybody profiling OpenMP code?

I was testing VTune Amplifier XE on some OpenMP code and got a profile like this:

I found it strange that the time spent in the OpenMP parallel-for loops was not attributed to the function containing them, i.e., parallel_fors(). Have others seen this, and what do you make of it?

[Attached image: openmpdisplay.PNG, 180.5 KB]


Intel OpenMP has always handled parallel regions this way, generating a separate function for each one. This is often convenient because it lets you account for serial and parallel time separately. It was especially useful back when there was an OpenMP profile option.
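To make that concrete, here is a rough sketch of what the separate function amounts to. The names outlined_region and parallel_fors_by_hand are made up for illustration, the array-sum workload is an assumption (the original code was not posted), and the real compiler-generated routine and the runtime calls around it look different. The point is only that the loop body ends up in its own routine, which is where a profiler can attribute the time.

```c
#include <stddef.h>

/* Hypothetical user function, named after the one in the question. */
double parallel_fors(const double *a, size_t n)
{
    double sum = 0.0;
    /* With OpenMP enabled, the compiler moves this loop into a separate,
       compiler-named routine that the worker threads execute. */
    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < (long)n; i++)
        sum += a[i] * a[i];
    return sum;
}

/* Hand-written stand-in for the outlined routine (illustration only):
   each thread would run this over its own [begin, end) chunk. */
static double outlined_region(const double *a, long begin, long end)
{
    double partial = 0.0;
    for (long i = begin; i < end; i++)
        partial += a[i] * a[i];
    return partial;
}

/* Serial imitation of what the runtime does: split the range into two
   chunks, run the outlined routine on each, then combine the partial
   sums. In a real run the chunks go to different threads. */
double parallel_fors_by_hand(const double *a, size_t n)
{
    long mid = (long)(n / 2);
    double first = outlined_region(a, 0, mid);
    double second = outlined_region(a, mid, (long)n);
    return first + second;
}
```

If the compiler's OpenMP option is not enabled, the pragma is ignored and parallel_fors() simply runs the loop serially, so both functions return the same result either way.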

The parallel_for$omp$parallel_for@?? entries are functions in the OpenMP library that complete the tasks submitted by parallel_for in the user's code. There is no caller-callee relationship between them because parallel_for$omp$parallel_for@?? runs in another thread. You should see the same behavior when using Intel® Threading Building Blocks (TBB). Only a little CPU time is attributed to the user code; the actual work is attributed to the parallel library.
