I would like to know in which cases the three threadprivate functions are called:
__kmpc_threadprivate_cached, __kmpc_threadprivate_register and __kmpc_threadprivate_register_vec.
If anyone has an idea, I would be very interested.
Thank you in advance.
We are running some experiments with the EPCC parallel benchmark on an Intel Xeon Phi coprocessor 7120 with 244 threads, compact affinity, the hierarchical barrier, KMP_LIBRARY=turnaround, and KMP_BLOCKTIME=infinite.
Using VTune, I see that most of the non-waiting time is spent in __kmp_hierarchical_barrier_release, which makes sense to me. However, inside this function, most of the time is spent in:
I am trying to compare offload statistics for:
1) Intel compiler-assisted offload vs. 2) the OpenMP 4.0 target construct
My question: how can I get an offload report for OpenMP 4.0 (which environment variable do I need to set)? I used OFFLOAD_REPORT=2 with the Intel compiler offload directive and it worked fine, but I am getting very strange statistics with OpenMP 4.0 offload. (I am using an Intel Xeon Phi as the execution platform.)
Here is the code:
Compiler directive offload:
// Start time
I am new to this community and, first of all, I would like to thank everyone in advance for their help.
I use Intel Fortran for my programs, which have always been written sequentially. Now I am trying to parallelize a few do loops, but I get a run-time problem when I run a program with the OpenMP directives. This is the first time I have tried to use OpenMP, so I apologize if my questions are stupid.