Dear Intel forumers,
I have recently added THREADPRIVATE directives to some of my Fortran common blocks in order to make the variables they contain private to each thread. These common blocks contain quite large variables. Dynamic threading is set to OFF and I use the same number of threads in all parallel regions.
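To make the setup concrete, here is a minimal sketch of the pattern, not my actual code. The common block name WORKBLK, the array BUF, its size N and the program name are all made up for illustration; the real commons are larger and are used in many routines.

```fortran
! Illustrative sketch only: WORKBLK, BUF and N are hypothetical names.
program tpdemo
  use omp_lib
  implicit none
  integer, parameter :: n = 1000000
  double precision :: buf(n)
  common /workblk/ buf
  ! Each thread gets its own copy of the whole common block.
  !$omp threadprivate(/workblk/)
  integer :: i
  double precision :: t0, t1, s

  ! Dynamic threading off, as in my real program.
  call omp_set_dynamic(.false.)

  s = 0.0d0
  t0 = omp_get_wtime()
  !$omp parallel private(i) reduction(+:s)
  ! Every thread fills and then sums its own private copy of buf.
  do i = 1, n
     buf(i) = dble(i)
  end do
  do i = 1, n
     s = s + buf(i)
  end do
  !$omp end parallel
  t1 = omp_get_wtime()

  print *, 'sum =', s, ' elapsed (s) =', t1 - t0
end program tpdemo
```

Building this once as shown and once with the threadprivate line removed, then running both with OMP_NUM_THREADS=1, is the kind of comparison I am describing.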
I noticed that, when I set the number of threads to 1, the performance of the program with and without the THREADPRIVATE attribute is very different: the optimized (O2) build is 30-40% slower with THREADPRIVATE.
I have carefully read the documentation and I don't really understand why. Under these conditions, I thought the memory was allocated once, when the threadprivate common is first used in the first parallel region, and stayed "alive" for the rest of the program's execution. So the "cost" of threadprivate should only be a one-time cost at the beginning. That does not seem to be the case. Could you tell me more about it? Are there options to optimize the use of THREADPRIVATE?
I use Intel Fortran Windows Compiler 10.0.0.27.
Thank you for your help,