You would expect to pay the cost of copying data in and out of threadprivate storage at the beginning and end of each parallel region, which could be substantially more than the bare allocation cost. I don't know how the outdated compiler would affect this, except that it doesn't include the current version of the OpenMP library.
Dear Tim, thank you for your response. I haven't set any options to activate copying of threadprivate variables at the end of a parallel region. In fact, the threadprivate variables are just a set of temporary variables shared across some subroutines, and they are not needed after the end of the parallel region. In this situation, are there default copies at the end of a parallel region?
This way of coding is open to debate. In fact, I am trying to parallelize some big old Fortran codes with a minimum of changes to the structure.
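To make the situation concrete, here is a minimal sketch of the pattern described above. It is not the real code: the module `work_mod`, the routines `fill_tmp` and `sum_tmp`, and the array size are all hypothetical names chosen for illustration. It shows scratch variables declared threadprivate, used by several subroutines inside a parallel region, with no `copyin` clause and no use of the scratch data after the region.

```fortran
! Hypothetical example: all names and sizes are assumptions for illustration.
module work_mod
  implicit none
  ! Scratch storage shared by several subroutines; one copy per thread.
  real, save :: tmp(4)
  !$omp threadprivate(tmp)
contains
  ! Fill the calling thread's copy of the scratch array.
  subroutine fill_tmp(x)
    real, intent(in) :: x
    tmp = x
  end subroutine fill_tmp

  ! Read the calling thread's copy of the scratch array.
  function sum_tmp() result(s)
    real :: s
    s = sum(tmp)
  end function sum_tmp
end module work_mod

program demo
  use work_mod
  implicit none
  integer :: i
  real :: total

  total = 0.0
  ! No copyin clause on the region, and tmp is never read after it ends.
  !$omp parallel do reduction(+:total)
  do i = 1, 8
    call fill_tmp(real(i))
    total = total + sum_tmp()
  end do
  !$omp end parallel do

  print *, total
end program demo
```

Built with OpenMP enabled (for example `gfortran -fopenmp`), each thread fills and reads only its own copy of `tmp`; only the reduction variable `total` carries a result out of the region.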