OpenMP issue: a threadprivate pointer doesn't work when it has the dimension attribute

Hi,
I have a problem using OpenMP with the ifort compiler version 11.0.
Consider the following test program:

program test
  use mod_test, only: check
  implicit none
  integer,pointer,save :: p=>null()
  !$omp threadprivate(p)
  !$omp parallel
  allocate(p)
  call check(p)
  !$omp end parallel
end program test

The program uses this module:

module mod_test
  implicit none
contains
  subroutine check(num)
    integer,pointer :: num
    if (associated(num)) write (*,*) 'associated'
  end subroutine check
end module mod_test

On a 4-cpu machine this program prints the word 'associated' 4 times, as predicted.
If, however, the pointer p is given the 'dimension' attribute, only the first thread that executes the parallel region works as anticipated. In the last three threads the condition is evaluated as false. Here is the modified program that produces the error:

program test
  use mod_test, only: check
  implicit none
  integer,dimension(:),pointer,save :: p=>null()
  !$omp threadprivate(p)
  !$omp parallel
  allocate(p(1))
  call check(p)
  !$omp end parallel
end program test

It uses this correspondingly modified module:

module mod_test
  implicit none
contains
  subroutine check(num)
    integer,dimension(:),pointer :: num
    if (associated(num)) write (*,*) 'associated'
  end subroutine check
end module mod_test

Is this a compiler bug or what?
Thanks

In my post I didn't use the 'insert code' feature, so the code was not very readable. I apologize, and I have fixed the original post.
I would appreciate any help with my problem,
Thanks

As no one else has bitten, I'll say that I don't know how the result of this code could be predicted. Do you mean, is it a compiler bug that it doesn't give more diagnostics? OpenMP is notorious for not checking at compile time. That's one of the reasons for the Intel Thread Checker.

|1 |I/O data-race |Error |1 |omp parallel region |I/O operation at "ym2.f90":4 conflicts with a prior I/O operation at "ym2.f90":4 |"ym2.f90":4 |"ym2.f90":4 |
|1 |I/O data-race |Error |1 |omp parallel region |I/O operation at "ym1.f90":4 conflicts with a prior I/O operation at "ym1.f90":4 |"ym1.f90":4 |"ym1.f90":4 |

I have been looking at this too. The OpenMP spec specifies special handling related to POINTER and ALLOCATABLE only when certain conditions do not hold, but I believe that in the second case all threads should see the array pointer as associated, just as all threads see the pointer as associated in the first case. I will inquire with the developers and update the thread when I know more.

Try something like this:

program test   
  use mod_test, only: check   
  implicit none   
  integer,dimension(:),pointer,save :: p=>null()
  !$omp threadprivate(p)
  ! Above is in variable declarations of program test
  ! ...
  ! initialization portion of program test
  !$omp parallel
  nullify(p) ! shouldn't be required, used for work-around
  !$omp end parallel
  ! ...
  ! compute section of program test   
  !$omp parallel   
  allocate(p(1))   
  call check(p)   
  !$omp end parallel   
end program test  

Jim Dempsey

Quoting - tim18:
"...I don't know how the result of this code could be predicted..."

To me it seems clear: there are 4 copies of the pointer p, one for each thread. Each one is being allocated and then tested for association, and it should always be associated.

Quoting - Kevin Davis (Intel):
"...I will inquire with the developers and update the thread when I know more"

Thanks!

Quoting - jimdempseyatthecove:
"Try something like this:
...
!$omp parallel
nullify(p) ! shouldn't be required, used for work-around
!$omp end parallel
..."

Thank you for your suggestion. It is true that on many occasions nullifying a pointer before using it can solve problems later on. Unfortunately, in this case it does not make any difference.

OK - Plan B

Create a user defined type for holding thread private data.

Inside this type place the pointer to allocatable array

Create an instance of the defined type for holding thread private data as thread private
OR
Create a pointer to an instance of the defined type for holding thread private data as thread private (and allocate in each thread).

I have a Windows based Fortran program that does the latter so I know this works
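
A minimal sketch of the first option (an instance of a user defined type, itself declared thread private) might look like the following. The module name tp_data, the type name tls_t and the variable name tls are hypothetical, chosen only for illustration; the write is wrapped in a critical section to avoid the I/O race that Thread Checker reported earlier in the thread.

```fortran
module tp_data
  implicit none
  ! Hypothetical container type: the array pointer lives inside it,
  ! so the threadprivate object is the container, not the pointer.
  type :: tls_t
     integer, dimension(:), pointer :: p => null()
  end type tls_t
  ! One instance per thread; SAVE is required for threadprivate data.
  type(tls_t), save :: tls
  !$omp threadprivate(tls)
end module tp_data

program test
  use tp_data, only: tls
  implicit none
  !$omp parallel
  ! Each thread allocates through its own copy of the container.
  allocate(tls%p(1))
  !$omp critical
  if (associated(tls%p)) write (*,*) 'associated'
  !$omp end critical
  deallocate(tls%p)
  !$omp end parallel
end program test
```

This is only an illustration of the idea under the stated assumptions, not the code from my Windows program.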

Jim Dempsey

Quoting - ymost
Quoting - tim18:
"...I don't know how the result of this code could be predicted..."

To me it seems clear: there are 4 copies of the pointer p, one for each thread. Each one is being allocated and then tested for association, and it should always be associated.

I've not found any reference to tell what a threadprivate directive outside a parallel region would do. It's not clear to me that it would take effect in the next, or all following, parallel regions. Your intention may be clearer to you than to me or to the compiler. I thought maybe your idea was the compiler should tell you what (if anything) is wrong with the source code. That's why I suggested attention to the race condition diagnosed by Thread Checker.

Quoting - tim18

I've not found any reference to tell what a threadprivate directive outside a parallel region would do. It's not clear to me that it would take effect in the next, or all following, parallel regions. Your intention may be clearer to you than to me or to the compiler. I thought maybe your idea was the compiler should tell you what (if anything) is wrong with the source code. That's why I suggested attention to the race condition diagnosed by Thread Checker.

??

When you call a subroutine, its scalar variables are on the stack (automatic vectors are either on the stack, or have their descriptor on the stack and their data on the heap).

If this subroutine is called from a parallel region, each thread has its own stack and therefore its own private copy of those stack variables, while all threads use the same symbolic name.

That I think you understand.

Now wouldn't it be nice if each thread in a multi-threaded program could have thread private data items that share the same name in all threads but in fact reference different data (same as the stack model)? You may want to place temp arrays in the thread private area, or some sort of context information (e.g. a pointer to some object owned by the thread).

This thread private area is independent of entering or exiting !$OMP PARALLEL regions, except when the !$OMP PARALLEL region creates additional thread(s). When a thread is created, it gets a copy of the current state of the master thread's thread private data.

Care must be taken as a copy of the thread private data from the master thread may contain allocated arrays. It may not be polite for the 2nd thread or later thread to deallocate or disturb this array if the array was intended to be a private copy for the master thread. To help get around that consider using a pointer to the array which you can NULLIFY and/or allocate.

Thread initialization of the thread private data area can be done once early in the program.
Caution, should you use nested parallel regions care must be taken for initialization of those threads private data as well.

ThreadPrivate is a compiler directive not a runtime directive.
Symbols marked with threadprivate have a little more overhead in access. The runtime system maintains a pointer to the thread private data area. The compiler automagically inserts an additional dereference via this pointer for thread private data.

Experiment with thread private data, as it can really help improve performance in areas where you want large thread scratch data arrays (too large for the stack).
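
As a rough sketch of such a scratch array: the example below assumes a compiler that accepts allocatable threadprivate variables (OpenMP 3.0 or later), and the names scratch_mod and scratch and the size 1000 are hypothetical. Each thread allocates and fills its own copy in a first parallel region and reads it back in a second one. Dynamic thread adjustment is switched off so that both regions run with the same number of threads, which the specification requires for the values to persist.

```fortran
module scratch_mod
  implicit none
  ! Per-thread scratch array (hypothetical name and size).
  real, allocatable, save :: scratch(:)
  !$omp threadprivate(scratch)
end module scratch_mod

program demo
  use scratch_mod, only: scratch
  use omp_lib
  implicit none

  ! Keep the team size fixed so threadprivate data persists.
  call omp_set_dynamic(.false.)

  ! Initialization: every thread sets up its own copy once.
  !$omp parallel
  allocate(scratch(1000))
  scratch = real(omp_get_thread_num())
  !$omp end parallel

  ! Compute section: each thread still sees its own data.
  !$omp parallel
  !$omp critical
  write (*,*) 'thread', omp_get_thread_num(), 'scratch(1) =', scratch(1)
  !$omp end critical
  deallocate(scratch)
  !$omp end parallel
end program demo
```

Note that each thread initializes its own copy explicitly here, so the sketch does not depend on what a newly created thread inherits from the master thread.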

Jim Dempsey

Quoting - jimdempseyatthecove

OK - Plan B

Create a user defined type for holding thread private data.

Inside this type place the pointer to allocatable array

Create an instance of the defined type for holding thread private data as thread private
OR
Create a pointer to an instance of the defined type for holding thread private data as thread private (and allocate in each thread).

I have a Windows based Fortran program that does the latter so I know this works

Jim Dempsey

Great idea, and it worked! I used the first option you offered, since I didn't want to risk using a threadprivate pointer again. This way I get a de-facto threadprivate pointer without the risk, and that's great.
I actually conjured up my own workaround too: I added a threadprivate integer to hold the address of the pointer after it is allocated (I extracted the address using the 'loc' function), and referred to the correct address in every thread by using the 'pointer(a,b)' mechanism. I think I'll switch to your method, though, since it's more elegant than mine, and more importantly, my method is not portable since the loc function and pointer mechanism are specific to the Intel compiler.
Thanks a lot!

Quoting - tim18
I've not found any reference to tell what a threadprivate directive outside a parallel region would do. It's not clear to me that it would take effect in the next, or all following, parallel regions.

A threadprivate directive can only be used outside a parallel region. Quoting section 2.9.2 from the OpenMP specification, page 84, line 34:

The threadprivate directive must appear in the declaration section of a scoping unit in which the common block or variable is declared.

The declaration section can never be in a parallel region. The threadprivate directive changes the status of the variable for all consecutive parallel regions that have the same number of threads. Notice also that the first version of the program I attached works perfectly, it is only the addition of the 'dimension' attribute that causes the error, and this is probably a compiler bug.

Quoting - ymost

A threadprivate directive can only be used outside a parallel region. Quoting section 2.9.2 from the OpenMP specification, page 84, line 34:

The threadprivate directive must appear in the declaration section of a scoping unit in which the common block or variable is declared.

The declaration section can never be in a parallel region. The threadprivate directive changes the status of the variable for all consecutive parallel regions that have the same number of threads. Notice also that the first version of the program I attached works perfectly, it is only the addition of the 'dimension' attribute that causes the error, and this is probably a compiler bug.

Now that you mention it, Intel Fortran does have different rules about placement of threadprivate directives than some other compilers. This definitely annoys customers, but I think it is on account of those other compilers not following the specification about threadprivate following the last COMMON. If you can show that this is in violation of the standard, it would be a good subject for a problem report. However, I can't agree with you about no declarations in parallel regions. How would you have any subroutine calls in parallel regions under your proposed restriction? How would you implement the requirement of the specification that the threadprivate appear in each compilation unit, where applicable? Maybe you didn't mean that.
Your first version didn't work perfectly for me, as I showed.
The standard does prescribe persistence of threadprivate values between compatible parallel regions, but I don't see that it calls for threadprivate directives from outside a parallel region to apply.
If you would file a problem report on premier.intel.com, you should be able to get an answer from the Intel experts on OpenMP.

I submitted this to the compiler team earlier; they are investigating now. Our internal tracking id is: CQ# DPD200118527. I will keep the thread updated as I learn more.

tim18,

>>However, I can't agree with you about no declarations in parallel regions.

Thread private data is static data owned within the thread context,
separate from static data owned by the process.

>>How would you have any subroutine calls in parallel regions under your proposed restriction?

A subroutine may be called from within a parallel region or from sequential section.
A subroutine cannot be declared from within a parallel region.

For most thread private data use the stack

Make sure you include automatic on any arrays (vectors) as you may forget an option switch and get burned with a long debug session.

For SAVE variables, yes, you need to declare these with threadprivate attribute.

Note: A subroutine declared array such as

"real :: temp(10)" may or may not be stored on the stack.
You CANNOT tell by looking at the source code.

"real, automatic :: temp(10)" will be on the stack
or at least the array descriptor will be on the stack
(when all automatic arrays are heap arrays)

"real, allocatable :: temp(:)" the array descriptor may or may not be stored on the stack.
You CANNOT tell by looking at the source code.

"real, automatic, allocatable :: temp(:)" the array descriptor will be stored on the stack.

Jim Dempsey

Quoting - tim18
...However, I can't agree with you about no declarations in parallel regions. How would you have any subroutine calls in parallel regions under your proposed restriction? How would you implement the requirement of the specification that the threadprivate appear in each compilation unit, where applicable? Maybe you didn't mean that.
Your first version didn't work perfectly for me, as I showed.
The standard does prescribe persistence of threadprivate values between compatible parallel regions, but I don't see that it calls for threadprivate directives from outside a parallel region to apply.

Indeed my statement was inaccurate. I should have said a threadprivate directive cannot appear inside the lexical extent of a parallel region, i.e. it can appear in a subroutine called from a parallel region. Still, the threadprivate directive must always appear in the declaration section of any scoping unit, and it can (and often does) appear outside any parallel region at all. It is used exactly in this way in the examples given in the OpenMP specification itself. Also, a quick example can show that it cannot be used inside the lexical extent of a parallel region:

program test
  implicit none
  integer,save :: num
  !$omp threadprivate(num)
  !$omp parallel

  ! --- code ---

  !$omp end parallel
end program test

This code works fine, with the expected behaviour of the variable num having a separate copy for each thread. However:

program test
  implicit none
  integer,save :: num
  !$omp parallel
  !$omp threadprivate(num)

  ! --- code ---

  !$omp end parallel
end program test

This code produces a compilation error: "error #6236: A specification statement cannot appear in the executable section."

Consider the !$omp threadprivate(num) as a data declaration statement that is a band-aid for a missing Fortran keyword. i.e. Fortran should have "integer, save, threadprivate :: num" (or threadprivate could implicitly include save and "integer,threadprivate :: num" would suffice).

Jim Dempsey

Best Reply

Quoting - Kevin Davis (Intel)

I submitted this to the compiler team earlier; they are investigating now. Our internal tracking id is: CQ# DPD200118527. I will keep the thread updated as I learn more.

Development indicates that for case 2 (pointer declaration with the dimension(:) attribute) the compiler generates incorrect accesses to the thread-private variable, p. They indicate this only manifests itself when the legacy implementation of thread-private variables is used, and that specifying compatibility mode (by using the option -openmp-threadprivate compat) produces correct code.

They still plan to fix the issue in a future release; in the meantime they offer -openmp-threadprivate compat as a workaround.

Quoting - Kevin Davis (Intel)

Development indicates that for case 2 (pointer declaration with the dimension(:) attribute) the compiler generates incorrect accesses to the thread-private variable, p. They indicate this only manifests itself when the legacy implementation of thread-private variables is used, and that specifying compatibility mode (by using the option -openmp-threadprivate compat) produces correct code.

They still plan to fix the issue in a future release; in the meantime they offer -openmp-threadprivate compat as a workaround.

Thank you, the '-openmp-threadprivate compat' flag solves the problem!
However, it reveals another bug which is manifested in a different section of my code. I have posted it in a new discussion.
