openMP, threadprivate and allocate question

Hi,

Can someone tell me why the code below doesn't work when compiled with OpenMP? From the debugger, the error is:

forrtl: severe (408): fort: (10): Subscript #3 of the array BIGA has value 1 which is greater than the upper bound of -1

If I move the initA() call into the parallel section, then the code works fine. While this is trivial to do in this example, it is not so easy in the actual code this example simulates. I've used this type of coding before without problems, but with a larger program (or perhaps with version 11 of IVF?) I'm having problems.

Does anyone have suggestions regarding how to handle large global arrays that need to be threadprivate?

module globalVars
    implicit none
    integer na
    real(8), allocatable:: bigA(:,:,:)
    !$omp threadprivate(bigA)   
    contains
    subroutine initA ()
        allocate( bigA(na,na,na) )
    end subroutine initA
end module globalVars

program testomp
    use globalVars
    implicit none
    integer i,j,k
    
    na = 500
    call initA ()
    !$omp parallel
    !$omp do
    do i=1,na
        do j=1,na
            do k=1,na
                bigA(i,j,k) = real(i)*real(j)*real(k)
            end do
        end do
    end do
    !$omp end do
    pause
    !$omp end parallel
    stop
end program testomp

thanks!!
-joe


Quoting - Joe

[code as posted above]

We've had discussions on this forum before about whether threadprivate ought to work outside a parallel region. Not having anything new to say on that issue, I'll move on and point out that the compiler can't reorder loop nesting under !$omp do, so you're pretty much obligated to put the correct loop in position as the parallelized DO loop. Also, there is no need for !$omp end do, as the Fortran END DO ends the OMP DO inherently.
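In Fortran's column-major storage the first subscript varies fastest, so for the posted example a cache-friendly nest parallelizes the k loop and keeps i innermost. A sketch of the reordered nest:

```fortran
    ! sketch: parallelize the outermost (k) loop; the innermost loop
    ! walks the contiguous first subscript (i)
    !$omp do
    do k=1,na
        do j=1,na
            do i=1,na
                bigA(i,j,k) = real(i)*real(j)*real(k)
            end do
        end do
    end do
```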

Quoting - tim18

[quoted above]

Thanks Tim,

The loop nesting is just an artifact of the example I was toying with, and is not a problem in the real code. In previous codes I've used threadprivate with allocation outside of the // region without problems, so I just assumed it was an OK construct, but it sounds like it is not.

Unless I hear from others, I suppose I will attempt to rewrite the code so the threadprivate allocations occur within the // omp sections. However, as I'm hacking old code to get parts of it to run in //, it will not be straightforward.

cheers,
-joe

Joe,

Shuffle two lines of code

    !$omp parallel   
    call initA ()  ! perform init inside parallel region 
    !$omp do  

The above will (should) work in the code sample you gave. However, there is a caveat. OpenMP has a feature to enable dynamically adjustable thread counts (as opposed to a fixed thread pool created at program start). Should you perform the allocation(s) and then spawn new threads, an additional allocation will need to be performed for each new thread.

Also note that, even with a fixed thread pool, the above suggestion performs the call to initA() only for those threads participating in that particular parallel region.

    !$omp parallel num_threads(2)
    call initA ()  ! perform init inside parallel region 
    !$omp do 
    ...
    !$omp end do
    !$omp end parallel
    ...
    !$omp parallel num_threads(4)
    !$omp do
    ...



 
In the first parallel region only 2 of the OpenMP thread pool threads perform the initA(). In the second parallel region only those two threads will have allocated arrays; the other two will not.
Jim Dempsey
www.quickthreadprogramming.com

Jim,

Yes, I did notice that moving the initA() into the // region allowed the code to execute correctly. This is currently what I am trying to do with the actual code that is causing the stack overflow problem I posted about in a separate, but related, thread.

As I was trying to tease that code apart to reveal the stack overflow problem, the problem posted here revealed itself.

Other than the thread-initialization issue you mentioned, is this coding approach compliant with OpenMP? I'm hoping that once I move the allocations to within the // regions, my stack problems will also go away.

thanks again!
-joe

Joe,

The code sample as written (with proper placement of the call to initA) does not take a 2nd call into consideration. i.e., there is no corresponding uninitA, nor a test to see if bigA is already allocated and/or allocated to the proper size, with appropriate action taken if not. I am sure your actual code takes this into consideration.
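A defensive initA along those lines might look like the following sketch, using the names from the original post (the size test assumes the array is always cubic in na):

```fortran
    subroutine initA ()
        ! tolerate repeated calls: keep a correctly sized array,
        ! reallocate only if na has changed since the last call
        if (allocated(bigA)) then
            if (size(bigA,1) == na) return
            deallocate(bigA)
        end if
        allocate( bigA(na,na,na) )
    end subroutine initA
```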

Finding a stack overflow is problematic, as often the error trashes the traceback information. You may need to install code to help trace the execution. Then, once the subroutine is found, have the trace code set a trigger for your break point (it could be something odd, like the Nth iteration following Plhatpp...). After a few crashes you will be able to figure out how to set your trigger, and on the next run you will be able to step into the problem.

I have been bugging Intel for years now, to no effect, to add a compiler diagnostic option that would issue a warning when the compiler generates code that will (or may, depending on circumstances) create a temporary array.

For some reason, they keep thinking that their run-time warning message, issued when an array temporary is generated, is equivalent. Well, it is not. Consider that you may crash due to stack overflow _prior_ to getting the information report. And consider that the array temporary creation may be dependent on data placement, and data placement sensitivity usually means it won't break during testing but will break in production (standard Murphy's Law applies here). Having a compiler warning of the (possibility of) array temporary creation would help you catch the potential problem _before_ it happens.
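As an illustration of the kind of construct such a warning would flag, passing a non-contiguous array section to an explicit-shape dummy forces the compiler to build a temporary copy first. A sketch using the array from the original post (the work routine here is hypothetical, for illustration only):

```fortran
    subroutine work (v, n)
        integer n
        real(8) v(n)      ! explicit-shape dummy: expects contiguous storage
        v = 0.0d0
    end subroutine work

    ! bigA(1,:,:) is non-contiguous in memory, so the compiler copies the
    ! section into a temporary (often on the stack) before the call
    call work( bigA(1,:,:), na*na )
```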

Jim Dempsey

www.quickthreadprogramming.com

Great, thank you Jim.

BTW, your previous reply turned out to be quite useful. After I got rid of at least some of the stack problems, while stepping through the code I found that only the master thread was making it through without causing problems, while the workers caused access violations. I'm pretty sure the problem is that the worker threads didn't allocate some arrays that the master did, as you indicated could be a problem.

I'm currently going through code to make sure that all threads have access to allocated arrays, which I'm guessing will fix this current problem.

You seem to be one step ahead of me, so I will consider your last post in more detail once the next problem arises :)

cheers,
-joe

Would it be possible to declare the ALLOCATABLE arrays as THREADPRIVATE and then
ALLOCATE them in a separate PARALLEL region? For example:
!$OMP THREADPRIVATE(A)

!$OMP PARALLEL
ALLOCATE( A(na,na,na) )
!$OMP END PARALLEL

then later

!$OMP PARALLEL COPYIN(A)
call work(A)
!$OMP END PARALLEL 

and so all the threads get a copy of the array, initialized with COPYIN.
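Putting that together with the OP's module, a sketch (assuming an OpenMP 3.0-aware compiler, where COPYIN of an allocatable THREADPRIVATE variable follows intrinsic-assignment rules and allocates each thread's copy as needed):

```fortran
    na = 500
    call initA ()            ! master thread allocates its copy
    bigA = 1.0d0             ! master fills in the initial values
    !$omp parallel copyin(bigA)
    ! every thread now holds its own allocated copy of bigA,
    ! initialized from the master's values
    !$omp end parallel
```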

 
