Thread safety when passing a procedure


When passing a contained function, the variables in the parent's scope are not thread safe in 12.1 and earlier. Was this fixed in 13?


Got an example? Parent declared RECURSIVE? I also suggest reading Doctor Fortran in "Think, Thank, Thunk"

Steve - Intel Developer Support

Here is an example I threw together just to demonstrate. Run it in debug mode and set up the project for OpenMP. It usually gives a runtime error because the array is already allocated. I had great difficulty pasting the source, as the editor added and removed line feeds and spaces seemingly at random, and there is no preview button either. I haven't had time to try it on XE2013 yet, but I see in the release notes that passing internal procedures is supported in this version.

module MyMod
  implicit none
  type Thing
     real(8), allocatable :: someNumbers(:)
  end type
  contains
subroutine anotherRoutine(f)
  interface
     subroutine f()
     end subroutine
  end interface
  call f
end subroutine
subroutine doSomethingWithThis(a)
  type(Thing), intent(inout) :: a
  call anotherRoutine(reallyDoSomethingWithThis)
  deallocate(a%someNumbers)
contains
  subroutine reallyDoSomethingWithThis()
     use ifport
     integer j, n
      n = int(rand()*1000000.0)
     allocate(a%someNumbers(n))
     do j = 1, n
         a%someNumbers(j) = log(dble(j))
     end do
  end subroutine
end subroutine
end module
program Test
  use MyMod
  type(Thing) a(100)
  !$OMP PARALLEL
  !$OMP SINGLE
  do j = 1, 1000
     do i = 1, 100
        print *, j, i
        !$OMP TASK
        call doSomethingWithThis(a(i))
        !$OMP END TASK
     end do
  end do
  !$OMP TASKWAIT
  !$OMP END SINGLE
  !$OMP END PARALLEL
end program 

Ok, I will try this tomorrow. We have supported this for more than a decade.

Steve - Intel Developer Support

I don't think this is related to internal procedures (you can take them out of the picture and get the same result).  That's because I think there's one or more data races.

One unlucky thread executes the single construct, other thirstier threads head straight for the barrier (implicit at the end of the single construct) where they sit around and wait for some tasks to be created.  In that unlucky single thread, when j in the main program is 1, a task is created that operates on a(1) and other elements of a.  When j is 2, a task is created that operates on a(1) and the other elements of a.  I think a is shared ("in a task construct, if no default clause is present, a variable that in the enclosing context is determined to be shared by all implicit tasks bound to the current team is shared", and the single construct just follows whatever the enclosing parallel construct reckons the data sharing attributes are in the absence of directions to the contrary).  Once the tasks have been created there is no synchronisation, so consequently, I think there is a data race.

(I think your taskwait is redundant - everyone's already waiting at the end of the single.)

That said, I'm not convinced that something isn't awry with the implementation of tasks, but I don't think this demonstrates it.  But I say "I think" a lot above, because .. I know ... my knowledge of OpenMP is inadequate.  If you think otherwise I'd like to hear about it.
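The race described above comes from tasks created in different iterations of the outer loop touching the same element of a. A minimal sketch of one way to avoid it, assuming it is acceptable to finish each round before starting the next, is to put a taskwait inside the outer loop. The names per_element_work, fill_and_release, n_elems and n_rounds are made up for this illustration and are not from the original code.

```fortran
module per_element_work
  implicit none
  type thing
     real(8), allocatable :: some_numbers(:)
  end type
contains
  ! Allocates, fills and releases the component of a single element.
  ! It touches only its own dummy arguments and local variables.
  subroutine fill_and_release(t, n)
    type(thing), intent(inout) :: t
    integer, intent(in) :: n
    integer :: k
    allocate(t%some_numbers(n))
    do k = 1, n
       t%some_numbers(k) = log(dble(k))
    end do
    deallocate(t%some_numbers)
  end subroutine
end module

program tasks_without_overlap
  use per_element_work
  implicit none
  integer, parameter :: n_elems = 100, n_rounds = 10   ! illustrative sizes
  type(thing) :: a(n_elems)
  integer :: i, j

  !$omp parallel
  !$omp single
  do j = 1, n_rounds
     do i = 1, n_elems
        ! Each task captures its own copy of i and works on a distinct a(i).
        !$omp task firstprivate(i) shared(a)
        call fill_and_release(a(i), i)
        !$omp end task
     end do
     ! Wait for this round's tasks before the next round reuses a(1..n_elems).
     !$omp taskwait
  end do
  !$omp end single
  !$omp end parallel

  do i = 1, n_elems
     if (allocated(a(i)%some_numbers)) error stop "element left allocated"
  end do
  print *, "all tasks completed without overlapping on an element"
end program
```

Within one round every task owns a different element, and the taskwait keeps round j+1 from starting on an element that round j is still using. This says nothing about whether passing an internal procedure is handled correctly; it only removes the unrelated race from the test.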

My actual code does not have the outer loop. That is an unfortunate new problem I created in my test code. I am having trouble replicating the error in test code. I will report if I get anywhere.

Steve, whilst I agree your compiler has accepted passing internal procedures as arguments for many years, I disagree that it has always worked in a multi-threading environment. I had a support request in 2009 (545221) where your engineer stated that you did not support its use in multi-threading. That support request was for IVF 11.0.074.

Andrew, I reviewed the old issue you mentioned and I see no reference at all to passing contained procedures not being thread-safe.

We have supported passing contained procedures as arguments since CVF 6.1. I didn't make a claim about thread safety earlier, but if the program is compiled with /Qopenmp so that /auto is enabled, then the compiler will pass the "thunk" in a thread-safe manner. It may even do it all the time.

Steve - Intel Developer Support

I decided to run your test program through our analysis tools.  First, Static Analysis, part of Fortran Studio XE or Parallel Studio XE, said:

U364462.f90(34): error #12208: variable "I" must be SHARED in the enclosing context since it is specified in a FIRSTPRIVATE clause at (file:U364462.f90 line:38)

I then ran it through Intel Inspector XE's threading analysis and it found a data race, as Ian suggested.  I am not an OpenMP expert so I defer to others here, such as Ian, who know more about it than I do. I can tell you, having studied the assembler code, that the passed routine thunk is done in what appears to be a thread-safe manner.

Steve - Intel Developer Support

Thanks for looking into it Steve. I already mentioned the outer loop was an error on my part so I would expect issues with the inner loop variable i that you found.

I would be grateful if you could look again at the source code below from the original support request. It passes a contained procedure in an OpenMP environment. The contained procedure compares its thread ID with the thread ID recorded by its parent procedure in a local variable. That variable should be local to the parent and the contained procedure, so the thread IDs should match. In IVF 11 we were seeing different IDs and data races when reading the recorded thread ID from the local variable. I just tried it in XE2013 and it has the same issue. I post it here. Perhaps I have made some mistake that others can point out.  (I edited this post again to fix the source formatting.)

module someFairlyOrdinaryCode
implicit none
contains
subroutine integrator(k, answer)
   use omp_lib
   real, intent(in) :: k
   real, intent(out) :: answer
   integer threadID
   threadID = OMP_Get_Thread_NUM()
   call integrateAnyFunction(f, 0.0, 1.0, answer)
contains
   !A simple linear function y = kx
   real function f(x)
      real, intent(in) :: x
      integer threadIDContained
      threadIDContained = OMP_Get_Thread_NUM()
      !Thread ID should match else some bad things happen since otherwise we may or may not
      !have any stack variables or we might have wrong ones
      if (threadIDContained /= threadID) then
         !$OMP CRITICAL
         print *, 'My parent was called by thread ', threadID
         print *, 'But I am actually thread       ', threadIDContained
         stop
        !$OMP END CRITICAL
      end if
      f = k*x
   end function
end subroutine
!Basic trapezoidal rule
subroutine integrateAnyFunction(f, a, b, answer)
   interface    
      real function f(x)
         real, intent(in) :: x
      end function
   end interface
   real, intent(in) :: a, b
   real, intent(out) :: answer
   real h, x1, x2
   integer i
   answer = 0.0
   h = (b - a)/9
   do i = 1, 10
      x1 = a + h*(i-1)
      x2 = x1 + h
      answer = answer + (f(x1) + f(x2))*h/2
   end do
end subroutine
end module
program TestOpenMPThatTookAYearToFind
use someFairlyOrdinaryCode
integer numTimes
real, allocatable:: k(:), answer(:)
integer i, j
do j = 1, 2000
   numTimes = int(rand(0)*20.0) + 1
   allocate(k(numTimes), answer(numTimes))
   do i = 1, numTimes
         k(i) = i
   end do
   !$OMP PARALLEL DO
   do i = 1, numTimes
      call integrator(k(i), answer(i))
   end do
   !$OMP END PARALLEL DO
   deallocate(k, answer)
end do
end program

Intel Inspector XE tells me there is a data race at the call to IntegrateAnyFunction. There are some hints that it may indeed be an issue with the contained function passing. I will see if I can puzzle this out.

Steve - Intel Developer Support

Steve - you would be deferring to someone whose OpenMP experience is best measured in minutes.

I asked a question on the openmp forum a few days ago that might be relevant to Andrew's latest example - this might not be a question of thread safety per-se, more about data sharing rules.  No answer yet.  The OpenMP spec doesn't support F2003 let alone F2008, so passing internal procedures around is not something that would have been envisaged - so it won't cover this case.  But beyond that, the question is whether the access to threadID in the internal procedure is an access to the "shared" instance of threadID or a "private" copy.  The data sharing attribute rules (in 3.1 at least, I've only skimmed the 4.0 draft) don't cover this case, inside the local procedure threadID is not a local variable, so the "..are private" bit of page 87, line 26 of OpenMP 3.1 doesn't apply. 

Perhaps this is explicitly unspecified (page 96 line 17), in which case the compiler can do what it wants. But it strikes me more as an oversight, or at least something that needs to be explicitly called out as "unspecified - so don't use host association and internal procedures ever".

If the OpenMP spec said it was private, then the code is ok and the compiler has an issue.  Alternatively, if spec said it was shared, then the code is non-conforming - shared threadID is undefined.  I can see reasons for it to go either way.

Back to the original example - data race aside I think the Inspector message about "i must be shared in the enclosing context" is bogus.  Page 99 line 12 makes it pretty clear that a task can firstprivate something that is private in the encountering task.

My OpenMP task query attached, though now I think it might just be more of this.

Attachments:

Attachment: 2013-02-01-chasetherabbits.f90 (11.45 KB)

I wrote a lengthy post which appears to have evaporated.  Brutal summary - is the host associated threadID inside the internal procedure shared or private?  The OpenMP spec doesn't appear to say, or perhaps says it is unspecified.  If it's shared, you've got a problem.

Regardless of what may or may not be in the standard, I am sure that host associated local variables should be private to the calling thread, just like they are in the host itself. But it seems like the incorrect instance of the host associated local variable is given to the internal procedure. The only real benefit in having internal procedures is host associated variables. Steve, I hope this will now be targeted for a fix. It did not seem like my original support request was fully understood at the time.
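Until the behaviour is settled, one way to sidestep host association in the integrator example, at the cost of the convenience just mentioned, is to pass the parameter k explicitly alongside the function. This is only a sketch under that assumption: the names explicit_context, linear and integrate_with_param are invented here, and it uses h = (b - a)/n so that the n panels cover exactly [a, b].

```fortran
module explicit_context
  implicit none
contains
  ! A simple linear function y = k*x; k arrives as an argument
  ! instead of being host associated.
  real function linear(x, k)
    real, intent(in) :: x, k
    linear = k*x
  end function

  ! Basic trapezoidal rule that forwards the extra parameter k to f.
  subroutine integrate_with_param(f, k, a, b, answer)
    interface
       real function f(x, k)
          real, intent(in) :: x, k
       end function
    end interface
    real, intent(in) :: k, a, b
    real, intent(out) :: answer
    integer, parameter :: n = 10
    real :: h, x1
    integer :: i
    answer = 0.0
    h = (b - a)/n
    do i = 1, n
       x1 = a + h*(i - 1)
       answer = answer + (f(x1, k) + f(x1 + h, k))*h/2
    end do
  end subroutine
end module

program explicit_context_demo
  use explicit_context
  implicit none
  integer, parameter :: m = 20
  real :: k(m), answer(m)
  integer :: i
  do i = 1, m
     k(i) = i
  end do
  !$omp parallel do
  do i = 1, m
     call integrate_with_param(linear, k(i), 0.0, 1.0, answer(i))
  end do
  !$omp end parallel do
  ! The integral of k*x over [0, 1] is k/2.
  do i = 1, m
     if (abs(answer(i) - k(i)/2) > 1.0e-4*k(i)) error stop "unexpected integral"
  end do
  print *, "all integrals match k/2"
end program
```

Because linear is a module procedure and receives everything through its argument list, no bound procedure value or host environment is involved, so the question of which instance of a host variable the callee sees does not arise. It does not fix the reported problem; it only avoids it.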

I've been trying to understand what is supposed to happen here. From what I can tell, it ought to work, but the code that is constructing the "bound procedure value" is not doing so on the threadlocal stack. Or something... It's complicated and may take me a while to untangle. If it is a bug, we'll fix it.

Steve - Intel Developer Support

Perhaps this is tangential, but assuming OpenMP leaves things unspecified, consider:

SUBROUTINE external
  IMPLICIT NONE
  INTEGER :: host_associated
  host_associated = 0
  !$OMP PARALLEL DEFAULT(NONE) PRIVATE(host_associated)
  !$OMP CRITICAL
  host_associated = 1
  CALL internal
  !$OMP END CRITICAL
  !$OMP END PARALLEL
CONTAINS
  SUBROUTINE internal
    PRINT *, host_associated   ! shared or private?
  END SUBROUTINE internal
END SUBROUTINE external

If ifort decides host associated things are private in its implementation - that's great.  If not, the above is bad.  Then consider:

SUBROUTINE external
  IMPLICIT NONE
  INTEGER :: host_associated
  host_associated = 0
  !$OMP PARALLEL DEFAULT(NONE) SHARED(host_associated)
  !$OMP CRITICAL
  host_associated = 1
  CALL internal
  !$OMP END CRITICAL
  !$OMP END PARALLEL
CONTAINS
  SUBROUTINE internal
    PRINT *, host_associated   ! shared or private?
  END SUBROUTINE internal
END SUBROUTINE external

If ifort decides that host associated things are shared in its implementation - that's great.  If not, the above is bad. 

Either way, someone's not happy.
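If the data-sharing attribute of a host associated variable really is unspecified, a minimal sketch of a variant whose meaning does not depend on the answer is to hand the value to the internal procedure as an actual argument. The names explicit_argument_demo, outer, doubled and mismatches are hypothetical, chosen only for this illustration.

```fortran
module explicit_argument_demo
  !$ use omp_lib
  implicit none
contains
  subroutine outer(mismatches)
    integer, intent(out) :: mismatches
    integer :: my_value
    mismatches = 0
    !$omp parallel default(none) private(my_value) shared(mismatches)
    my_value = 0
    ! With OpenMP enabled, give each thread a distinct value.
    !$ my_value = omp_get_thread_num()
    ! The internal function sees my_value only through its dummy argument.
    if (doubled(my_value) /= 2*my_value) then
       !$omp atomic
       mismatches = mismatches + 1
    end if
    !$omp end parallel
  contains
    integer function doubled(v)
      integer, intent(in) :: v
      doubled = 2*v   ! no host associated variable is referenced here
    end function
  end subroutine
end module

program explicit_argument
  use explicit_argument_demo
  implicit none
  integer :: mismatches
  call outer(mismatches)
  if (mismatches /= 0) error stop "dummy argument differed from the caller's private value"
  print *, "internal procedure saw each thread's own value"
end program
```

Here the PRIVATE clause applies to a variable that is referenced only inside the construct itself, so the spec's existing rules cover it, and the internal procedure works the same way whether it is called from inside or outside a parallel region.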

You will have to wait for Steve's investigation as to issues of SHARED and PRIVATE host associated.

In abstract terms, the CONTAINS subroutine carries no decoration indicating whether it is called from inside or outside a parallel region, although the compiler can make this determination. Consider what happens in IanH's last post if you were to use PRIVATE(host_associated) and have one "CALL internal" inside the parallel region and another "CALL internal" outside the parallel region. This adds confusion to the mix.

Note, the compiler could construct something analogous to C++'s this pointer, but consider FORTRAN may have nested host associated context, each potentially inside and/or outside parallel regions. FORTRAN may need this, that, theOther,... hidden pointers.

Jim Dempsey

www.quickthreadprogramming.com

I was not intending to focus on host association, but rather whether passing a contained procedure properly set up the environment in the bound procedure value. But I'll also research the other question separately.

Steve - Intel Developer Support

Is there any progress resolving this serious bug? I note there are many similar reports on this forum of problems with passing contained procedures in parallel code.

Sorry, not yet....

Steve - Intel Developer Support
