Program with simple coarrays hangs

Hello,

in my quest to find a solution for the problem I posted in the thread "Internal compiler error with lock/unlock", I stumbled upon another problem. Again using Intel Fortran 12.0.3.

Here is the program:

! checkscalar.f90 --
!     Check some odd (erroneous) behaviour with scalar coarrays
!
program checkscalar
    implicit none

    logical, codimension[*] :: new_results
    logical, codimension[*] :: ready

    ready = .false.
    new_results = .true. ! Indicates the image has results available

    write(*,*) 'Image', this_image(), new_results
    sync all
    write(*,*) 'Image2', this_image(), new_results

    !
    ! Collect the found primes in image 1, create new tasks
    ! for all images
    !
    do while ( .not. ready )
        if ( this_image() == 1 ) then
            call collect_results
        endif
        call sleepqq(1)
    enddo

contains

!
! Subroutine to collect the results from all
! images (run by image 1)
!
subroutine collect_results
    integer :: i
    integer :: np
    integer :: maxindex

    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo

    do i = 1,num_images()
        ready[i] = .true.
    enddo
end subroutine collect_results

end program checkscalar

It does not do much: in image 1 I print the value of new_results and when the
loop is finished I set the flag ready in all images so that they will stop. At least
that is my intention.

The output of one run with this program is:

Image 6 T
Image 1 T
Image 3 T
Image2 3 T
Image 7 T
Image2 7 T
Image2 6 T
Image 5 T
Image 2 T
Image2 2 T
Image 4 T
Image 8 T
Image2 4 T
Image2 1 T
Examine 1 T
Image2 8 T
Image2 5 T

(after a few seconds of no progress at all, I stopped the program)

So, all 8 images start, image 1 is entering the loop, prints the value of new_results on that
image, and then hangs - it does not get beyond this!

Does anyone have any clues? Am I doing something wrong? (Possible, of course, but the
program is so simple that I cannot believe that.)

Regards,

Arjen

I know little about coarrays, but this code segment made me suspicious:

do while ( .not. ready )
  if ( this_image() == 1 ) then
     call collect_results
  endif
  call sleepqq(1)
enddo

If the loop is entered and this_image() returns a value different from 1, the loop will do nothing but consume time endlessly. What event would cause the IF block to be executed in this case?

With a dual-core CPU, I get the output

 Image           1 T
 Image           2 T
 Image2           1 T
 Examine           1 T
 Image2           2 T

and then the program gets stuck in the DO WHILE loop. The last line printed has this_image() equal to 2, so the loop will keep calling SLEEPQQ and do nothing else.

The function this_image() returns the unique number belonging to the image.

The intention of the above fragment is to have image 1 (or thread 1 or ...) examine the
results produced by all images. In the reduced program I posted there are no actual results,
but the variable that indicates there are is set to .true. at the start. The only task to be done
in the routine collect_results is to loop over the images, print the value of new_results on that
image and then to set the variable ready in all images so that the do-while loop stops.

In short:
All images except one should wait for image 1 to set "ready" and then terminate the loop
and stop the program altogether.

Unfortunately, image 1 seems to get stuck reading the value of new_result on image 2
and then the program hangs.

Regards,

Arjen

Regardless of the images /= 1 making it out of the loop, image 1 should have been able to run through the "Examine" part for all images. Arjen is pointing out that image 1 is hanging when it should not be hanging.

Jim Dempsey

www.quickthreadprogramming.com

No news on this issue?

Regards,

Arjen

Arjen,

Sorry, I cannot help further as I do not currently have PS 2011 XE, but I hope to have it shortly (in a few weeks). At that point I should be able to run your example program (although not on Mac). Could you describe your test environment: processor(s), single system or multi-system, connection, OS(es), bitness (32/64), and version of IVF (Fortran Composer)? This would help me in trying to replicate your problem.

Jim Dempsey

www.quickthreadprogramming.com

Arjen,

You should not use things such as SLEEPQQ in coarray applications. Use the language-provided synchronization features instead. For example:

! checkscalar.f90 --
!     Check some odd (erroneous) behaviour with scalar coarrays
!
program checkscalar
    implicit none

    logical, codimension[*] :: new_results
    logical, codimension[*] :: ready

    ready = .false.
    new_results = .true. ! Indicates the image has results available

    write(*,*) 'Image', this_image(), new_results
    sync all
    write(*,*) 'Image2', this_image(), new_results

    !
    ! Collect the found primes in image 1, create new tasks
    ! for all images
    !
    sync all
    if ( this_image() == 1 ) then
        call collect_results
    endif

contains

!
! Subroutine to collect the results from all
! images (run by image 1)
!
subroutine collect_results
    integer :: i
    integer :: np
    integer :: maxindex

    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo

end subroutine collect_results

end program checkscalar

In particular, you can't count on the local "ready" being updated unless there is a synchronization point. That is probably what is doing you in.

Steve - Intel Developer Support
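
To make the point about synchronization points concrete, here is a minimal sketch of the ordering the language rules ask for: image 1 writes the flag on every image, and each image reads its own copy only after a "sync all" that follows those writes. The program name segment_order is made up for this illustration, and the behaviour described in the comments is what the standard specifies, not a statement about any particular compiler version.

```fortran
! Sketch only: shows where the synchronization points have to sit so that
! a remote write to ready[i] is ordered before the local read of ready.
program segment_order
    implicit none

    logical, codimension[*] :: ready
    integer :: i

    ready = .false.
    sync all          ! every image has initialised its copy before image 1 writes

    if ( this_image() == 1 ) then
        do i = 1,num_images()
            ready[i] = .true.     ! remote definition, done by image 1 only
        enddo
    endif

    sync all          ! orders the writes above before the reads below

    ! Per the language rules, each image should now see .true. here
    write(*,*) 'Image', this_image(), 'sees ready =', ready
end program segment_order
```

Without the second "sync all" the read on images 2 and up would sit in a segment that is not ordered with respect to image 1's writes, which is exactly the situation the SLEEPQQ loop creates.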

Hi Jim,

I run this program on a 64-bit Linux machine with 8 cores.
I am using the free version of Intel Fortran for Linux, version number 12.0.3.

Regards,

Arjen

Hi Steve,

thanks for these comments - still learning my way around coarrays. I will experiment with this.

Regards,

Arjen

Arjen,

>>I run this program on a 64-bits Linux machine with 8 cores

Other than as a learning experience with coarrays, there is little reason for using coarrays on such a system: OpenMP would provide better performance and ease of programming. The only advantage of using coarrays on that system is that the static data size could be larger, because there are multiple process spaces and the local data (stack and local process allocatables) are acquired from different process virtual address spaces. And lack of space for local data would only be a concern if you were running 32-bit applications.

Jim Dempsey

www.quickthreadprogramming.com

While I'll agree that you'd get better performance nowadays with OpenMP on such a system, there is a benefit to using coarrays in that the language rules are simpler and the program will scale to clusters without coding changes. I don't agree that OpenMP is easier, but coarrays definitely have a different model and you have to get your head around that first.

Steve - Intel Developer Support

Amending my last post: coarrays potentially make sense on your system (assumption on my part) if you have the new Intel Many Integrated Core (e.g. Knights Ferry) and if coarray programming is more efficient than alternative means.

Jim Dempsey

www.quickthreadprogramming.com

The main reason for experimenting with coarrays is indeed understanding the programming model.

I tried Steve's version of my program and that works fine. However, when I introduced a loop with a "sync all" statement, only the first thread finishes - the ready variable seems not to be updated on any other image.

Here is the program:

! checkscalar.f90 --
!     Check some odd (erroneous) behaviour with scalar coarrays
!
program checkscalar
    implicit none

    logical, codimension[*] :: new_results
    logical, codimension[*] :: ready

    ready = .false.
    new_results = .true. ! Indicates the image has results available

    write(*,*) 'Image', this_image(), new_results
    sync all
    write(*,*) 'Image2', this_image(), new_results

    !
    ! Collect the found primes in image 1, create new tasks
    ! for all images
    !
    do while ( .not. ready )
        sync all
        if ( this_image() == 1 ) then
            call collect_results
        endif
        sync all
    enddo

    write(*,*) 'Image ',this_image(), ' done'

contains

!
! Subroutine to collect the results from all
! images (run by image 1)
!
subroutine collect_results
    integer :: i
    integer :: np
    integer :: maxindex

    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo

    do i = 1,num_images()
        ready[i] = .true.
    enddo
end subroutine collect_results

end program checkscalar

What I see in the output is that image 1 finishes after examining all images and the others
never produce the message "Image n done", despite the "sync all" statements.

Regards,

Arjen

Found the solution!

I replaced the sync all statements in the do while loop by sync images statements:

do while ( .not. ready )
    if ( this_image() == 1 ) then
        call collect_results
        sync images( * )
    else
        sync images( 1 )
    endif
enddo

and then the program finishes nicely. Now with this solution I can continue my experiments
with the actual program(s).

Regards,

Arjen
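
For readers who want to try the "sync images" variant as a whole program, the following is a sketch assembled from the pieces posted above. The program name checkscalar_sync_images is made up, the unused variables are dropped, and one detail differs from the loop in the post: the test of "ready" is moved to after the synchronization. That reordering is an adjustment for this sketch (an assumption about what is safest, not something stated in the thread), so that each image reads its local copy in a segment that is ordered after image 1's write.

```fortran
! Sketch only: image 1 pairs with all images, the others pair with image 1.
program checkscalar_sync_images
    implicit none

    logical, codimension[*] :: new_results
    logical, codimension[*] :: ready

    ready = .false.
    new_results = .true.
    sync all                      ! all copies initialised before image 1 looks

    do
        if ( this_image() == 1 ) then
            call collect_results
            sync images( * )      ! image 1 synchronises with every image
        else
            sync images( 1 )      ! the others synchronise with image 1 only
        endif
        if ( ready ) exit         ! local read, after the synchronisation
    enddo

    write(*,*) 'Image ', this_image(), ' done'

contains

subroutine collect_results
    integer :: i

    do i = 1,num_images()
        write(*,*) 'Examine', i, new_results[i]
    enddo

    do i = 1,num_images()
        ready[i] = .true.         ! tell every image it can stop
    enddo
end subroutine collect_results

end program checkscalar_sync_images
```

Because collect_results sets "ready" on its first call, the loop body runs once on every image. A version with several rounds of work would probably need an extra synchronization before image 1 writes to the other images again.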

Arjen,

A potential error (not your error) is if the compiler optimization stripped the second sync all in your do loop (i.e. sees one at top and bottom and assumes redundant). If this is the case then potentially the sync all following image 1's collect_results is never called.

Two things to try

a) place sync all after do loop
b) remove first sync all in do loop (forcing sync all to follow collect_results)

Wouldn't hurt to test both scenarios as you want to discover what is going on as opposed to simply getting the code to work (i.e. don't stop testing if first test succeeds).

Jim Dempsey

www.quickthreadprogramming.com

The optimizer would never remove SYNC statements.

Rather than use a covariable "ready" and testing it in a loop, use the synchronization tools provided by the language, including locks, critical sections and the various forms of SYNC. It would be an unusual coarray application that needed to use a loop for this.

The biggest danger of learning coarrays is assuming you can simply translate concepts from shared-memory programming in the past. If you're an MPI programmer, however, it may be an easier transition as coarrays sort of look like one-way MPI.

Steve - Intel Developer Support
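
As an illustration of leaning on the language's own synchronization instead of a flag polled in a loop, here is a small sketch using a critical section. It is not taken from the thread: the program name critical_sum and the coarray total are invented for the example. Every image adds its image number to a total kept on image 1, and the critical construct ensures only one image at a time performs the read-modify-write.

```fortran
! Sketch only: accumulate a value on image 1 without any polling loop.
program critical_sum
    implicit none

    integer, codimension[*] :: total
    integer :: n

    total = 0
    sync all                      ! total on image 1 is zero before anyone adds

    critical
        ! One image at a time executes this block
        total[1] = total[1] + this_image()
    end critical

    sync all                      ! all contributions are in before image 1 reads

    if ( this_image() == 1 ) then
        n = num_images()
        write(*,*) 'Total', total, ' expected', n*(n+1)/2
    endif
end program critical_sum
```

The two "sync all" statements play the same role as in the earlier examples: the first orders the initialisation before the updates, the second orders the updates before the final read.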

I tried these things and I tried "sync all( stat = istat )" (with a write statement) to convince
the compiler that this statement is required, but the result was still the same.

Adding a write statement here and there does clarify _what_ is happening, but not why.
Image 1 leaves the loop, but all others remain in the loop. It definitely looks as if the "ready"
coarray is not updated. With the "sync images" statement it is. (I actually got my original
program to work that way.)

Regards,

Arjen

There's an open bug report "Write to covariable in other image is not reflected in other image's local copy" which I think is the same as your issue. To see if that's true, try building with -Od and see if that changes the behavior.

Steve - Intel Developer Support

I tried your program with optimization and it seemed to work ok. Another thing to try is to test ready[this_image()] which also avoids the bug.

Steve - Intel Developer Support

With the statement "do while ( .not. ready[this_image()] )" it does work, and turning off optimisation with
-O0 works too.

Well, that is good to know!

Regards,

Arjen

At least you have a reasonable workaround to get you past this issue.
When I introduce workarounds I also insert a comment with a common unique signature:

! **hack**
! using "( .not. ready[this_image()] )" as opposed to "(.not. ready)"
! due to compiler bug

Then later on, as I get new versions of the compiler, I can locate all the hacks and test to see if the bug has been fixed.
Also, when it is fixed, I leave the code in but conditionalized out. You never know if a bug resurfaces.

This would seem to indicate that sync all is broken, or at least appears broken under this circumstance.

Jim Dempsey

www.quickthreadprogramming.com
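
One way to combine the tagged comment with "conditionalized out" code is a preprocessor macro, sketched below. The macro name COARRAY_LOCAL_READ_HACK and the program name checkscalar_hack are hypothetical, chosen only for this illustration. The file has to go through the preprocessor (for example by giving it a .F90 extension, or with -fpp for Intel Fortran), and the macro is switched on with -DCOARRAY_LOCAL_READ_HACK.

```fortran
! Sketch only: keep the workaround in the source, selectable at build time.
program checkscalar_hack
    implicit none

    logical, codimension[*] :: ready
    integer :: i

    ready = .false.
    sync all

! **hack**
! using "( .not. ready[this_image()] )" as opposed to "(.not. ready)"
! due to compiler bug
#ifdef COARRAY_LOCAL_READ_HACK
    do while ( .not. ready[this_image()] )
#else
    do while ( .not. ready )
#endif
        sync all
        if ( this_image() == 1 ) then
            do i = 1,num_images()
                ready[i] = .true.     ! image 1 tells every image to stop
            enddo
        endif
        sync all
    enddo

    write(*,*) 'Image ', this_image(), ' done'
end program checkscalar_hack
```

Searching the sources for "**hack**" then finds every workaround, and rebuilding without the macro shows whether a new compiler version still needs it.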

It is not that sync all is broken, but for some reason a reference to the local variable isn't getting the updated value. Issue ID is DPD200168369.

Steve - Intel Developer Support

Quoting Steve Lionel (Intel): "It is not that sync all is broken, but for some reason a reference to the local variable isn't getting the updated value. Issue ID is DPD200168369."

?? isn't that the purpose of sync all ??
Jim

www.quickthreadprogramming.com

Well, no - it is also a barrier to continued execution until all images reach that point.

Steve - Intel Developer Support

(tongue in cheek)

Then it appears that one should always use YourCoArrayVariable[yourImageNumber] instead of YourCoArrayVariable whenever some other image could write to YourCoArrayVariable (because sync all, although reportedly not broken, does not update the local image of YourCoArrayVariable).

(tongue back where it belongs)

There may be a race condition in an optimization whereby you omit polling other processes if you "know" no updates to your shared variable(s) were made (an error in code relating to condition variables on Linux). That might be a place to look after you get a reproducer.

Jim Dempsey

www.quickthreadprogramming.com

The problem in the initial post got fixed somewhere along the way - it works now in the current compiler, Composer XE 2011 Update 11.

Steve - Intel Developer Support
