Coarray and SYNC MEMORY statement

Coarray and SYNC MEMORY statement

Beyond the "Coarrays in the next Fortran Standard" document by Reid et al, is there a thorough treatment of the subject/documentation available anywhere? I am trying to graps some of the subtle(r) aspects of coarray.

For instance, I am puzzled by the behavior of the simple code below. When the SYNC MEMORY statement is commented out, the code runs until completion; but I will note that all tasks do no start in parallel - tasks assigned to images other than 1 do not start until image 1 has completed its task. When the SYNC MEMORY statement in not commented out, the code behavior is the desired one (all images start doing their tasks in parallel), but then the code hangs after the last task complete.

I am sure the reason for this must be obvious to those more versed in coarray but it still escapes me :) .

Thanks in advance for your help,
Olivier

 

PROGRAM
MAIN
USE
, INTRINSIC :: ISO_FORTRAN_ENV, ONLY: LOCK_TYPE
IMPLICIT NONE
INTEGER
:: I,J,TASK
INTEGER
:: STACK_SIZE[*]
INTEGER
,ALLOCATABLE:: TASK_STACK(:)[:]
LOGICAL
:: KEEP_WORKING
REAL(8)
:: R
TYPE
(LOCK_TYPE) :: STACK_LOCK[*]
ALLOCATE
(TASK_STACK(10)[*])
IF
(THIS_IMAGE()==1)
THEN

DOI=1,SIZE(TASK_STACK)
TASK_STACK(I) = I

END DO
STACK_SIZE =
SIZE(TASK_STACK)

WRITE(*,*) 'There will be', NUM_IMAGES(), ' images executing the tasks.'
END IF
SYNC ALL
KEEP_WORKING = .TRUE.
DO
WHILE(KEEP_WORKING)
LOCK(STACK_LOCK[1])

IF (STACK_SIZE[1]>0) THEN
TASK = TASK_STACK(STACK_SIZE[1])[1]
STACK_SIZE[1] = STACK_SIZE[1]-1
KEEP_WORKING = .TRUE.

ELSE
KEEP_WORKING = .FALSE.

END IF

UNLOCK(STACK_LOCK[1])

!SYNC MEMORY

IF (KEEP_WORKING) THEN

WRITE(*,*)'I am image ',THIS_IMAGE(),' and I am doing task ',TASK,'.'
R = 0.0D+0

DOI=1,8000

DOJ=1,8000
R = R+
COS(REAL(I,KIND=8))+SIN(REAL(J,KIND=8)+R)

END DO

END DO

WRITE(*,*)'I am image ',THIS_IMAGE(),' and I am done: ',R

END IF
END DO
SYNC ALL
END
PROGRAM MAIN

9 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

There is the Fortran 2008 standard at http://j3-fortran.org/doc/standing/links/007.pdf

We found that our implementation of SYNC MEMORY was not correct and have fixed it for a release later this year. With the SYNC MEMORY enabled, I get the following output which I think is what you want:

 There will be           8  images executing the tasks.
 I am image            1  and I am doing task           10 .
 I am image            1  and I am done:   -21267699.5410634
 I am image            1  and I am doing task            8 .
 I am image            6  and I am doing task            9 .
 I am image            6  and I am done:   -21267699.5410634
 I am image            1  and I am done:   -21267699.5410634
 I am image            1  and I am doing task            6 .
 I am image            4  and I am doing task            7 .
 I am image            4  and I am done:   -21267699.5410634
 I am image            1  and I am done:   -21267699.5410634
 I am image            1  and I am doing task            4 .
 I am image            3  and I am doing task            5 .
 I am image            1  and I am done:   -21267699.5410634
 I am image            3  and I am done:   -21267699.5410634
 I am image            1  and I am doing task            3 .
 I am image            1  and I am done:   -21267699.5410634
 I am image            1  and I am doing task            2 .
 I am image            1  and I am done:   -21267699.5410634
 I am image            5  and I am doing task            1 .
 I am image            5  and I am done:   -21267699.5410634

You can sign up for the beta of this version - see the sticky thread at the top of this forum.

Steve - Intel Developer Support

Steve,

Your prompt answer is greatly appreciated - I cannot commend you enough on how nicely and efficiently you and your colleagues support this forum! Thank you so much.

It seems from the example you posted that you have the same behavior as when I comment out the SYNC MEMORY statement. Actually, I believe the SYNC MEMORY statement is not required in this example. I note that for instance, image 6 (or the second task assigned to image 1) does not start until the first task assigned to image 1 is complete. I do not understand this behavior.

Ideally, since you have 8 cores, shouldn't you have (since the tasks take some time to complete) an output that looks like:

I am image 1 and I am doing task...
I am image 2 and I am doing task ...
I am image 3 and I am doing task ...

before any of the task completion messages? In your example just like in mine, the other images do not start until image 1 is done with the first task. Hence my puzzlement :)

Olivier

Yes, there does seem to be something odd going on here. I will look at it more closely.

Steve - Intel Developer Support

Steve - did you get a chance to take a look at this a bit further?

Also - the documention on what SYNC MEMORY does is a bit obscure. Does SYNC MEMORY only make sure that the image that encounters it 'exposes' its own coarrays; or when one image encounters it, it requires all images to expose their own coarrays (knowing that they might be busy doing other things, which may slow down the execution of SYNC MEMORY).

Coarrays seem to have significant potential for code simplification - I wish there were more examples available to get started.

Thanks,
Olivier

I did not have a chance to investigate this further. Here is what the standard says:

2 1 Execution of a SYNC MEMORY statement ends one segment and begins another; those two segments can be
3 ordered by a user-defined way with respect to segments on other images.
4 R862 sync-memory-stmt is SYNC MEMORY [ ( [ sync-stat -list ] ) ]
5 2 If, by execution of statements on image P,
6 - a variable X on image Q is defined, referenced, becomes undefined, or has its allocation status, pointer
7 association status, array bounds, dynamic type, or type parameters changed or inquired about by execution
8 of a statement,
9 - that statement precedes a successful execution of a SYNC MEMORY statement, and
10 - a variable Y on image Q is defined, referenced, becomes undened, or has its allocation status, pointer
11 association status, array bounds, dynamic type, or type parameters changed or inquired about by execution
12 of a statement that succeeds execution of that SYNC MEMORY statement,
13 then the action regarding X on image Q precedes the action regarding Y on image Q.
14 3 User-defined ordering of segment Pi on image P to precede segment Qj on image Q occurs when
15 - image P executes an image control statement that ends segment Pi, and then executes statements that
16 initiate a cooperative synchronization between images P and Q, and
17 - image Q executes statements that complete the cooperative synchronization between images P and Q and
18 then executes an image control statement that begins segment Qj .

19 4 Execution of the cooperative synchronization between images P and Q shall include a dependency that forces
20 execution on image P of the statements that initiate the synchronization to precede the execution on image Q of
21 the statements that complete the synchronization. The mechanisms available for creating such a dependency are
22 processor dependent.

NOTE 8.40
SYNC MEMORY usually suppresses compiler optimizations that might reorder memory operations acrossthe segment boundary dened by the SYNC MEMORY statement and ensures that all memory operationsinitiated in the preceding segments in its image complete before any memory operations in the subsequentsegment in its image are initiated. It needs to do this unless it can establish that failure to do so could notalter processing on another image.

NOTE 8.41
SYNC MEMORY can be used to implement specialized schemes for segment ordering, such as the spin-waitloop. For example:

USE,INTRINSIC :: ISO_FORTRAN_ENV
LOGICAL(ATOMIC_LOGICAL_KIND),SAVE :: LOCKED[*] = .TRUE.
LOGICAL :: VALINTEGER :: IAM, P, Q
...
IAM = THIS_IMAGE()
IF (IAM == P) THEN
! Segment Pi
SYNC MEMORY ! A
CALL ATOMIC_DEFINE (LOCKED[Q], .FALSE.)
! Segment Pi+1
ELSE IF (IAM == Q) THEN
VAL = .TRUE.
DO WHILE (VAL)
! Segment Qj-1
CALL ATOMIC_REF (VAL, LOCKED)
END DO
SYNC MEMORY
! B! Segment Qj
END IF

The DO WHILE loop does not complete until VAL is defined with the value false. This is the cooperativesynchronization that provides the dependency that image Q does not complete segment Qj-1 until theCALL statement in segment Pi+1 completes. This ensures that the execution of segment Pi on image Pprecedes execution of segment Qj on image Q.

The first SYNC MEMORY statement (A) ensures that the compiler does not reorder the following statement(segment Pi+1) with the previous statements, since the lock should be freed only after the work in segmentPi has been completed.

The second SYNC MEMORY statement (B) marks the beginning of a new segment, informing the compilerthat the values of coarrays referenced in that segment might have been changed by other images in precedingsegments, so need to be loaded from memory.

NOTE 8.42
As a second example, the user might have access to an external procedure that performs synchronization between images. That library procedure might not be aware of the mechanisms used by the processor to manage remote data references and denitions, and therefore not, by itself, be able to ensure the correct memory state before and after its reference. The SYNC MEMORY statement provides the needed memoryordering that enables the safe use of the external synchronization routine. For example:

INTEGER :: IAM
REAL :: X[*]
IAM = THIS_IMAGE()
IF (IAM == 1) X = 1.0
SYNC MEMORY
CALL EXTERNAL_SYNC()
SYNC MEMORY
IF (IAM == 2) WRITE(*,*) X[1]

where executing the subroutine EXTERNAL SYNC has an image synchronization effect similar to executinga SYNC ALL statement.

Steve - Intel Developer Support

Like others, I'm also trying to get my head around coarrays.  I've noticed with examples similar to opmkl's that if the image which owns the particular coindex thingy's (stack_lock and stack_size here on image one) being used for coordination between images is sitting at a SYNC ALL or similar statement, then things tend to exhibit more parallel behaviour.  This implies to me that requests from other images to access coarrays on a particular image do not proced unless that particular image is executing a SYNC... statement of some form.

Is this a behaviour required by the standard (I don't see why it would be) or just a consequence of the underlying implementation?

Any further insight into the need or otherwise of the SYNC MEMORY appreciated.  I don't really understand the implications of the text in the standard, possibly due to executing a few too many SHIRAZ MEMORY statements over the years.

(Do you actually need two - one before the reference of stack_size and one after the modification?  Do they both need to be inside the LOCK/UNLOCK?)

Attachments: 

AttachmentSize
Download stackoftasks01.f902.86 KB

Steve,

This thread was left in pending state last year - due to improvements to be brought to the Intel Compiler for the coarray implementation. Was the behavior highlighted above investigated and is it understood now? Also, the performance of the compiler (as far as speed is concerned) was rather dismal when coarrays were used - what type of quantitative improvements have occured since then?

Thanks,

Olivier

We have made some significant improvements in the 14.0 compiler, but there is always room for more. We have fixed some SYNC MEMORY issues over the past year or two. But I'm sorry to say that we didn't further investigate the issues you raised here. I'll make a note of this and try to get back to it soon.

Steve - Intel Developer Support

Leave a Comment

Please sign in to add a comment. Not a member? Join today