Consider a "contrived" code segment:
    integer :: N(3)
    do N(1) = 1,5
       write(*,*) N(1)
    enddo
Shouldn't that be legal?
Not by the current rules of the language. The do variable must be the name of a scalar.
Suppose we do this:
    integer :: L, M
    do concurrent (L=1:3000)
       do M=1,3000
          write(*,*) L, M
       enddo
    enddo
    end
M is seen by all simultaneously executing threads. Will the "M" loop work? It might by accident.
If this were OpenMP, we would do:
    integer :: L, M
    !$omp parallel private(M)
    !$omp do
    do L=1,3000
       do M=1,3000
          write(*,*) L, M
       enddo
    enddo
    !$omp end do
    !$omp end parallel
    end
This gives each thread its own copy of the index variable M.
If Intel Fortran supported the BLOCK construct, we could do this:
    integer :: L
    do concurrent (L=1:3000)
       BLOCK
          integer M
          do M=1,3000
             write(*,*) L, M
          enddo
       END BLOCK
    enddo
    end
The "do concurrent" statement seems incomplete.
Incomplete, how? What would you want to do in a real program?
I don't think there's any room for accident here - the inner loop will "work" (as in the behaviour is well defined, though the order of output from the write may vary). M is undefined after the DO CONCURRENT construct completes.
That won't work, as noted. What would work, and achieve the desired result, is
    INTEGER, DIMENSION(3) :: N
    INTEGER :: J
    EQUIVALENCE (J, N(1))

    DO J = 1,5
    etc.
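To see that workaround end to end, here is a small compilable sketch; the module and subroutine names (equiv_demo, trace_n1) are made up for illustration. Because J and N(1) share one storage unit, reading N(1) inside the loop sees the current value of J. Note that EQUIVALENCE is obsolescent in Fortran 2018, so this is a legacy-style trick rather than a recommendation.

```fortran
! Hypothetical module/subroutine names, for illustration only.
module equiv_demo
  implicit none
contains
  subroutine trace_n1(trace)
    integer, intent(out) :: trace(5)   ! records n(1) on each pass
    integer :: n(3), j
    equivalence (j, n(1))              ! j and n(1) occupy the same storage
    do j = 1, 5                        ! j is a named scalar, so the DO is legal
      ! Reading n(1) is fine; assigning to n(1) here would redefine the
      ! DO variable through association, which is not allowed.
      trace(j) = n(1)
    end do
  end subroutine trace_n1
end module equiv_demo
```

Calling `trace_n1` fills its argument with 1 through 5, one value per pass of the J loop.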
Also, when the above loop is enclosed within a DO CONCURRENT, the generated code may registerize the loop control variable J (keep it in a register). Each thread then effectively has a private (non-interfering) copy of the loop control variable when registerization occurs. Registerization is not guaranteed, though.
If you do need assurance of independent, local variables within the DO CONCURRENT, then convert the body of the DO CONCURRENT loop into a subroutine.
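A minimal sketch of that suggestion, assuming the real work can be expressed as a computation rather than I/O. The inner loop moves into a PURE subroutine (a procedure referenced inside DO CONCURRENT must be pure), so its counter m is an ordinary local variable of each call rather than one variable shared by all iterations. A pure procedure cannot do external I/O, so the results land in an array. The names row_work, fill_row and run are hypothetical.

```fortran
! Hypothetical names; a sketch, not a drop-in replacement for real code.
module row_work
  implicit none
contains
  pure subroutine fill_row(l, row)
    integer, intent(in)  :: l
    integer, intent(out) :: row(:)
    integer :: m                      ! local, no SAVE: one copy per call
    do m = 1, size(row)
      row(m) = 10*l + m               ! stand-in for the real per-(L,M) work
    end do
  end subroutine fill_row

  subroutine run(table)
    integer, intent(out) :: table(:,:)   ! table(m, l)
    integer :: l
    do concurrent (l = 1:size(table, 2))
      call fill_row(l, table(:, l))      ! each iteration fills its own column
    end do
  end subroutine run
end module row_work
```

After `call run(table)`, an ordinary serial loop over `table` can print the results in a fixed order, which also keeps the WRITE out of the construct.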
The code is just to illustrate a point - then a question.
    integer :: L, M
    do concurrent (L=1:3)
       do M=1,3
          write(*,*) L, M   ! <--- This is just dummy code - I don't care what is actually within the loop
       enddo
    enddo
Does each executing thread get its own copy of M?
Or will there be only one M that is shared by all threads - each thread modifying M independently?
OpenMP directly recognizes and addresses the issue by providing the "private" clause.
Does the Fortran do concurrent construct "implicitly" handle the problem by creating L copies of M?
No, Fortran does not do that - or at least it does not specify what is to happen. When you use DO CONCURRENT, it is the programmer's responsibility to recognize that the loop bodies may be executed in any order. In practice, at least, Intel Fortran will not parallelize this DO CONCURRENT because of dependencies.
DO CONCURRENT was not intended to be a replacement for OpenMP. Rather, it was a replacement for FORALL. It has its uses, but is not as general as OpenMP.
I will comment that you can write:
DO CONCURRENT (L=1:3,M=1:3)
without the inner loop, which comes closer to what you're asking for. Intel Fortran won't parallelize it due to the WRITE, as far as I can tell.
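A sketch of the two-index form along the same lines (module and routine names are hypothetical). Both L and M are index names of the construct, so no iteration shares a loop counter with another, and each (L, M) pair writes its own array element. The WRITE is moved into a separate serial routine, which takes the I/O out of the construct; whether a given compiler then parallelizes the loop is still up to the compiler.

```fortran
! Hypothetical names, for illustration only.
module pairs
  implicit none
contains
  subroutine fill_pairs(table)
    integer, intent(out) :: table(:,:)   ! table(m, l)
    integer :: l, m
    ! l and m are both index names: every (l, m) combination is its own
    ! iteration, and those iterations may execute in any order.
    do concurrent (l = 1:size(table, 2), m = 1:size(table, 1))
      table(m, l) = 10*l + m
    end do
  end subroutine fill_pairs

  subroutine print_pairs(table)
    integer, intent(in) :: table(:,:)
    integer :: l, m
    ! Ordinary serial loops, so the output order is deterministic.
    do l = 1, size(table, 2)
      do m = 1, size(table, 1)
        write(*,*) l, m, table(m, l)
      end do
    end do
  end subroutine print_pairs
end module pairs
```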
Oh well - I'll have to keep using openMP.
A query to clarify - are you saying that the language doesn't specify how the compiler implements a DO CONCURRENT that really is executed concurrently, or that the language doesn't specify what the example snippet does?
My take: There's a difference between allowing for executing the loop bodies in any order and executing them concurrently, though the former is a step towards the latter. The restrictions on modifying variables that are "in scope" in the do concurrent construct allow for any order of execution (and the example snippet meets those restrictions), but not necessarily concurrent execution, unless the processor is prepared to create private copies (or similar) of all variables that are updated by more than one iteration. Hence I think the behaviour of the example snippet is well defined (bar the order of output records), but whether things happen in parallel is a compiler specific implementation question.
(Rather than asking about shared/private variable storage etc, perhaps the OP's question should simply be "will this be parallelised by ifort?")
DO CONCURRENT (L=1:3,M=1:3) has slightly different semantics, in that for a particular L, the increments of M might occur in any order (your print statements can be even more jumbled!). Whether that's what the OP wants or not depends on what the OP wants or not.
(Does IO really prevent ifort from doing concurrent execution, or does the compiler just decide for that case that it is not worthwhile?)
You have no assurance as to whether M is or is not registerized. In Debug mode, most likely not. In Release mode with full optimization, likely; however, the above WRITE may copy the register to the persistent M, in which case you may experience a conflict.
As an alternative to subroutine, experiment with
    integer :: L, Mvar(MAX_THREADS)
    integer(LONG), volatile :: iThread

    iThread = 0
    do concurrent (L=1:3)
       ASSOCIATE(M => Mvar(InterlockedIncrement(iThread)))
          do M=1,3
             write(*,*) L, M   ! <--- This is just dummy code - I don't care what is actually within the loop
          enddo
       END ASSOCIATE
    enddo
*** The IVF documentation is unclear as to what happens in the above scenario.*** The Fortran standards may have addressed this issue.
The Fortran standard says: "The range of a DO CONCURRENT construct is executed for every active combination of the index-name values. Each execution of the range is an iteration. The executions may occur in any order."
Note that it says nothing about them executing in parallel. The idea is that the executions MAY execute concurrently, but nothing requires them to do so. You then have to extrapolate what it means for a DO loop within the range from the text that discusses the (non-concurrent) DO. In this case the inner loop should execute three times, but there is an implied shared variable M and it is this dependency that Intel Fortran decides is a barrier to autoparallelism, since there is both a read and a write to M in each iteration. Or maybe it's the I/O, since when I try it with both L and M in the index range, it still blocks autoparallelization.
Of course, in a real program you wouldn't do this sort of thing, but I understand why people try it. The compiler's parallelizer is conservative and generally won't parallelize a loop if it won't get substantially the same results as serial execution. Clearly with I/O, that won't happen.