copy an array section within an array without temporary copy

copy an array section within an array without temporary copy

I have some medium sized 3d arrays (approx 500x500x10) for which I sometimes need to copy a section of the array to a different (non-intersecting) section of the same array. E.g.:

a(:,:,1:3) = a(:,:,4:6)

I was surprised to find out that a temporary copy is made (inferred from the stack overflow I get which I don't get if the last subscripts are constants) in this case since it's trivial to confirm that the sections do not overlap. The sollution is then to use a loop, but I find that "unattractive" compared to using the array section language feature.

Is it possible to avoid a copy in some other way?

And another, related question: if the subscript bounds are not literal constants but (scalar) variables I find that not even the loop approach works, i.e.

j = 3
DO i = 1,3
    a(:,:,i) = a(:,:,i+j)
END DO

also produce a stack overflow. I fail to see what could possibly require the temporary copy in this case... Is it possible to circumvent that in this case?

I know that I can increase the stack size to fix the overflow issue, but performance is somewhat important and so i really would like to minimize unnecessary copying.

18 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.
jimdempseyatthecove's picture

Try adding !DEC$ IVDEP in front of the DO loop.

Jim Dempsey

www.quickthreadprogramming.com

Besides controlling optimization by IVDEP or other directives (or using a compiler which does optimize these), you have the alternative here of invoking one of the MKL ?COPY functions.

I don't expect to see any major changes in the policy of requiring directives etc. to get vectorization or eliminating expensive copying methods.  I had an IPS issue closed saying no more such optimizations would be undertaken, but then a few were implemented later.

Steve Lionel (Intel)'s picture

We keep making improvements to overlap detection - I will ask the developers if this case can be handled as well.

Steve

I received an example from IanH which shows how ifort tries to minimise the use of temporary copies of arrays. I'm not sure if I like the result.

This example shows that assumed shape arrays can produce arrays that are non-contiguous, which I found a bit disturbing. Hopefully the modified example I have provided demonstrates the problem. ( Thanks to our IT dept. I lost my network license for ifort over the weekend, so couldn't check the changes !! I will need to post an edit if it is wrong!)

args.f90 is Ian's version that displays that the first call doesn't produce a copy, and my v2 hopefully reports the spacing between elements of array "arg".

This resulting non-contiguous vector is not what would expect in a F95 environment, so I'm not sure of upward compatibility.

John

Attachments: 

AttachmentSize
Download args_v2.f901.47 KB
Download args.f901.09 KB
jimdempseyatthecove's picture

>>This example shows that assumed shape arrays can produce arrays that are non-contiguous, which I found a bit disturbing.

Then perhaps you should be using assumed size or explicit shape.

Jim Dempsey

www.quickthreadprogramming.com
Steve Lionel (Intel)'s picture

I have escalated this as issue DPD200255019. While the compiler will avoid the temp if the last bound is a single constant, it doesn't do so if it's a range. I will note that IVDEP has no effect on this.

Steve

Jim,

The problem I see is that for the AssumedShape example, the array recieved is not contiguous, but if it is further transfered, it must be received as assumed shape and not as explicit shape for it's content to be correctly used. I have attached an expanded example args_v4.f90 that shows this difference.

I was not aware that when transferring an array section that the property of non-contiguous might apply. I just wonder if ifort has gone too far in not providing a temporary copy in this case, as this is a result that I did not expect. Is this standard conforming, and if so do all standards (F90, F95, F03 or F08) all agree with this result ?

John

To answer the original post, the non-standard approach I have always adopted for this is to use a vector move routine, such as:

      n = size ( a(:,:,4:6) )

	      call move_array ( a(1,1,4), a(1,1,1), n )

	...

	      subroutine move_array (from, to, n)

	      integer*4 from(n), to(n), i

	!

	      if (loc(to) > loc(from))

	        do i = n,1,-1

	           to(i) = from(i)

	        end do

	      else

	        do i = 1,n

	           to(i) = from(i)

	        end do

	      end if

	      end subroutine move_array

Again I have not considered that an array might not be contiguous.
( How did Iget tripple line spacing when using copy from notepad this time ?? )

Attachments: 

AttachmentSize
Download args_v4.f903.16 KB

Note that John is exploring a different aspect to the original post.

The behaviour (versus the program - which uses non-standard stuff to diagnose what the implementation is doing) is standard conforming to whatever Fortran standard you pick.  Beyond that, allowing assumed shape dummy arguments to be efficiently associated with "regularly" non-contiguous actual arguments is pretty clearly allowed for in the design of the language.

F2008 allows a programmer to further specify that an actual argument associated with an assumed shape dummy argument must be contiguous (obligation on the programmer), but it doesn't change the requirements or required behaviour for things associated with dummy arguments without the contiguous attribute.

The array section that is the original actual argument is non-contiguous (in the absence of extreme compiler trickery).  You can either work with a non-contiguous thing, or a contiguous thing. 

Working with a non-contiguous thing may incur some execution speed overhead (because of cache 'n stuff - absence of detail due to my level of ignorance). 

Making a contiguous copy of it will incur some memory and execution speed overhead. 

Which choice of overhead is less significant depends on many, many things.  So the best approach for your program depends on many, many things.  In the general case (and assuming it really matters), it is probably impossible for the compiler to know which choice is best.

However, a compiler that always did a temporary copy when associating a non-contiguous actual argument with an assumed shape dummy practically removes the choice from the programmer - effectively removing the opportunity for them to do manual optimisation.  I think it likely that loss of opportunity, overall, on average, would result in slower programs.  As stated above, that loss of opportunity is also somewhat contrary to what I assume is the intent of the language designers ("regularly" non-contiguous actual arguments should be efficiently associated with assumed shape dummies).

Personally, I've found the temporary copies far more troublesome than any overhead from working with non-contiguous assumed shape dummy arguments.  Putting performance aside, excessive creation of temporaries can result in things like stack space or address space exhaustion, which cause the program to die.  Successful and correct completion of a program is a basic need, having it run faster is a long way secondary to that.  So assumed shape is the general rule for me.  If the whiz-bang optimization tools then tell me that might be problematic in specific locations, then I could revisit the general rule, but it would need to be a rainy day.

IanH,

I don't think I strayed too far from the topic. I was trying to identify the uncertainty of array sections producing temporary copies.

Thanks for your comments, although I note you also have a foot in both camps, when you identify "the opportunity for them to do manual optimisation" then " Putting performance aside". You do explain the virtues of deferred shape arrays.

My experience of array sections on a variety of compilers has resulted in me avoiding them wherever possible. When porting code, they can throw up some problems that are difficult to identify.  For me, any code structure that loads up the stack is a time bomb waiting to explode, and it usually happens when you don't have the time to find it. It is important to spend the time designing a data structure that minimises these problems.

Thanks again for the example you provided as it does highlight a consequence that I was not considering.

John

 

Quote:

jimdempseyatthecove wrote:

>>This example shows that assumed shape arrays can produce arrays that are non-contiguous, which I found a bit disturbing.

Then perhaps you should be using assumed size or explicit shape.

Jim Dempsey

John, Jim:

As alluded by IanH in Quote #9, considerable attention has been given to this issue in Fortran 2008 where assumed shape arguments and Fortran pointers can be given the CONTIGUOUS attribute.  Intel Fortran supports it even though I don't know if all of Fortran 2008 features related to contiguous objects are fully implementted and if there are any bugs outstanding - Steve can perhaps elaborate.  Based on what I understand:

  1. ALLOCATABLE arrays created using the ALLOCATE statement are almost always (or may be always) CONTIGUOUS,
  2. Dummy arguments with assumed shape can be given the CONTIGUOUS attribute when the code developer knows for sure that it is the case; this can aid in compiler optimization,
  3. The combination of 1. and 2. can suffice in many situations; in fact, it would meet all my needs, and
  4. There is a new intrinsic, IS_CONTIGUOUS(arg), one can use to inquire whether arg is contiguous.

Assumed shape dummy arguments provide many advantages and Fortran 2008 provides further improvements.  So I suggest continued interest - don't discard them yet.

 

Allocatable arrays fall into the category of things that are simply contiguous - they are always contiguous and the compiler knows it.

One aspect that bothers me about the CONTIGUOUS attribute for dummy arguments or pointers is that it isn't compile time checkable (assuming I read the standard correctly).  To me - "when the code developer knows for sure" is synonymous with "the code developer will soon forget and botch things up".  Given that, it seems a strange language design decision in this day and age.

Quote:

Espen M. wrote:

I have some medium sized 3d arrays (approx 500x500x10) for which I sometimes need to copy a section of the array to a different (non-intersecting) section of the same array. E.g.:

 

a(:,:,1:3) = a(:,:,4:6)

 

I was surprised to find out that a temporary copy is made (inferred from the stack overflow I get which I don't get if the last subscripts are constants) in this case since it's trivial to confirm that the sections do not overlap. The sollution is then to use a loop, but I find that "unattractive" compared to using the array section language feature.

Is it possible to avoid a copy in some other way?

...

If you're doing this often enough and performance is really important, have you considered a mixed language solution using, say, a C routine that can avoid array temporaries?  Or, gone back to why you need to copy array sections in the first place and whether you can avoid doing so, perhaps using Fortran pointers?

FWIW, the following simple code does NOT involve temporary array copies in gfortran 4.9 but it does in Intel Fortran:

PROGRAM p

   IMPLICIT NONE

   INTEGER, PARAMETER :: N = 500
   INTEGER :: a(N,N,6)

   a(:,:,1:3) = 1
   a(:,:,4:6) = 2

   WRITE(*,*) "Before Assignment:"
   WRITE(*,*) " a(1:2,1:2,1:3) = ", a(1:2,1:2,1:3)
   WRITE(*,*) " a(1:2,1:2,4:6) = ", a(1:2,1:2,4:6)

   a(:,:,1:3) = a(:,:,4:6)

   WRITE(*,*) "After Assignment:"
   WRITE(*,*) " a(1:2,1:2,1:3) = ", a(1:2,1:2,1:3)
   WRITE(*,*) " a(1:2,1:2,4:6) = ", a(1:2,1:2,4:6)

   STOP

END PROGRAM p

 

What is interesting with your example is when does the temporary storage for the array section get allocated ?

The stack overflow is reported before the "Before Assignment" is reported. Does this mean the temporary array is alloacted at the start of execution and not when required ?

My example code attached indicates that it occurs at initiation of the routine where the temporary array is required, rather than at the code line when required. If you comment out the a(:... on line 33 the stack overflow error is delayed until mess_stack is called.

John

Attachments: 

AttachmentSize
Download ps_0.f902.18 KB

Quote:

John Campbell wrote:

..

The stack overflow is reported before the "Before Assignment" is reported. Does this mean the temporary array is alloacted at the start of execution and not when required ?

..

Steve might be able to shed light on this as this aspect appears to be "processor-dependent" i.e., it is specific to how Intel Fortran works with the stack.

Re: the simple example I showed above, I've to set an adequate stack reserve size for it to run using Intel Fortran.

Steve Lionel (Intel)'s picture

In this case, the compiler knows the size of the temp it wants and allocates it on entry. In most real programs, the temp size isn't known until run-time so it tends to happen at the point of use. I agree we can do better here.

Steve

It appears the rules for the ASSOCIATE construct might aid in compiler optimization and help avoid the temporary copy.  The code shown below, a minor variation of the example in Quote #13 above in the form of a ASSOCIATE construct in lines 17 through 19, seems to avoid the temporary array and runs without any stack overflow.  I was able to execute this example with the stack default stack size whereas the example in Quote #13 needed an appreciable stack reserve size (>200000, if I recall correctly).

PROGRAM p

   IMPLICIT NONE

   INTEGER, PARAMETER :: N = 500
   INTEGER :: a(N,N,6)

   a(:,:,1:3) = 1
   a(:,:,4:6) = 2

   WRITE(*,*) "Before Assignment:"
   WRITE(*,*) " a(1:2,1:2,1:3) = ", a(1:2,1:2,1:3)
   WRITE(*,*) " a(1:2,1:2,4:6) = ", a(1:2,1:2,4:6)

   WRITE(*,*)
   WRITE(*,*) "Assignment begins:"
   Assoc_Copy: ASSOCIATE ( Lhs => a(:,:,1:3), Rhs => a(:,:,4:6) )
      Lhs = Rhs
   END ASSOCIATE Assoc_Copy
   
   WRITE(*,*) "Assignment ends:"
   WRITE(*,*)

   WRITE(*,*) "After Assignment:"
   WRITE(*,*) " a(1:2,1:2,1:3) = ", a(1:2,1:2,1:3)
   WRITE(*,*) " a(1:2,1:2,4:6) = ", a(1:2,1:2,4:6)

   STOP

END PROGRAM p

Not sure if this is a temporary benefit associated (!) with the current implementation of ASSOCIATE feature in Intel Fortran (perhaps with the use of pointers behind the scenes for ASSOCIATE aliasing that checks for array indices overlap, etc.) or whether this is sustainable.  Steve, can you please take a look and provide your feedback on this?  Is this a valid workaround for the issue in OP?

 

Steve Lionel (Intel)'s picture

Very interesting. It looks to me as if with ASSOCIATE, the compiler may not be considering overlap. If I change the ASSOCIATE to create an overlap, I don't see a temp being created, but I also see it is moving element by element. If I get a chance I'll try to come up with a test case that shows whether or not there's a problem.

Steve

Login to leave a comment.