reshape

reshape

Hi

I'm playing with RESHAPE to see if it worths the effort to kill some do cicles in one code. I've done a simple test program which compiles and runs with gfortran but segfaults with ifort (11.1). It segfaults doing the reshape.

Can someone give me a hand?

integer, parameter :: ni=310, nj=300, nk=128, njd=nj+2
real, dimension(ni,njd,nk,3) :: ref

real, dimension(ni,nk,nj,3), target :: ns, neu

integer :: i,j,k, idx,var
real :: ts, te1, te2, sum1,sum2

!> CODE STARTS ------------------------------------------------------- *

forall(i=1:ni,j=1:nj,k=1:nk)
ref(i,j,k,1) = i
ref(i,j,k,2) = j
ref(i,j,k,3) = k
end forall

call cpu_time(ts)
ns = reshape(ref(1:ni,1:nj,1:nk,:), [ni,nk,nj,3], order = [1,3,2,4])
call cpu_time(te1)
te1 = te1-ts

call cpu_time(ts)

do k=1,nk
do j=1,nj
do i=1,ni

neu(i,k,j,1) = ref(i,j,k,1)
neu(i,k,j,2) = ref(i,j,k,2)
neu(i,k,j,3) = ref(i,j,k,3)

enddo
enddo
enddo

call cpu_time(te2)

te2 = te2-ts

strace results (last lines):

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1839c0e000
mmap(NULL, 2185880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1838e06000
mprotect(0x7f1838e1c000, 2093056, PROT_NONE) = 0
mmap(0x7f183901b000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f183901b000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1839c0d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1839c0c000
arch_prctl(ARCH_SET_FS, 0x7f1839c0c6f0) = 0
open("/dev/urandom", O_RDONLY) = 3
read(3, "R\301\320\247\327.\266{", 8) = 8
close(3) = 0
mprotect(0x7f1839363000, 16384, PROT_READ) = 0
mprotect(0x7f183956f000, 4096, PROT_READ) = 0
mprotect(0x7f1839786000, 4096, PROT_READ) = 0
mprotect(0x7f1839a0d000, 4096, PROT_READ) = 0
mprotect(0x7f1839c2b000, 4096, PROT_READ) = 0
munmap(0x7f1839c10000, 97850) = 0
set_tid_address(0x7f1839c0c780) = 11720
set_robust_list(0x7f1839c0c790, 0x18) = 0
futex(0x7fff41c2ad2c, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGRTMIN, {0x7f1839576670, [], SA_RESTORER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f1839576700, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0) = 0x1a809000
brk(0x1a82a000) = 0x1a82a000
rt_sigaction(SIGFPE, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGABRT, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, {SIG_IGN, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, 8) = 0
rt_sigaction(SIGINT, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, {SIG_DFL, [], 0}, 8) = 0
futex(0x7f18395700ec, FUTEX_WAKE_PRIVATE, 2147483647) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis
15 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

Quoting - rreis
Hi

I'm playing with RESHAPE to see if it worths the effort to kill some do cicles in one code. I've done a simple test program which compiles and runs with gfortran but segfaults with ifort (11.1). It segfaults doing the reshape.

Can someone give me a hand?

integer, parameter :: ni=310, nj=300, nk=128, njd=nj+2
real, dimension(ni,njd,nk,3) :: ref

real, dimension(ni,nk,nj,3), target :: ns, neu

integer :: i,j,k, idx,var
real :: ts, te1, te2, sum1,sum2

!> CODE STARTS ------------------------------------------------------- *

forall(i=1:ni,j=1:nj,k=1:nk)
ref(i,j,k,1) = i
ref(i,j,k,2) = j
ref(i,j,k,3) = k
end forall

call cpu_time(ts)
ns = reshape(ref(1:ni,1:nj,1:nk,:), [ni,nk,nj,3], order = [1,3,2,4])
call cpu_time(te1)
te1 = te1-ts

call cpu_time(ts)

do k=1,nk
do j=1,nj
do i=1,ni

neu(i,k,j,1) = ref(i,j,k,1)
neu(i,k,j,2) = ref(i,j,k,2)
neu(i,k,j,3) = ref(i,j,k,3)

enddo
enddo
enddo

call cpu_time(te2)

te2 = te2-ts

strace results (last lines):

mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1839c0e000
mmap(NULL, 2185880, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0x7f1838e06000
mprotect(0x7f1838e1c000, 2093056, PROT_NONE) = 0
mmap(0x7f183901b000, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x15000) = 0x7f183901b000
close(3) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1839c0d000
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f1839c0c000
arch_prctl(ARCH_SET_FS, 0x7f1839c0c6f0) = 0
open("/dev/urandom", O_RDONLY) = 3
read(3, "R301320247327.266{", 8) = 8
close(3) = 0
mprotect(0x7f1839363000, 16384, PROT_READ) = 0
mprotect(0x7f183956f000, 4096, PROT_READ) = 0
mprotect(0x7f1839786000, 4096, PROT_READ) = 0
mprotect(0x7f1839a0d000, 4096, PROT_READ) = 0
mprotect(0x7f1839c2b000, 4096, PROT_READ) = 0
munmap(0x7f1839c10000, 97850) = 0
set_tid_address(0x7f1839c0c780) = 11720
set_robust_list(0x7f1839c0c790, 0x18) = 0
futex(0x7fff41c2ad2c, FUTEX_WAKE_PRIVATE, 1) = 0
rt_sigaction(SIGRTMIN, {0x7f1839576670, [], SA_RESTORER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGRT_1, {0x7f1839576700, [], SA_RESTORER|SA_RESTART|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigprocmask(SIG_UNBLOCK, [RTMIN RT_1], NULL, 8) = 0
getrlimit(RLIMIT_STACK, {rlim_cur=8192*1024, rlim_max=RLIM_INFINITY}) = 0
brk(0) = 0x1a809000
brk(0x1a82a000) = 0x1a82a000
rt_sigaction(SIGFPE, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGILL, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGSEGV, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGABRT, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGTERM, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, {SIG_IGN, [], 0}, 8) = 0
rt_sigaction(SIGQUIT, {SIG_IGN, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, 8) = 0
rt_sigaction(SIGINT, {0x4053b0, [], SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, 0x7f183957f720}, {SIG_DFL, [], 0}, 8) = 0
futex(0x7f18395700ec, FUTEX_WAKE_PRIVATE, 2147483647) = 0
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
--- SIGSEGV (Segmentation fault) @ 0 (0) ---
+++ killed by SIGSEGV +++
Segmentation fault

My initial reaction is that this is just a stack space issue, as RESHAPE will use an array temporary for sure. Have you tried -heap-arrays option?

ron

Quoting - Ronald W. Green (Intel)

My initial reaction is that this is just a stack space issue, as RESHAPE will use an array temporary for sure. Have you tried -heap-arrays option?

ron

Yes, just another case of running out of stack due to the array temporary created by the RESHAPE. You can avoid the sigsegv by using the usual techniques discussed on this forum (sigsegv article on right-hand side links on this forum).

ron

Quoting - Ronald W. Green (Intel)

Yes, just another case of running out of stack due to the array temporary created by the RESHAPE. You can avoid the sigsegv by using the usual techniques discussed on this forum (sigsegv article on right-hand side links on this forum).

ron

Is there a way to make reshape avoid using a temporary array?

thanks for the input,

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis

Have your tried removing your inner most loop?









do k=1,nk 
  do j=1,nj 
    neu(:,k,j,1) = ref(:,j,k,1) 
    neu(:,k,j,2) = ref(:,j,k,2) 
    neu(:,k,j,3) = ref(:,j,k,3) 
  enddo 
enddo 

www.quickthreadprogramming.com

It's a good idea Jim, I haven't thought about it because that piece of code was just to make sure I was getting the right output from RESHAPE. The fact remains doing the loop is faster than using RESHAPE, probably because of the temporary array... now, if there was a way of the compiler doing it smartly...

best,

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis

Quoting - rreis
It's a good idea Jim, I haven't thought about it because that piece of code was just to make sure I was getting the right output from RESHAPE. The fact remains doing the loop is faster than using RESHAPE, probably because of the temporary array... now, if there was a way of the compiler doing it smartly...

best,

Well the RESHAPE will always use a temporary. Remember, the compiler evaluates the RHS of the statement in isolation from the LHS. So the RESHAPE will create the array object specified, and after that, worry about the assignment. Consider if that RESHAPE'd object were then passed as an argument, or used in some other array context on the RHS. You need that RESHAPE'd object created before you do something with it.

This is another one of those cases where human semantic evaluation of the expression in full, LHS and RHS in parallel, can outthink a compiler.

and if you dig into the gfortran code, you'd probably find they are creating a temporary also - only they default to heap for storage.

Quoting - Ronald W. Green (Intel)

and if you dig into the gfortran code, you'd probably find they are creating a temporary also - only they default to heap for storage.

By default, gfortran switches to heap at a certain size of temporary, so you don't incur stack overflow. ifort has an option to do the same. In a few cases, where the temporary fits the destination exactly with no intervening operations, gfortran stores it there directly, even when ifort doesn't. If you don't want a temporary, you write it out in f77.
I didn't check what the compiler does with your forall, but multiple assignments in a forall are difficult for a compiler to optimize, while ifort does an excellent job of multiple assignments per do loop, at least when the original nesting is right.

Ronald,

The temporary will have an array descriptor - eh?
Why not, if necessary, delete the contents of the LHS, then copy the array descriptor (not the data)?

Jim Dempsey

www.quickthreadprogramming.com

many thanks for all your answers. I think I'll stick with the "manual approach" then. What I would expect was some mechanism that would allow one, for instance, to inherit the temp array, like

real, dimension(ni,nj,nk) :: old
real, dimension(:,:,:), allocatable :: neu

neu = reshape(old, [ni,nk,nj], [1,3,2])

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis

Ricardo,

Since you are looking at the code now, and you would like to use reshape later, consider changing

neu = reshape(old, [ni,nk,nj], [1,3,2])

to

call my_reshape(neu, old, ni,nk,nj,1,3,2)

If you have different rank arrays then you can use generic interface to select the ranked-ness for the arguments.

For now, your code (in one spot) can perfrom the manual reshape,as IVF versions come outyou can test replacing the manual code (now in one spot) with the reshape to see which works best. The compiler optimization will likely inline either version of the code so there should be little impact on performance.

Jim Dempsey

www.quickthreadprogramming.com

Thanks for the suggestion Jim, maybe I'll do it. Is there a way to "turn off" certain intrinsics or replace their calls for a user funtion? Then it would be a question of turning ON or OFF this switch...

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis

You can always declare your own routine to replace an intrinsic, but be aware that your replacement won't be usable in initialization expressions. You cannot "turn off" an intrinsic.

Steve - Intel Developer Support

Quoting - Steve Lionel (Intel)
You can always declare your own routine to replace an intrinsic, but be aware that your replacement won't be usable in initialization expressions. You cannot "turn off" an intrinsic.

thanks for the explanation and warning. the route is written then, so to speak :)

Ricardo Reis
'Non Serviam'
@ http://www.lasef.ist.utl.pt
@ http://www.radiozero.pt
@ http://rreis.tumblr.com
@ http://www.flickr.com/photos/rreis

You can use the Fortran PreProcessor to "turn off" an intrinsic by renaming it to your own routine. However, this may not work in all cases. Handling various arguments and/or types may be problematic. To some extent this can be corrected with the use of generic interfaces.

In this case (unnecessary generation of temp) might not be corrected by

#define reshape(a,b,c,d) my_reshape(a,b,c,d)
outArray = reshape(...)

because your implementation as function, will not have access to the array descriptor of outArray. Whereas, when called as subroutine, you do (can)have access to the array descriptor.

Inserting a CALL to your reshape subroutine might be the lesser of all evils.

Jim Dempsey

www.quickthreadprogramming.com

Leave a Comment

Please sign in to add a comment. Not a member? Join today