"heap-arrays" option in intel 64 mode

Hi,

I have a question about the ifort option "heap-arrays" in Intel 64 (64-bit) mode.

I compiled a program that does a large amount of computation in IA-32 (32-bit) mode without the "heap-arrays" option, and the computation time is about 3 seconds.

In Intel 64 (64-bit) mode, I compiled the same program with the "heap-arrays" option, but in this case the computation time is about 100 seconds.

Could anyone explain the reason for this, and how I could get the same performance in Intel 64 mode as in IA-32 mode?


I think we'd need to see an example.

Steve - Intel Developer Support

I just uploaded my program.

Attachments:

- CDATA_0.f90 (1.3 MB)
- SURF_GREEN_0.f90 (24.78 KB)
- test_4.f90 (712 bytes)

One missing file is attached.

Attachments:

- PARAMETERS.f90 (971 bytes)

Thanks for that.

First of all, I can see the difference in 32-bit. The problem appears to be inside the memory allocator - the pattern of allocations is causing it to spend a lot of time working with its free lists. The bulk of the time is taken up at the entry to NESTED_DPOL_2D where the automatic arrays B and C are declared.

This is the first program I have seen where /heap-arrays makes such a big difference. We'll investigate this some more. Did you have a need to use /heap-arrays? You could turn it on for some sources and not all if need be.

Steve - Intel Developer Support

It's just malloc and free taking all that time - I was distracted by the additional debug library stuff that malloc/free does. The routines taking most of the time are small and don't do much work, so the allocate/free swamps the actual work. NESTEDMUL_DPOL is another one.
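A minimal sketch (not the poster's code) of the pattern Steve describes: B and C below are automatic arrays. Without /heap-arrays they live on the stack, so allocation is nearly free; with /heap-arrays every call does a malloc at entry and a free at exit, which swamps a routine that does little actual work.

```fortran
! Sketch: automatic arrays in a small, frequently called routine.
! Under /heap-arrays, each of the 1000 calls below performs a
! malloc/free pair for B and C.
program auto_array_demo
    implicit none
    real(8) :: total
    integer :: i
    total = 0.0_8
    do i = 1, 1000                ! many calls => many alloc/free pairs
        call small_work(50, total)
    end do
    print *, total
contains
    subroutine small_work(n, acc)
        integer, intent(in)    :: n
        real(8), intent(inout) :: acc
        real(8) :: b(n, n), c(n, n)   ! automatic arrays: size fixed at call time
        b = 1.0_8
        c = 2.0_8
        acc = acc + b(1, 1) + c(n, n)
    end subroutine small_work
end program
```

Per Steve's suggestion, compiling only the memory-hungry sources with /heap-arrays and leaving hot, small routines like this one on the stack avoids the overhead.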

Steve - Intel Developer Support

Oh, and I saw about an 8X change from 3 seconds to 24. I could never get it to 100 seconds. Be sure you're not building with debug libraries, which makes it worse.

Steve - Intel Developer Support

Quote:

Steve Lionel (Intel) wrote:

It's just malloc and free taking all that time - I was distracted by the additional debug library stuff that malloc/free does. The routines taking most of the time are small and don't do much work, so the allocate/free swamps the actual work. NESTEDMUL_DPOL is another one.

Dear Steve,

Thank you very much for your explanations.

Do computation and allocation of variables in heap memory take much more time than for variables in stack memory?

Actually, the main program that uses the routines I listed above needs lots of memory, so it needs to be compiled with "heap-arrays".

Another question:

In which memory region are assumed-shape arrays allocated? Stack or heap?


Additionally, I found a difference in computational speed depending on how variables are declared.

Let me show examples:

------------------------------------------------------------------------------------------------

Case 1

PROGRAM TEST
IMPLICIT NONE

real(4) :: time_begin, time_end

integer(4), parameter :: nn = 1000
real(8) :: aa(nn,nn), bb(nn,nn), cc(nn,nn)

aa = 1.0_8
bb = 1.0_8

call cpu_time(time_begin)
call foo(nn,aa,bb,cc)
call cpu_time(time_end)

print *, time_end - time_begin

contains
    subroutine foo(n, a, b, c)
        integer(4), intent(in) :: n
        real(8), intent(in) :: a(:,:), b(:,:)
        real(8), intent(out) :: c(:,:)

        integer(4) :: i, j, k

        do i = 1, n
            do j = 1, n
                c(i,j) = 0.0_8
                do k = 1, n
                    c(i,j) = c(i,j) + a(i,k)*b(k,j)
                end do
            end do
        end do
    end subroutine foo
end program

==========================================

Case 2

PROGRAM TEST
IMPLICIT NONE

real(4) :: time_begin, time_end

integer(4), parameter :: nn = 1000
real(8) :: aa(nn,nn), bb(nn,nn), cc(nn,nn)

aa = 1.0_8
bb = 1.0_8

call cpu_time(time_begin)
call foo(nn,aa,bb,cc)
call cpu_time(time_end)

print *, time_end - time_begin

contains
    subroutine foo(n, a, b, c)
        integer(4), intent(in) :: n
        real(8), intent(in) :: a(n,n), b(n,n)
        real(8), intent(out) :: c(n,n)

        integer(4) :: i, j, k

        do i = 1, n
            do j = 1, n
                c(i,j) = 0.0_8
                do k = 1, n
                    c(i,j) = c(i,j) + a(i,k)*b(k,j)
                end do
            end do
        end do
    end subroutine foo
end program

------------------------------------------------------------

Both cases were compiled in IA-32 mode without the "heap-arrays" option.

The second case is much faster.

Does this mean it is better to declare variables as automatic arrays rather than as assumed-shape arrays?

Assumed-shape arrays don't imply any particular allocation. If they also have the ALLOCATABLE attribute then they are always heap allocated. If POINTER, they're heap-allocated if ALLOCATE is used, otherwise they're whatever the target was when pointer assignment was done.
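A small sketch of the two cases Steve distinguishes: an ALLOCATABLE array is heap-allocated by ALLOCATE, while a POINTER that is pointer-assigned to an existing target simply reuses the target's storage, with no heap allocation of its own.

```fortran
! Sketch: heap allocation via ALLOCATE vs. pointer assignment to a target.
program alloc_kinds
    implicit none
    real(8), allocatable :: heap_arr(:)   ! ALLOCATE places this on the heap
    real(8), target      :: local_arr(10) ! ordinary local variable
    real(8), pointer     :: p(:)

    allocate(heap_arr(10))                ! heap allocation happens here
    heap_arr = 1.0_8

    p => local_arr                        ! pointer assignment: p uses the
    p = 2.0_8                             ! target's storage, no new allocation

    print *, sum(heap_arr), sum(p)
    deallocate(heap_arr)
end program
```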

The computation aspect when using /heap-arrays isn't the issue - there is no difference. But there is a cost to heap allocation and deallocation, whereas stack allocation is a single subtract instruction.

Your two examples in the last post are something else entirely - the allocation is done in the main program and the arrays are all dummy arguments, not automatic arrays. The only difference is where the bounds are passed. In the second example, the compiler has more information about the bounds than it does in the first, and this can improve optimization. Most tests I have seen don't show significant differences here, though. When constructing such tests, make sure that the optimizer hasn't thrown away computational work because it sees the results were never used, which is exactly what happened here. When I add a use of C after the timing, I get identical times for the two programs.

Steve - Intel Developer Support
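Steve's fix can be sketched against the Case 2 program above: consume C after the timing so the optimizer cannot discard the multiply loops as dead code. (The smaller nn here is only to keep the sketch quick; the structure is otherwise the same.)

```fortran
! Sketch: same timing harness as the examples above, but CC is used after
! the timed call, so the compiler cannot delete the matrix-multiply work.
program test_used
    implicit none
    real(4) :: time_begin, time_end
    integer, parameter :: nn = 200
    real(8) :: aa(nn,nn), bb(nn,nn), cc(nn,nn)

    aa = 1.0_8
    bb = 1.0_8

    call cpu_time(time_begin)
    call foo(nn, aa, bb, cc)
    call cpu_time(time_end)

    print *, time_end - time_begin
    print *, sum(cc)                  ! using CC keeps the work alive

contains
    subroutine foo(n, a, b, c)
        integer, intent(in)  :: n
        real(8), intent(in)  :: a(n,n), b(n,n)
        real(8), intent(out) :: c(n,n)
        integer :: i, j, k
        do i = 1, n
            do j = 1, n
                c(i,j) = 0.0_8
                do k = 1, n
                    c(i,j) = c(i,j) + a(i,k)*b(k,j)
                end do
            end do
        end do
    end subroutine foo
end program
```

With the result used, both declaration styles should time the same work, which matches Steve's observation of identical times.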
