inconsistent results with -O2 as compared to -O1

inconsistent results with -O2 as compared to -O1

I have encountered following problem with ifort compiler v 12.1.3 on x86_64 GNU/Linux.
When I compile the following minimal example with the options "-O2 -implicitnone" I get
different results for the two loops in the "compute_matrix_H" subroutine (i.e. the two DO loops,
where the first one has as upper bound the variable N, while a fixed value is used in the second one).
It seems that with "-O1", or adding "-g" to "-O2" resolves this issue as well as allocating the working
array mp_v3 dynamically as indicated in the subroutine. Also not allocating the (unused) array mat_SS
seems to have a positive effect.

The included module "parameters" is stored in an external file and contains solely the declaration

INTEGER, PARAMETER :: adequate = KIND(1D0)

It seems that the problem is also resolved when this declaration is moved into the

main program and the inclusion of the "parameters" module is therefore avoided.

Any ideas why this occurs?

Thanks in advance,



      USE parameters
!     INTEGER, PARAMETER :: adequate = KIND(1D0)

!     INTEGER :: NDim

!     INTEGER :: NGWPs
      COMPLEX(adequate), ALLOCATABLE :: mat_SS(:, :)
      NDim = 2

      NGWPs = 16

      ALLOCATE(mat_SS(NGWPs, NGWPs))
      CALL compute_matrix_H(NDim)
SUBROUTINE compute_matrix_H(N)

      COMPLEX(adequate) :: N_mat(1:N, 1:N), mp_v3(1:N, 1:N, 1:N)

      REAL(adequate) :: x, y, beta

      INTEGER :: M, l, k

!     COMPLEX(adequate), ALLOCATABLE :: N_mat(:, :), mp_v3(:, :, :)

!     ALLOCATE(n_mat(N, N), mp_v3(N, N, N))
      WRITE(*, *) "size of mp_v3 = ", SIZE(mp_v3)

      x = 3

      y = 2

      beta = 0.01_adequate

      mp_v3(1, 1, 1) = 6*x*beta

      mp_v3(2, 1, 1) = 2*y

      mp_v3(1, 2, 1) = 2*y

      mp_v3(2, 2, 1) = 2*x

      mp_v3(1, 1, 2) = 2*y

      mp_v3(2, 1, 2) = 2*x

      mp_v3(1, 2, 2) = 2*x

      mp_v3(2, 2, 2) = 6*y*beta
      N_mat = 0

      DO k = 1, N

            DO l = 1, N

                  N_mat(l, k) = mp_v3(l, l, k)

            END DO

      END DO

      WRITE(*, '(2(2F22.15,/))') ABS(N_mat)
      N_mat = 0

      DO k = 1, 2

            DO l = 1, 2

                  N_mat(l, k) = mp_v3(l, l, k)

            END DO

      END DO

      WRITE(*, '(2(2F22.15,/))') ABS(N_mat)


3 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.

I get the same results at -O2 as at -O0.

It is possible for optimization to affect the results of some programs though due to changes in the order of operations. If you want to avoid any optimizations that can affect the results,compile with the option-fp-model precise.

There is definitely an optimizer bug here.

The loop

      DO k = 1, N

            DO l = 1, N

                  N_mat(l, k) = mp_v3(l, l, k)

            END DO

      END DO

is compiled to

..B1.13:                        # Preds ..B1.13 ..B1.12

        movaps    (%rcx,%rdx), %xmm0                            #15.12

        incq      %rdi                                          #15.12

        movaps    %xmm0, (%rsi,%rbx)                            #15.12

        addq      $16, %rsi                                     #15.12

        addq      %r14, %rcx                                    #15.12

        cmpq      %rax, %rdi                                    #15.12

        jb        ..B1.13       # Prob 96%                      #15.12

with the 64 bit version of IFort Build 20120410 (and Build 20120410 on Windows 7) when the options used are -O2 -QxHOST (OS: Suse Linux 12.1, CPU: C2D E8400). On Windows, the same errors occur if -O0 is used. On both platforms, no errors occurs when -O1 or -O3 are used.

Similar errors occur with the 32-bit version of the same release, and the corresponding WIndows compilers, although of course the assembler outputs are different:

The fault is in the instruction

        addq      $16, %rsi                                     #15.12
In effect, the compiled code is storing into elements (1,1,1), (1,2,1), (1,1,2), (1,2,2) of array mp_v3, because of the uniform stride in the address calculation in the compiled code. The proper elements that should have been stored into are (1,1,1), (2,2,1), (1,1,2), (2,2,2), and the corresponding increments that should have been added to the index in %rsi should have been 24, 8 and 24 bytes, instead of the uniform 16 byte increment used.

The use of "-fp precise" has no relevance to this bug, which is a serious optimizer bug and should be noted.

Curiously, if the declaration, allocation and deallocation of the array mat_SS, which is otherwise not used in the rest of the program, are taken out, the bug disappears.

I have made up a simpler "reproducer" by removing complex variables and portions of the code that may be removed without making the bug go away, and posted it in a new thread, Optimizer Bug.

[UPDATE: Ron Green of Intel has responded with:

The bug report ID is DPD200232347


Leave a Comment

Please sign in to add a comment. Not a member? Join today