Does the CONTIGUOUS attribute mean anything to ifort 12.1's optimizer?

ifort 12.1 supports the CONTIGUOUS attribute from Fortran 2008, but seemingly not in the way one would expect. Consider this example program:

program test
    double precision, allocatable :: a(:, :, :, :)
    integer n

    read *, n
    allocate(a(n, n, n, n))

    call f(a)
    call f_opt(a)
contains
    subroutine f(a)
        double precision, contiguous, intent(out) :: a(:, :, :, :)
        a = 1
    end subroutine
    subroutine f_opt(a)
        double precision, contiguous, intent(out) :: a(:, :, :, :)
        call f_opt_impl(a(1, 1, 1, 1), size(a))
    end subroutine
end program
subroutine f_opt_impl(a, n)
    double precision, intent(out) :: a(*)
    integer, intent(in) :: n

    a(1 : n) = 1
end subroutine
Given the hint provided by the CONTIGUOUS attribute, the compiler should be able to collapse the implicit quadruply nested assignment loop in subroutine f and produce code similar to f_opt with f_opt_impl inlined. Instead, ifort generates incredibly convoluted code that seemingly implements the four loops "as is". In comparison, crayftn 7.4 truly understands what CONTIGUOUS means and flattens the loop nest. (In fact, crayftn does more: through IPA it knows that the argument is contiguous even without the CONTIGUOUS attribute, it knows that the four loop bounds are all simply n, and it uses an optimized library routine to do the memset-style fill.)

The above example may be rather contrived, as loop collapsing and pattern matching are themselves complicated (but nevertheless doable and worth doing) issues. But the following is something that I actually use in a project:

subroutine gemv(a, x, y)
    double precision, intent(in) :: a(5, 5)
    double precision, contiguous, intent(in) :: x(:)
    double precision, contiguous, intent(inout) :: y(:)

    y(1 : 5) = y(1 : 5) + a(:, 1) * x(1) + a(:, 2) * x(2) + a(:, 3) * x(3) &
            + a(:, 4) * x(4) + a(:, 5) * x(5)
end subroutine
While this is nothing but DGEMV, invoking MKL on tens of thousands of input instances incurs too much overhead. The CONTIGUOUS attributes are inserted exactly to inform the compiler that there is nothing to multipath (multiversion) for: the stride of x and y is always 8 bytes. ifort ignores them and produces two versions regardless. (crayftn gets this right, of course.)
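To illustrate why a second, strided code path should be unnecessary, here is a minimal sketch. It assumes a compiler that implements the Fortran 2008 is_contiguous intrinsic (ifort 12.1 may not), and the names contiguous_demo, fill and buf are made up for this example. The idea is that when a strided section is passed to a CONTIGUOUS assumed-shape dummy, the compiler is expected to pack it into a temporary (copy-in/copy-out), so the callee only ever sees unit stride.

```fortran
program contiguous_demo
    implicit none
    double precision :: buf(10)

    buf = 0

    ! Contiguous actual argument: can be passed as is.
    call fill(buf(1:5))

    ! Strided actual argument (elements 6, 8, 10): the compiler is expected
    ! to pass a packed temporary and copy the result back afterwards.
    call fill(buf(6:10:2))

    ! Elements 7 and 9 were never passed, so they should still be zero.
    print *, buf
contains
    subroutine fill(x)
        ! Hypothetical helper; same dummy declaration style as gemv above.
        double precision, contiguous, intent(inout) :: x(:)

        ! Expected to report a contiguous dummy for both calls.
        print *, is_contiguous(x), size(x)
        x = 1
    end subroutine
end program
```

If the dummy is reported as contiguous in both calls, the only stride the body of a routine such as gemv can ever encounter is one element, which is what the CONTIGUOUS attribute is meant to tell the optimizer.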

Can any Intel compiler dev confirm whether ifort 12.1's CONTIGUOUS support is merely syntactic? If so, can proper semantic support be expected in ifort 13 (or 14, if 13 is not that lucky a number)?
