SIMD run time failure using directives, ifort v14.0.0

Matthew C.

I have some code from a model where many of the loops look like the following, doing stride-one accesses through dynamically allocated arrays. Despite the arrays sitting at the ends of pointer chains, I know they do not overlap in memory. Using IVDEP or VECTOR directives will not convince the compiler to vectorize this code (no surprise there). Prior to the v14 compiler, the compiler would also not vectorize this code even with !DIR$ SIMD or !$OMP SIMD directives. The v14 compiler, however, does, as evidenced by both the vec-report messages and the associated assembly code.

!$OMP PARALLEL PRIVATE(block)
block => domain % blocklist
do while (associated(block))
   !$OMP DO SCHEDULE(RUNTIME) PRIVATE(k)
   do j = 1, block % mesh % nEdges
      !$OMP SIMD
      do i = 1, block % mesh % nVertLevels
         block % state % time_levs(2) % state % a % array(i,j) = &
            block % mesh % edgeMask % array(i,j) * ( &
            block % state % time_levs(2) % state % b % array(i,j) + &
            block % state % time_levs(1) % state % c % array(i,j) )
      end do
   end do
   !$OMP END DO
   block => block % next
end do  ! block
!$OMP END PARALLEL

While the latest compiler we now have does indeed vectorize the code via the !DIR$ SIMD or !$OMP SIMD directive, it fails at run time, either with a segfault or silently when using OpenMP. In the above loop, I've observed the following behavior:

With OpenMP:

With more than one thread, it fails silently at run time with !$OMP SIMD or !DIR$ SIMD.

With one thread, it segfaults.

Without OpenMP: it segfaults with !DIR$ SIMD.

I would gladly attach the short test code and the assembler output if the forum allowed it.

jimdempseyatthecove

As TimP states, private(i).

That said, is the member ...%array(:,:) allocatable or a pointer? If a pointer, could any of the ...%array(:,:) elements overlap among threads?

Jim Dempsey

www.quickthreadprogramming.com
Matthew C.

False: OpenMP DO loop counters are private by default; all other variables are shared. Besides, running with a single thread also fails.

Matthew C.

All arrays are Fortran allocatable and don't overlap. Source code follows.

Attachment: foo.f90 (3.08 KB)
Ronald W Green (Intel)

It does look like bad code generation for the SIMD loop. I've entered a bug report. The complex data structures and pointer-based arrays are probably tripping it up. I simplified the test case, removing the OMP red herrings and reducing it to a simple 80x80 case without user input. My test case is attached for reference.

I will keep you posted on progress for this bug report.

ron

Attachment: u487814.f90 (2.69 KB)
Matthew C.

Thanks Ron

Matthew C.

One more thing to check: when I have arrays at the ends of pointer chains as above and try an array assignment that should be vectorizable, e.g.

a(:) = b(:)

I find that these are not vectorized either. I could put !DIR$ SIMD in front of the assignment, but right now these statements are surrounded by OpenMP WORKSHARE directives. I don't think you can put another directive inside a WORKSHARE construct, and the WORKSHARE directive does not accept the SIMD directive.
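One possible workaround, sketched below under the assumption that the compiler supports the OpenMP 4.0 composite DO SIMD construct (all names here are hypothetical, not taken from the model code): rewrite the array assignment as an explicit loop, which can then carry the SIMD directive that WORKSHARE cannot.

```fortran
program workshare_to_simd
  implicit none
  integer, parameter :: n = 80
  real, allocatable :: a(:), b(:)
  integer :: i

  allocate(a(n), b(n))
  b = 1.0

  !$omp parallel
  ! WORKSHARE cannot contain a SIMD directive, so the
  ! a(:) = b(:) assignment becomes an explicit loop covered
  ! by the composite DO SIMD construct (OpenMP 4.0):
  !$omp do simd
  do i = 1, n
     a(i) = b(i)
  end do
  !$omp end do simd
  !$omp end parallel

  print *, sum(a)   ! expect 80.0
end program workshare_to_simd
```

The trade-off is losing WORKSHARE's ability to cover several array statements at once; each assignment gets its own loop.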

jimdempseyatthecove

>> False. OpenMP do loop counters are private by default

!$OMP DO SCHEDULE(RUNTIME) PRIVATE(k)
do j = 1, block % mesh % nEdges
   !$OMP SIMD
   do i = 1, block % mesh % nVertLevels

In the above code, j defaults to private because it is the loop control variable of the immediately preceding !$OMP DO, whereas i defaults to shared because it is not the loop control variable of an !$OMP DO loop.
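Whichever default actually applies, an explicit PRIVATE clause removes the ambiguity. A minimal sketch with hypothetical array names (not the model's types):

```fortran
program private_sketch
  implicit none
  integer, parameter :: nEdges = 4, nVertLevels = 8
  real :: a(nVertLevels, nEdges), b(nVertLevels, nEdges)
  integer :: i, j

  b = 2.0

  !$omp parallel
  ! Naming i explicitly in PRIVATE sidesteps the question of
  ! which data-sharing default governs the inner loop variable:
  !$omp do schedule(runtime) private(i)
  do j = 1, nEdges
     !$omp simd
     do i = 1, nVertLevels
        a(i, j) = 2.0 * b(i, j)
     end do
  end do
  !$omp end do
  !$omp end parallel

  print *, maxval(a)   ! expect 4.0
end program private_sketch
```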

Jim Dempsey

www.quickthreadprogramming.com
IanH

There's a general clause in the data-sharing rules: "the loop iteration variable of a sequential loop in a parallel or task construct is private in the inner-most construct that encloses the loop" (in the OpenMP 4.0 spec, see 2.14.1 on p. 147, line 28).

Which raises the question of why the iteration variable of a DO construct is called out separately as private.

Tim Prince

I trip myself up over the differing rules for default privatization of iteration variables (C vs. Fortran vs. Cilk), and over whether any LASTPRIVATE effect can be obtained (consistent or not with the non-OpenMP Fortran definition of a loop variable's value after termination). I think the PRIVATE clause is needed when DEFAULT(NONE) is set, but the compiler should tell you that.

jimdempseyatthecove

IanH, the spec could be less ambiguous had it said something along the lines of "all loop control variables contained within the parallel construct default to private unless specified otherwise", but that is not what it says, nor, I believe, what is implemented.

Steve, step in here, as this may lead to assumptions contrary to fact.

Jim Dempsey

www.quickthreadprogramming.com
Ronald W Green (Intel)

a(:) = Some expression with b(:)

is vectorizable MAYBE, if these are not pointer based; pointer-based arrays could alias the LHS and RHS. I don't know if it was your code or some other similar code with complex user-defined types and pointer-based arrays at the leaf ends of the structures. The LHS and RHS had totally different variables, different semantics and use, but at the ends of these structures were pointers to 2D real arrays. Logically these would never alias each other (totally different types and usage), yet it is POSSIBLE for the leaf-end 2D real array pointers to alias the same memory. Compilers cannot discern intent. The compiler will (well, should) ALWAYS CHOOSE TO CREATE SAFE CODE whenever there is even a faint possibility of dependence.

Allocatable arrays tend to let the compiler optimize better. I understand there are sometimes very good reasons to use pointer-based arrays, and that for many years derived types could not have allocatable components. I get it; I've used pointer-based arrays in my own applications over the years (not to mention some questionable use of EQUIVALENCE back in the 80s). But that is why SIMD directives were introduced: if the LHS and RHS could alias but you are certain they never will, add the directive to tell the compiler to forget safety and optimize. Casual users get safe code; tuners can take the extra effort to put in the appropriate directives to guide the compiler's heuristics.
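The aliasing point can be illustrated with a sketch (hypothetical type and names, not the original model's structures), assuming a Fortran 2008 compiler: the CONTIGUOUS attribute promises the compiler a stride-1 pointer target, and the SIMD directive asserts the absence of aliasing that the compiler cannot prove on its own.

```fortran
module fields
  implicit none
  type :: field
     ! CONTIGUOUS (Fortran 2008) promises a stride-1 target,
     ! removing one obstacle to vectorizing pointer arrays
     real, pointer, contiguous :: array(:, :) => null()
  end type field
end module fields

program simd_alias_sketch
  use fields
  implicit none
  type(field) :: a, b
  integer :: i, j

  allocate(a % array(8, 4), b % array(8, 4))
  b % array = 3.0

  do j = 1, 4
     ! The SIMD directive asserts a and b never alias, so the
     ! compiler need not generate conservative scalar code
     !$omp simd
     do i = 1, 8
        a % array(i, j) = 2.0 * b % array(i, j)
     end do
  end do

  print *, maxval(a % array)   ! expect 6.0
end program simd_alias_sketch
```

The directive shifts responsibility to the programmer: if the pointers do overlap at run time, the vectorized code may silently compute wrong results.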

That said, there are certainly opportunities for any compiler to do a better job at optimization and vectorization. We look at every case and have put enormous effort into vectorization over the past several years.

Steve Lionel (Intel)

Quote:

jimdempseyatthecove wrote:

Steve, step in here, as this may lead to assumptions contrary to fact.

I have nothing to add here - the others who have commented know OpenMP far better than I do.

Steve
Ronald W Green (Intel)

This bug is fixed in the latest Composer XE 2013 SP1 Update 2 compiler, posted to the Intel Registration Center yesterday, 2/13/2014.

I will close this issue now.  Thank you for reporting this bug.

ron
