[Bug] OpenMP-Bug in IntelC++ 13/12/11 concerning a function in the loop condition

Dear all,

I believe I have found a bug in the OpenMP implementation of the Intel C++ Compiler, versions 11, 12 and 13.

It seems that Intel C++ has problems inlining another function into the loop condition of a for loop that is to be parallelized with OpenMP. If that other function also contains a for loop, Intel C++ seems to parallelize the wrong loop. In the example, the original loop is therefore executed in full by every thread, and the result is too large by a factor of OMP_NUM_THREADS.

A minimal test case is attached. The test fails with OMP_NUM_THREADS >= 2 and with optimization flags -O3, -O2, or none.

The exact versions tested are (output of "icpc -v"):

  • icpc version 13.0.1 (gcc version 4.3.0 compatibility)
  • icpc version 12.1.6 (gcc version 4.3.0 compatibility)
  • Version 11.1

Best,

Michael Walz

btw: I couldn't reproduce the bug with any of the GNU G++ compilers (versions 4.3.4, 4.3.6, 4.4.7, 4.5.4, 4.6.3, 4.7.2). They all work.


Your code violates the OpenMP requirements. See Section 2.5.1, Loop Construct.

for (init-expr; test-expr; incr-expr) structured-block

init-expr: one of the following:
    var = lb
    integer-type var = lb
    random-access-iterator-type var = lb
    pointer-type var = lb

test-expr: one of the following:
    var relational-op b
    b relational-op var

incr-expr: one of the following:
    ++var
    var++
    --var
    var--
    var += incr
    var -= incr
    var = var + incr
    var = incr + var
    var = var - incr

var: one of the following:
    A variable of a signed or unsigned integer type.
    For C++, a variable of a random access iterator type.
    For C, a variable of a pointer type.
  If this variable would otherwise be shared, it is implicitly made private in the loop construct. This variable must not be modified during the execution of the for-loop other than in incr-expr. Unless the variable is specified lastprivate on the loop construct, its value after the loop is unspecified.

relational-op: one of the following:
    <
    <=
    >
    >=

lb and b: loop invariant expressions of a type compatible with the type of var.

incr: a loop invariant integer expression.

 Your code is varying the test-expr.

Jim Dempsey

www.quickthreadprogramming.com

Quote:

jimdempseyatthecove wrote:

Your code violates the OpenMP requirements. See Section 2.5.1, Loop Construct.

[...]

 Your code is varying the test-expr.

Jim Dempsey

I would like to disagree. test-expr is of the form "var relational-op b", where "b = getNumber(x)" is a "loop invariant expression" as stated in the requirements you cited. Please note that the return value of the function "getNumber" is invariant during the loop; the function is basically a complicated way of returning its argument. Also, the loop variable "i" is not changed other than in incr-expr. Please note that the two loops have different counter variables (although both are named "i").
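For readers without the attachment, the structure under discussion can be sketched roughly as follows. This is not the attached test: the function bodies and the name countIterations are assumptions reconstructed from the description in this thread (getNumber returns its argument by way of its own for loop, whose counter is also named "i").

```cpp
// Hypothetical stand-in for getNumber: a roundabout way of returning
// its argument, using its own for loop with a counter named i.
int getNumber(int x) {
    int n = 0;
    for (int i = 0; i < x; ++i) {
        ++n;
    }
    return n;
}

// Counts the iterations of a parallelized loop whose bound is a
// function call. The bound is invariant, so the expected result is x
// for any number of threads.
long countIterations(int x) {
    long count = 0;
    #pragma omp parallel for reduction(+ : count)
    for (int i = 0; i < getNumber(x); ++i) {
        count += 1;
    }
    return count;
}
```

With a compiler that handles this correctly, countIterations(1000) should return 1000 regardless of OMP_NUM_THREADS. The failure reported above would correspond to a result of 1000 times the number of threads.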

Best,

Michael Walz

When optimization is off, the return value of getNumber(x) is unknown. While it may be a reasonable expectation that optimization would reduce the code so that the return value is known to be invariant, note that this part of the compiler's optimization phase has been moved into IPO (Inter-Procedural Optimization), which affects code at link time (XLINK time), whereas the OpenMP constructs are generated before XLINK time. IOW, at the time OpenMP evaluates test-expr, the return value of getNumber(x) is unknown (not known to be invariant).

Jim Dempsey

www.quickthreadprogramming.com

Icpc tends not to optimize even much simpler expressions in the conditional, so it's valuable to perform the assignment prior to the parallel region, making a simple shared variable, rather than implying all the threads should evaluate it.
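A minimal sketch of that workaround, assuming a getNumber like the one described in this thread (the function bodies and the name countIterationsHoisted are hypothetical, not taken from the attached test):

```cpp
// Hypothetical stand-in for getNumber: returns its argument via a loop.
int getNumber(int x) {
    int n = 0;
    for (int i = 0; i < x; ++i) {
        ++n;
    }
    return n;
}

long countIterationsHoisted(int x) {
    // Evaluate the bound once, in serial code, before the parallel
    // region. The loop condition then compares against a plain shared
    // variable instead of a function call.
    const int n = getNumber(x);

    long count = 0;
    #pragma omp parallel for reduction(+ : count)
    for (int i = 0; i < n; ++i) {
        count += 1;
    }
    return count;
}
```

Here no function call remains in test-expr, so there is nothing for the compiler to inline into the loop condition.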

@Jim: You are right about your analysis that "the return value of getNumber(x) is unknown (not known to be invariant)", but I think that is not the point. In my opinion, the specification does not require that it is known to be invariant, just that it is invariant.

By the way, the bug vanishes if I move the implementation of getNumber(x) to a different source file, compile them separately and then link them. That way, the result of getNumber(x) is for sure unknown in the other source file at compile-time but this does not trigger the bug.

Also, adding either -fno-inline-functions, -O1 or -O0 when compiling the posted version removes the bug. That is why I thought that the bug is due to inlining the function inside test-expr. In addition, when I inspected the generated assembler code (-S), icpc seemed to parallelize the wrong loop (the one inside getNumber(x)).

@Tim: Yes, this avoids the bug, of course. But although it's a valid optimization for the posted code (though we are talking about a few cpu cycles), I still believe the posted code is valid. Therefore, icpc should not fail to compile it correctly.

Best,

Michael Walz

>> In my opinion, the specification does not require that it is known to be invariant, just that it is.

This would be correct. Mea culpa.
This is a bug.

>> when investigating the produced assembler code (-S), icpc seemed to parallelize the wrong loop (the one inside getNumber(x)).

Not necessarily so. getNumber(x) would have to be called once outside the loop, at the location of the #pragma omp parallel for in the serial code, so that the loop partitioning can be determined. Subsequent to this, it should be called a second time before loop execution is reached (i.e. to stop before starting). IOW you should see it both outside and inside the loop. The compiler may assume that if the partitioning creates a partition, that partition will run at least one pass, and it can thus place the termination test at the bottom of the loop. Should getNumber(x) in the test-expr vary (it does not), this would produce a case where the code possibly executes more passes than expected.

I would think that an additional requirement should be that a function in test-expr not only be invariant but also be pure (no side effects), though this is not in the specification.
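To illustrate why purity matters, here is a small sketch of a bound function that is invariant but not pure. All names (boundCalls, boundWithSideEffect, runLoop) are hypothetical and exist only for this illustration. How many times the function ends up being called depends on how the compiler lowers the loop, so the counter may differ between a serial build and an OpenMP build.

```cpp
#include <atomic>

// Hypothetical instrumentation: counts calls to the bound function.
std::atomic<int> boundCalls{0};

// Invariant return value, but not pure: every call bumps the counter.
int boundWithSideEffect(int x) {
    boundCalls.fetch_add(1);
    return x;
}

// The loop result depends only on the (invariant) return value, while
// the value of boundCalls depends on how often the implementation
// chooses to evaluate the bound.
long runLoop(int x) {
    long count = 0;
    #pragma omp parallel for reduction(+ : count)
    for (int i = 0; i < boundWithSideEffect(x); ++i) {
        count += 1;
    }
    return count;
}
```

Code that relies on boundCalls having a particular value after runLoop returns would be fragile, which is the practical argument for keeping functions in test-expr free of side effects.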

Jim Dempsey

www.quickthreadprogramming.com
