I believe I have found a bug in the OpenMP implementation of the Intel C++ Compiler, versions 11, 12 and 13.
It seems that the Intel C++ compiler has problems when another function is inlined into the loop condition of a for loop that should be parallelized with OpenMP. If that other function also contains a for loop, the compiler seems to parallelize the wrong loop. In the example, the original loop is therefore executed in full by every thread, and the result is too large by a factor of OMP_NUM_THREADS.
A minimal test case is attached. The test fails with OMP_NUM_THREADS >= 2, regardless of whether it is compiled with -O3, -O2, or no optimization flag.
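For readers without access to the attachment, the pattern described above looks roughly like the following sketch. This is not the attached test itself: the names count_positive and parallel_iterations are made up for illustration, and I have not confirmed that this exact code triggers the problem on the affected compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (name invented for this sketch): computes the loop
// bound and itself contains a for loop, so it is a candidate for inlining
// into the loop condition below.
static inline int count_positive(const std::vector<int>& v) {
    int n = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (v[i] > 0) ++n;
    }
    return n;
}

// The loop that OpenMP is asked to parallelize. Its condition calls the
// helper above. Each iteration adds 1, so the expected result equals
// count_positive(v) no matter how many threads are used.
int parallel_iterations(const std::vector<int>& v) {
    int iterations = 0;
    #pragma omp parallel for reduction(+:iterations)
    for (int i = 0; i < count_positive(v); ++i) {
        ++iterations;
    }
    return iterations;
}

// Simple self-check: aborts if the loop ran more or fewer times than expected.
void check_iterations(const std::vector<int>& v) {
    assert(parallel_iterations(v) == count_positive(v));
}
```

Calling check_iterations on a vector of 1000 positive values should pass with a correct compiler. The behaviour described above would instead correspond to a result of 1000 times OMP_NUM_THREADS. OpenMP has to be enabled at compile time (-fopenmp for g++, -openmp for the icpc versions listed here); without it the pragma is ignored and the loop runs serially.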
The exact versions tested are (output of "icpc -v"):
- icpc version 13.0.1 (gcc version 4.3.0 compatibility)
- icpc version 12.1.6 (gcc version 4.3.0 compatibility)
- Version 11.1
btw: I could not reproduce the bug with any of the GNU g++ compilers I tried (versions 4.3.4, 4.3.6, 4.4.7, 4.5.4, 4.6.3, 4.7.2). They all produce the correct result.