FORALLs not parallelized in WORKSHARE

Hello,
following up on the WORKSHARE constructs that apparently are not parallelized,
I compared !$OMP DO applied to a DO loop with !$OMP WORKSHARE applied to a FORALL
on the same task. The code is in the attachment.
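
For readers without the attachment, here is a minimal sketch of this kind of comparison. It is not the attached omptest.f90: the program name, the array names, the problem size n and the sin/cos kernel are all assumptions chosen for illustration, and it times both variants in one run instead of switching between them with USE_FORALL. Line numbers in any compiler remarks will therefore not match the ones quoted in this post.

```fortran
! Hypothetical stand-in for the comparison described in the post.
! All names, the size n and the kernel are illustrative assumptions.
program workshare_vs_do
  use omp_lib, only: omp_get_wtime
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: n = 2000            ! assumed problem size
  real(dp), allocatable :: a(:,:), b_do(:,:), b_fa(:,:)
  real(dp) :: t0, t1, t2
  integer :: i, j

  allocate(a(n,n), b_do(n,n), b_fa(n,n))
  call random_number(a)

  ! Variant 1: explicit DO loops, outer loop shared out by !$OMP DO.
  t0 = omp_get_wtime()
  !$omp parallel private(i)
  !$omp do
  do j = 1, n
    do i = 1, n
      b_do(i,j) = sin(a(i,j)) + cos(a(i,j))
    end do
  end do
  !$omp end do
  !$omp end parallel
  t1 = omp_get_wtime()

  ! Variant 2: the same assignment as a FORALL inside !$OMP WORKSHARE.
  ! How (or whether) the work is divided is left to the compiler.
  !$omp parallel
  !$omp workshare
  forall (i = 1:n, j = 1:n)
    b_fa(i,j) = sin(a(i,j)) + cos(a(i,j))
  end forall
  !$omp end workshare
  !$omp end parallel
  t2 = omp_get_wtime()

  print *, 'OMP DO time    [s]:', t1 - t0
  print *, 'WORKSHARE time [s]:', t2 - t1
  print *, 'max |difference|  :', maxval(abs(b_do - b_fa))
end program workshare_vs_do
```

It has to be compiled with OpenMP enabled, for example with -openmp as in the ifort commands in this post or with -fopenmp for gfortran. If the compiler really divides the WORKSHARE among the threads, the two timings should be of the same order.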

If I compile the program with

[hajek@dell8 scratch]$ ifort -O3 -openmp -openmp-report=2 omptest.f90

it outputs

omptest.f90(28) : (col. 8) remark: OpenMP DEFINED LOOP WAS PARALLELIZED.
omptest.f90(17) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.

and the running time is
[hajek@dell8 scratch]$ time ./a.out

real 0m4.280s
user 0m16.618s
sys 0m0.399s

(4 threads are involved)

On the other hand, if I switch from DO to FORALL by compiling with
[hajek@dell8 scratch]$ ifort -O3 -openmp -openmp-report=2 -D USE_FORALL omptest.f90

the output is

omptest.f90(22) : (col. 8) remark: OpenMP multithreaded code generation for SINGLE was successful.
omptest.f90(17) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.

ifort generates a SINGLE construct that does not appear in my source, so it is probably implementing WORKSHARE as SINGLE. The standard permits this, but it is a very poor implementation, because the enclosed work is then not divided among the threads.

The run time (still with OMP_NUM_THREADS=4) is more than three times longer:
[hajek@dell8 scratch]$ time ./a.out

real 0m14.042s
user 0m55.081s
sys 0m0.576s

Either I am missing something important here, or ifort version
"Intel Fortran Itanium Compiler for Itanium-based applications
Version 9.0 Build 20050624 Package ID: l_fc_c_9.0.024"
really is unable to parallelize FORALL in WORKSHARE.
