Array notation and parallelisation

Does the Intel compiler currently attempt to parallelise array notation expressions?  If it does, I am failing dismally to persuade it to do so.  I set CILK_NWORKERS=4, and print both the wall-clock and CPU times.

If not, what would the recommended alternative be in any case where that were desirable?  To back off to loops and use OpenMP?

No, I don't believe it does.  You'll need to use an outer loop to specify the parallelism.  Any of the parallelism frameworks will work: Cilk Plus, TBB or OpenMP, to name a few.

    - Barry
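
A minimal sketch of the outer-loop approach suggested above, assuming OpenMP is the chosen framework. The function name `add_chunked` and the `CHUNK` size are hypothetical, picked only for illustration. The inner loop is written in plain C so that it builds with any compiler; the comment shows the array notation section it stands in for.

```c
#include <stddef.h>

/* Hypothetical chunk size; tune it for the target machine. */
#define CHUNK 1024

/* Element-wise a = b + c, with the parallelism expressed on the outer loop. */
void add_chunked(double *a, const double *b, const double *c, size_t n)
{
    /* Signed index: OpenMP versions before 3.0 require a signed loop variable. */
    long nchunks = (long)((n + CHUNK - 1) / CHUNK);
    long k;

    #pragma omp parallel for
    for (k = 0; k < nchunks; ++k) {
        size_t lo  = (size_t)k * CHUNK;
        size_t len = (n - lo < CHUNK) ? (n - lo) : CHUNK;
        size_t i;

        /* With Cilk Plus array notation this inner loop could be written as
           a[lo:len] = b[lo:len] + c[lo:len]; */
        for (i = 0; i < len; ++i)
            a[lo + i] = b[lo + i] + c[lo + i];
    }
}
```

If the compiler's OpenMP option is not enabled, the pragma is ignored and the function simply runs serially, which makes it easy to compare timings with and without threading.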

Thanks.  Not a problem.   It clarifies the code, anyway, much like Fortran sections.

Hello,

I should add here that the compiler can do it for loops big enough to justify the overhead of threading, if the auto-parallelizer is enabled with "/Qparallel" or "-parallel". However, the OpenMP runtime is used underneath, so you control the number of threads with OMP_NUM_THREADS rather than CILK_NWORKERS.

Best regards,

Georg Zitzlsberger

Thanks for the correction, Georg.

    - Barry

As mentioned above, the auto-parallel options invoke OpenMP, so they don't mix well with Cilk(tm) Plus.  There might even be a case for the compiler throwing a warning if cilk_for is used with auto-parallelization.

Under OpenMP 4.0, combined vectorization and threaded parallelism for a long countable for loop may be specified by

#pragma omp parallel for simd

which is implemented in icc 14.0 ("XE2013 SP1").
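
As a rough illustration of that combined pragma, here is a small sketch. The function name `saxpy` and its parameters are made up for the example, and it assumes a compiler that supports OpenMP 4.0 with its OpenMP option switched on; without that option the pragma is ignored and the loop runs serially.

```c
#include <stddef.h>

/* y = alpha * x + y over n elements.
   The single pragma asks for the iterations to be divided among threads
   and for each thread's share to be vectorized. */
void saxpy(float *restrict y, const float *restrict x, float alpha, long n)
{
    long i;

    #pragma omp parallel for simd
    for (i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

The `restrict` qualifiers (C99) tell the compiler that `x` and `y` do not overlap. For short loops the threading overhead is likely to outweigh any gain, so this form is only worth using on long loops, as noted above.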

With auto-parallel, it probably requires tinkering with par-threshold or writing explicit inner and outer loops.

Thanks very much.  I don't currently have access to icc 14.0, and generally avoid relying on the latest and greatest versions of compilers, because the people I deal with (and their collaborators) don't always control which version they have to use.  But I will split the OpenMP and CilkPlus usages, to minimise confusion.

A warning on auto-parallelization and CilkPlus would be useful, until and unless they work together.