Does the Intel compiler currently attempt to parallelise Cilk Plus array notation expressions across threads? If it does, I am failing dismally to persuade it to do so. I set CILK_NWORKERS=4 and print both the wall-clock and CPU times.
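The two timings can be compared like this. If CPU time is roughly equal to wall-clock time, one thread did the work. If four workers were busy, CPU time would be closer to four times the wall-clock time. The sketch below assumes a POSIX system with `clock_gettime`. The names `wall_seconds`, `cpu_seconds`, `add_arrays` and `cpu_to_wall_ratio` are hypothetical. The element-wise add is written as a plain loop so it compiles without Cilk Plus support, and the array-notation equivalent appears in a comment. To test the original question, you would replace that loop with your own array-notation expression.

```c
#define _POSIX_C_SOURCE 199309L
#include <stddef.h>
#include <time.h>

/* Wall-clock time in seconds, from a monotonic clock. */
static double wall_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* CPU time consumed by all threads of this process, in seconds. */
static double cpu_seconds(void) {
    struct timespec ts;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Hypothetical element-wise add. In Cilk Plus array notation this would be
   a[0:n] = b[0:n] + c[0:n]; it is a plain loop here so it compiles anywhere. */
static void add_arrays(double *a, const double *b, const double *c, size_t n) {
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}

/* CPU time divided by wall time around one call: about 1.0 means a single
   busy thread, about N means roughly N threads were working concurrently. */
static double cpu_to_wall_ratio(double *a, const double *b, const double *c,
                                size_t n) {
    double w0 = wall_seconds(), c0 = cpu_seconds();
    add_arrays(a, b, c, n);
    double w1 = wall_seconds(), c1 = cpu_seconds();
    return (c1 - c0) / (w1 - w0);
}
```

For a meaningful ratio, `n` should be large enough that the call takes at least several milliseconds. Otherwise timer overhead dominates.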
If not, what would be the recommended alternative in cases where that is desirable? Should I fall back to loops and use OpenMP?
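Falling back to an explicit loop with OpenMP might look like the sketch below. It assumes an OpenMP 4.0 or later compiler for the `parallel for simd` construct, which both splits the iterations across threads and asks for vectorised chunks. The function name `add_arrays_omp` is hypothetical. The thread count would then come from `OMP_NUM_THREADS` rather than `CILK_NWORKERS`.

```c
#include <stddef.h>

/* Hypothetical OpenMP fallback for the array-notation statement
   a[0:n] = b[0:n] + c[0:n].
   Compile with -qopenmp (Intel) or -fopenmp (GCC/Clang). Without that flag
   the pragma is ignored and the loop runs serially with the same result. */
void add_arrays_omp(double *restrict a, const double *restrict b,
                    const double *restrict c, long n) {
    /* Threads share the iteration range; each chunk may also be vectorised. */
    #pragma omp parallel for simd
    for (long i = 0; i < n; i++)
        a[i] = b[i] + c[i];
}
```

For a simple, memory-bound operation like this, extra threads may help less than the core count suggests, so the timing comparison is still worth running.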