How can I parallelize an implicit loop?

I have a loop whose body calls a function, passing it an array element indexed by the loop variable; the function returns a single value.
I can parallelize this loop by using cilk_for instead of a regular for, which is simple and works well. This is explicit parallelization.
Instead of an explicit loop statement, I can use an Array Notation construction (as shown below), which is an implicit loop.
My routine is relatively long and complex and contains Array Notation constructions itself, so it cannot be declared as a vector (elemental) function.
When I use the implicit loop, it is not parallelized, and the run time increases substantially.
 
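#include <cilk/cilk.h>

const int n = 1000;  // example array size

float foo(float f_in)
{
 float f_result = 0.0f;
 // LONG computation containing Cilk Plus Array Notation operations,
 // producing f_result from f_in
 /////////////////////////////////////////////////////////
 return f_result;
}

int main()
{
 float af_in[n] = {0}, af_out[n];

// Explicit loop: iterations are distributed across worker threads
 cilk_for(int i=0; i<n; i++)
  af_out[i] =  foo(af_in[i]);

// Implicit loop: foo is called once per element, but not in parallel
 af_out[:] =  foo(af_in[:]);
 return 0;
}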

My question is: does anybody know if there is a way to tell the compiler that my implicit loop (the Array Notation assignment) has independent iterations and should be parallelized (a pragma or something else)?

Have you tried #pragma simd? Essentially, it tells the compiler to vectorize the loop even when auto-vectorization would fail.

   - Barry