Parallel Building Blocks - Vectorized Parallel Patterns inside a Cilk ‘for’

100 blogs and 100 videos in 100 days about PBB

#6: Vectorized Parallel Patterns inside a Cilk ‘for’

“Educate the compiler a little on what you’re trying to do, and it will vectorize a ton for you.”

In the previous blog, I explained the syntax and behavior of basic Array Notations vectorization for use within a Cilk ‘for’ loop. In many situations you can use the Cilk ‘for’ spread the Array Notations “mini vectorized kernels” across multiple cores and subsequently, many vector units inside the core. Today I’m going to go over some more of these mini-kernels, which we affectionately call structured parallel patterns. When I refer to a mini-kernel, I’m talking about commonly used operations that you normally would take a few lines of code to just get to run serial, and far more lines of code to get it vectorized and threaded. You can then use these mini-kernels as building blocks for more complicated parallel patterns (one step at a time, we’ll get to that much later).

Let’s talk about one of the simplest parallel patterns: the scatter and gather. With gathering, you take non-consecutive array elements, and pick each one of them into consecutive locations. Indices (the non-consecutive array elements) that you’re trying to address are stored in an index array. This sounds pretty simple, but it’s can be a big bottleneck in a variety of algorithms. Try coding something like that up serially first of all and see how much time it takes for a million elements – takes awhile doesn’t it?

The syntax to do this with Array Notations is very intuitive and Matlab-like. Note that the syntax is simple AND it’s vectorized behind the scenes for you.

Gather
` c[:] = a[b[:]]; // gather elements of a into c according to index array b`
Example:
` If b[] contains {3,5,7}, then c[:] = a[b[:]] will copy a[3],a[5],a[7] into b[0],b[1],b[2]. The sections a[:],b[:],c[:] must all have the same length. `

OR for scattering you do the opposite of gather.

Scatter
` a[b[:]] = c[:]; // scatter elements of c into a, according to index array b `
Scatter is the opposite of gather.
` a[b[:]] = c[:] will store c[0],c[1],c[2] into a[3],a[5],a[7] if b = {3,5,7}.`

If we can do these quick, abstract operations for scatters and gathers, what about conditionals? The syntax is also intuitive. Expressing conditionals is difficult to do with intrinsics, even worse for assembly.

The "If statement” creates a masked vector operation
` if (A[:] > 0) { B[:] = 1;}`

“For each” element of A that is >0, set the corresponding element of B to 1.

The compiler will use AND/OR masks to preserve elements of B where A[i] <= 0. The only caveat is that all the array sections inside an vector if statement must have the same length, which, for what you get in exchange for that rigid rule, is pretty worthwhile in my opinion.

For more complete information about compiler optimizations, see our Optimization Notice.

1 comment

Top

thanks.......simplest parallel patterns: the scatter and gather.