Elusive Algorithms - Parallel Scan

By James G Dempsey, Published: 05/22/2015, Last Updated: 05/22/2015

Last month there was a query on the Intel® Developer Zone MIC forum "how to perform inclusive scan in C cilk" in which my initial reply was:

Parallelizing this is problematic due to the next result being dependent upon the prior result. While this is not impossible, it is rather difficult and it introduces some redundant additions.

As any well seasoned programmer knows, the term "can be" is seldom used and "problematic" means you have to do a little more work to attain your goal. The inclusive scan is a loop with a single statement with a temporal dependency:

    float sum = 0;
    for ( int i = 0; i < N; i++ )
    {
        Res[ i ] = (sum += A[ i ]);
    }

After making my "problematic" statement I though it be only sporting of me to demonstrate the extent of problematic this was, such that you can ascertain if the benefits from attacking your problematic issues exceed work necessary to overcome your issues. I think you will be pleasantly surprised by following this link and reading the attached PDF.

Jim Dempsey

Product and Performance Information

1

Intel's compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804