Elusive Algorithms - Parallel Scan

Last month there was a query on the Intel® Developer Zone MIC forum "how to perform inclusive scan in C cilk" in which my initial reply was:

Parallelizing this is problematic due to the next result being dependent upon the prior result. While this is not impossible, it is rather difficult and it introduces some redundant additions.

As any well seasoned programmer knows, the term "can be" is seldom used and "problematic" means you have to do a little more work to attain your goal. The inclusive scan is a loop with a single statement with a temporal dependency:

    float sum = 0;
    for ( int i = 0; i < N; i++ )
        Res[ i ] = (sum += A[ i ]);

After making my "problematic" statement I though it be only sporting of me to demonstrate the extent of problematic this was, such that you can ascertain if the benefits from attacking your problematic issues exceed work necessary to overcome your issues. I think you will be pleasantly surprised by following this link and reading the attached PDF.

Jim Dempsey