Creates a league of thread teams to execute the structured block in the master thread of each team. It also specifies a loop that will be distributed across the master threads of the teams region. The loop will be executed concurrently using SIMD instructions.
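The following is a minimal sketch of how such a construct might be written, assuming the description above refers to the OpenMP teams distribute simd combined construct nested inside a target region. The function name saxpy and its parameters are hypothetical and chosen only for illustration. If OpenMP support is not enabled at compile time, the pragmas are ignored and the loop runs serially with the same result.

```cpp
#include <vector>

// Hypothetical helper: computes y[i] = a * x[i] + y[i] for i in [0, n).
void saxpy(float a, const float* x, float* y, int n) {
    // "target" offloads the region (or falls back to the host);
    // the map clauses describe the data the region reads and writes.
    #pragma omp target map(to: x[0:n]) map(tofrom: y[0:n])
    // "teams" creates the league of teams, "distribute" splits the loop
    // iterations across the master threads of those teams, and "simd"
    // asks for each chunk to be executed with SIMD instructions.
    #pragma omp teams distribute simd
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Iterations must be independent of one another for this to be safe, as they are here: each iteration touches only its own element of x and y.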
You can write a custom reducer if none of the supplied reducers satisfies your requirements.
Any of the supplied reducers can serve as a model for developing a new reducer, although some of them are relatively complex. Their implementations are in the reducer_*.h header files in the include/cilk directory of the installation.
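As a library-free sketch of the idea behind a custom reducer, the example below shows the two operations such a reducer generally has to supply: an identity value and an associative reduce operation that merges a right-hand partial result into a left-hand one. It deliberately does not include the Cilk headers or use their classes. The names LongestStringMonoid and merge_views are hypothetical, and the real interface should be taken from the reducer_*.h files mentioned above.

```cpp
#include <string>
#include <vector>

// Hypothetical monoid for a "first longest string" reduction.
struct LongestStringMonoid {
    using value_type = std::string;

    // Identity: merging it with any value leaves that value unchanged.
    static value_type identity() { return std::string(); }

    // Associative merge of the right partial result into the left one.
    // On a tie the left value is kept, which preserves serial order.
    static void reduce(value_type& left, const value_type& right) {
        if (right.size() > left.size()) {
            left = right;
        }
    }
};

// Stand-in for what a runtime does with per-strand views:
// start from the identity and merge the views from left to right.
template <typename Monoid>
typename Monoid::value_type merge_views(
        const std::vector<typename Monoid::value_type>& views) {
    typename Monoid::value_type result = Monoid::identity();
    for (const auto& view : views) {
        Monoid::reduce(result, view);
    }
    return result;
}
```

Because reduce is associative and the merge order follows the serial order, the result is the same regardless of how the work was split into views.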
The first step is to ensure that the C/C++ serial program has good performance and that normal optimization methods, including compiler optimization, have already been used.
As a simple, if limited, illustration of the importance of serial program optimization, consider the matrix_multiply example, which organizes its loops to minimize cache line misses. The resulting code is: