I tried to parallelise a function with Cilk Plus (the function is basically a periodic convolution with transposition).
The function has 3 nested "for" loops. In a first implementation I only changed the "for" to "cilk_for". I tried changing only the first loop, or only the first two, but performance did not change. The function is "convSerial_cilk", printed at the end of this post. The iteration space can be large (the first for loop iterates from 0 to 20000).
Because performance was poor, I tried to use the "cilkview" tool (from the SDK).