I ran a (very) simple cilk_for loop on a Core i5-2400 CPU under 32-bit Windows XP.
The code is attached. It was compiled and built with the latest Intel compiler using MSDEV 2010.
The cilk_for loop seems to run only a little faster than the same loop implemented with intrinsic C.
But my CPU has 4 cores, so I expected the Cilk code to run about 4 times faster.
How can I make all cores participate in the calculation?