I have an application that, according to Intel Advisor, should reach a speedup of about 3.5x using 4 threads.
Using OpenMP, I was only able to get about 2.35x with 2 threads and 2.85x with 4 threads.
Now I am learning Cilk and applying it to see whether I can improve this performance. My compiler is icc (ICC) 14.0.0 20130728.
I am using only cilk_for (similar to what I did with OpenMP). Running the application on one processor, I get the same performance with Cilk and OpenMP. However, adding more processors hurts the Cilk performance badly.
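
To make the comparison concrete, here is a minimal sketch of the kind of change I mean: the same loop parallelized once with an OpenMP pragma and once with cilk_for. The kernel and the names scale_openmp and scale_cilk are hypothetical stand-ins, not my actual application code, and the serial fallback macro is only there so the sketch compiles without Cilk Plus support.

```c
#ifdef __cilk
#include <cilk/cilk.h>   /* Cilk Plus keywords, available when the compiler enables Cilk */
#else
#define cilk_for for     /* assumption: serial fallback when Cilk Plus is not available */
#endif

/* Hypothetical kernel: scale every element of a vector in place. */

/* OpenMP version: iterations are divided among the OpenMP threads.
   Without OpenMP enabled, the pragma is ignored and the loop runs serially. */
static void scale_openmp(double *v, long n, double a)
{
    long i;
    #pragma omp parallel for
    for (i = 0; i < n; ++i)
        v[i] *= a;
}

/* Cilk version: the same loop, with cilk_for in place of for.
   The runtime splits the iteration range recursively among its workers. */
static void scale_cilk(double *v, long n, double a)
{
    cilk_for (long i = 0; i < n; ++i)
        v[i] *= a;
}
```

The number of threads is set through the OMP_NUM_THREADS environment variable for the OpenMP version and through CILK_NWORKERS for the Cilk version.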