Intel® Cilk™ Plus

Cilk portability to non-Intel processors


I'm investigating the use of Cilk in an embedded computing environment.

My understanding is that Cilk is only supported on Intel processors under Linux and Mac OS X. Is that right?

Cilk also requires some specific compilers. Such compilers are available from Intel, GCC and LLVM.

How much effort would it take to port Cilk to a different processor (e.g. ARM or SPARC)?

I assume that it involves some non-trivial adaptation of the compiler backend?

Regards, Dominique T

Potential conflict between omp threads and cilk+ workers?

Dear everyone, 

The main part of my application uses the work-stealing approach provided by cilk+ or TBB. However, for some BLAS level 1 routines that I have no time to implement one by one, I chose to use MKL. That leads to a potential dilemma: MKL employs omp threads, whereas cilk+ and TBB have their own threading runtimes. Am I caught in a trap caused by a conflict between omp threads and cilk+ workers? By conflict I mean risks such as oversubscription, which would harm overall performance.
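To make the oversubscription risk concrete, here is a back-of-the-envelope count. It is only an illustrative sketch: the function names are made up, and it assumes the worst case in which every cilk+ (or TBB) worker calls a threaded MKL routine at the same time and each call gets its own team of omp threads.

```cpp
#include <cassert>

// Hypothetical helper: worst-case number of runnable threads when each
// work-stealing worker enters a threaded MKL call simultaneously and each
// call uses its own team of `omp_threads_per_call` OpenMP threads.
unsigned worst_case_threads(unsigned workers, unsigned omp_threads_per_call) {
    return workers * omp_threads_per_call;
}

// Hypothetical helper: true when the worst-case thread count exceeds the
// number of hardware threads available on the machine.
bool is_oversubscribed(unsigned workers,
                       unsigned omp_threads_per_call,
                       unsigned hardware_threads) {
    assert(hardware_threads > 0);  // a machine has at least one hardware thread
    return worst_case_threads(workers, omp_threads_per_call) > hardware_threads;
}
```

With 4 workers and 4 omp threads per call on a 4-core machine, the count comes to 16 runnable threads. Restricting MKL to one thread per call (for example with `mkl_set_num_threads(1)`, or by linking MKL's sequential layer) brings it back to 4.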

libzca 195 and the latest version of Pin (62141)


I tried installing libzca 195 with the latest version of Pin (62141). I managed to get things to compile and run, but I had to make the following changes to the Makefiles in zca/src and cilkprof.

Detecting a steal without holder (and why are hypermap lookups expensive?)

I would like to be able to detect when a worker has stolen work without the use of a holder. Ideally I would like to add a "hook" into the runtime system so that a function I define is called upon a successful steal, or to have access to a worker-local "successful steal" counter that is incremented each time a worker executes a successful steal. Is either of these possible using documented or undocumented features of the Cilk runtime system in gcc or icc?

simple 'for' runs much faster than 'cilk_for'


I compiled the following code with Intel compiler 11 + Optimizer. Then I ran the code under Windows XP (32-bit).

For some reason, the code without CILK runs much faster than the same code with CILK defined.

Also, the no-Cilk code runs with almost the same performance as code written with C intrinsics.

The PC on which the program runs is a Core i5 with 4 cores. Cilk should help divide the processing among the cores.

Can anyone explain this?
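The code from the post is not shown here, so the sketch below is not the poster's program. It only illustrates, under assumptions, how a `CILK` macro might switch the same loop between a plain `for` and `cilk_for`. The `PAR_FOR` macro and the `scale` kernel are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

#ifdef CILK
#include <cilk/cilk.h>      // provides cilk_for on Cilk Plus compilers
#define PAR_FOR cilk_for    // parallel loop when CILK is defined
#else
#define PAR_FOR for         // plain serial loop otherwise
#endif

// Hypothetical kernel: multiply every element by `factor`.
// The loop body is tiny, so each iteration does very little work.
void scale(std::vector<float>& data, float factor) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(data.size());
    PAR_FOR (std::ptrdiff_t i = 0; i < n; ++i) {
        data[i] *= factor;
    }
}
```

When the loop body is this small, the cost of splitting and scheduling the iterations can outweigh the work itself, which is one possible reason for a parallel loop to run slower than the serial one.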


Parallel implementation of preconditioned conjugate gradient and Cholesky decomposition using Cilk Plus

Hi All,

I am looking for source code for parallel preconditioned conjugate gradient and Cholesky decomposition written with Cilk. If anybody has such implementations, kindly share them. I need them urgently.


Abdul Jabbar. 
