Intel® Cilk™ Plus

Potential conflict between omp threads and cilk+ workers?

Dear everyone, 

The main part of my application uses the work-stealing approach provided by cilk+ or TBB. However, for some BLAS level 1 routines, which I have no time to implement one by one, I chose to use MKL. That leads to a potential dilemma: MKL uses omp threads, whereas cilk+ and TBB have their own threading runtimes. Am I caught in a trap caused by a potential conflict between omp threads and cilk+ workers? By conflict I mean risks such as oversubscription, which would harm overall performance.

libzca 195 and the latest version of Pin (62141)

Hello,

I tried installing libzca 195 (from https://www.cilkplus.org/sites/default/files/contributions/libzca-src-19...) with the latest version of Pin (62141 from http://software.intel.com/sites/landingpage/pintool/downloads/pin-2.13-6...).  I managed to get things to compile and run, but I had to make the following changes to the Makefiles in zca/src and cilkprof.

Detecting a steal without holder (and why are hypermap lookups expensive?)

I would like to be able to detect when a worker has stolen work without the use of a holder. Ideally I would like to add a "hook" into the runtime system so that a function I define is called upon a successful steal, or have access to a worker-local "successful steal" counter that is incremented each time a worker executes a successful steal. Is either of these things possible using documented or undocumented features of the cilk runtime system in gcc or icc?

simple 'for' runs much faster than 'cilk_for'

Hello,

I compiled the following code with Intel compiler 11 + Optimizer. Then I ran the code under Windows XP-32.

For some reason, the code without CILK runs much faster than the same code with CILK defined.

Also, the no-cilk code runs with almost the same performance as code written with C intrinsics.

The PC on which the program runs is a Core i5 with 4 cores. Cilk should help divide the processing among the cores.

Can anyone explain this?

Thanks,

Parallel Implementation of preconditioned conjugate gradient and cholesky decomposition using Cilk Plus

 Hi All,

I am looking for source code for a parallel preconditioned conjugate gradient and a Cholesky decomposition using Cilk. If anybody has source code for these implementations, kindly share it. I need it urgently.

regards,

Abdul Jabbar. 

Interesting reduce (add) case with Array Notation: is it possible ?

I have the following piece of code, which already uses CILK+ Array Notation constructs:

out[0:Length] +=
(p_in_gains[0]*p_ins[0].blks[0][0:Length] +
p_in_gains[1]*p_ins[1].blks[0][0:Length] +
p_in_gains[2]*p_ins[2].blks[0][0:Length] +
p_in_gains[3]*p_ins[3].blks[0][0:Length] +
p_in_gains[4]*p_ins[4].blks[0][0:Length]);
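For readers unfamiliar with array notation, the statement above can be read as the scalar loop below. This is only a sketch in plain C++: the `Input` struct, the `float` element type, and the `mix_add` function name are assumptions made for illustration, since the post does not show the declarations of `out`, `p_in_gains`, or `p_ins`.

```cpp
#include <cstddef>

// Hypothetical layout: the post does not show the real type of p_ins[i].
// Here blks[0] is assumed to point at a block of float samples.
struct Input {
    const float* blks[1];
};

// Scalar equivalent of
//   out[0:Length] += (p_in_gains[0]*p_ins[0].blks[0][0:Length] + ...);
// generalized from 5 inputs to n_inputs. For each sample j, the weighted
// inputs are summed first and the result is then added to out[j].
void mix_add(float* out, const float* p_in_gains, const Input* p_ins,
             std::size_t n_inputs, std::size_t length) {
    for (std::size_t j = 0; j < length; ++j) {
        float acc = 0.0f;
        for (std::size_t i = 0; i < n_inputs; ++i) {
            acc += p_in_gains[i] * p_ins[i].blks[0][j];
        }
        out[j] += acc;
    }
}
```

Calling `mix_add(out, p_in_gains, p_ins, 5, Length)` would correspond to the five-term expression in the post, under the type assumptions stated above.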
