Problems using Cilk and TBB in same function

Hello all,

My code uses Cilk & TBB in the same recursive functions, so that I can test them both with maximal code-sharing.
But when I use the TBB version with more than 3 threads, I get a crash at runtime with this error:
"Too many threads attempting to use Cilk".

This happens with the latest TBB release (TBB 3.0 Update 2 commercial-aligned release), as well as the one included with Parallel Studio XE.

This happens even if I set CILK_NWORKERS to a large value, or use the runtime API beforehand to set a large number of workers. (This is on a Core 2 Duo, by the way, but I also tested on a 4-way machine.)

When I comment out the code containing the cilk_spawn and cilk_sync, then I can use lots of threads with TBB (e.g. 13).

So it appears that there is some interference between TBB & Cilk? This happens even when no calls to the Cilk APIs are made.

The TBB calls I use are currently task_group & mutexes, but I think this occurs with parallel_invoke & parallel_for as well.

BTW, this occurred in the Parallel Composer (non-XE) Beta a couple months ago, but I didn't get around to reporting it.

thanks,
Daniel Faken

Let's start by saying that this was a problem with early versions of the Intel Cilk Plus runtime. We've worked with the TBB folks to fix it for the final Parallel Studio release. More on that later.

This is going to require a bit of explanation of how the Cilk runtime works.

Internally, the Cilk runtime has an array which contains entries for all of the possible workers. This is a fixed size, simple array, to make indexing into it during a steal attempt fast and easy, and to avoid races. We'll go over how that fixed size is chosen in a bit.

The Cilk runtime thinks in terms of three kinds of workers:

  • The first is a "User Worker". This is your thread which makes Cilk calls. User workers are special in that they can never steal. This is so that a sync is never waiting for the completion of work being done for another user worker. Your thread "binds" to the Cilk runtime when it enters the first function that uses a cilk_spawn, and "unbinds" in the epilogue of that function. When the thread binds to the runtime, it is assigned a slot in the array of workers, and when it unbinds, the thread gives up that slot.
  • The second type of worker is a "System Worker". These are the threads that the runtime creates. You can control how many of those there are using __cilkrts_set_param("nworkers", "n"), or the CILK_NWORKERS environment variable. System workers sit in a loop randomly picking a worker and attempting to steal work from it.
  • There's also a third type of worker, and that's a "Free Worker". This is a slot in the array of workers that is available for assignment as a user worker, but not yet taken.

The Cilk runtime creates one pool of system worker threads. These are assigned the first N slots in the worker array. User workers bind onto later slots.
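
To make the slot bookkeeping above concrete, here is a minimal single-threaded sketch of the idea. It is a toy model, not the runtime's code: `WorkerTable`, `SlotKind`, `bind_user` and `unbind_user` are hypothetical names invented for illustration, and the real runtime has to synchronize access, which this sketch does not attempt.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Toy model only: these names are hypothetical, not Cilk runtime symbols.
enum class SlotKind { System, User, Free };

class WorkerTable {
public:
    // The first `system_workers` slots belong to runtime-created threads;
    // the remaining `user_slots` start out free.
    WorkerTable(std::size_t system_workers, std::size_t user_slots)
        : slots_(system_workers + user_slots, SlotKind::Free) {
        for (std::size_t i = 0; i < system_workers; ++i) {
            slots_[i] = SlotKind::System;
        }
    }

    // Models a user thread entering its first spawning function:
    // it takes the first free slot, or fails if none is left.
    std::size_t bind_user() {
        for (std::size_t i = 0; i < slots_.size(); ++i) {
            if (slots_[i] == SlotKind::Free) {
                slots_[i] = SlotKind::User;
                return i;
            }
        }
        throw std::runtime_error("Too many threads attempting to use Cilk");
    }

    // Models the epilogue of that function: the slot becomes free again.
    void unbind_user(std::size_t slot) {
        assert(slot < slots_.size() && slots_[slot] == SlotKind::User);
        slots_[slot] = SlotKind::Free;
    }

    // Number of slots still available to user threads.
    std::size_t free_slots() const {
        std::size_t n = 0;
        for (SlotKind k : slots_) {
            if (k == SlotKind::Free) ++n;
        }
        return n;
    }

private:
    std::vector<SlotKind> slots_;  // size never changes after construction
};
```

In this model, a table built as `WorkerTable(2, 3)` accepts three `bind_user()` calls and throws on the fourth, until some bound thread calls `unbind_user()` and gives its slot back.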

By default, the current version of the Cilk runtime allows 2 * NWORKERS slots for user workers, which should be sufficient for use with TBB. Early versions of the Cilk runtime, however, hard-coded that to 3. We knew that was a bad value, but hadn't come up with a better default yet.
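
As a rough illustration of the two rules, the arithmetic can be written out as below. The helper names are hypothetical and the functions only restate the numbers given in this post (3 user slots in early versions, 2 * NWORKERS now); they are not part of any Cilk API.

```cpp
#include <cassert>

// Hypothetical helpers restating the two rules from the post.

// Early runtime: the number of user worker slots was hard-coded,
// so the worker count had no effect on it.
inline unsigned user_slots_early(unsigned /*nworkers*/) {
    return 3;
}

// Current runtime default: two user slots per worker.
inline unsigned user_slots_current(unsigned nworkers) {
    assert(nworkers > 0);
    return 2 * nworkers;
}

// True if `threads` user threads can all be bound at the same time.
inline bool fits(unsigned user_slots, unsigned threads) {
    return threads <= user_slots;
}
```

Under the early rule, `fits(user_slots_early(n), 4)` is false for every `n`, which would explain why raising CILK_NWORKERS did not help once more than 3 TBB threads entered Cilk code. Under the current rule the limit grows with the worker count, for example `user_slots_current(8)` gives 16 slots.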

However, this wasn't sufficient for mixed Cilk/TBB code. Calling from TBB into Cilk should work, but calling from Cilk into TBB would get messed up, since TBB stores data in thread-local storage. We did some work with the TBB folks a month or two ago to resolve this, which is when the default for the number of user worker slots was changed.

It's also possible to override the default number of user worker slots, but it's currently undocumented and I'd prefer not to document it here if you don't need it.

- Barry

Thank you for the detailed answer.

I'm gathering the temporary solution will be to separate the Cilk code from the TBB, so I'll try that.

Daniel

I'd assumed that since you hadn't reported it, it wasn't important to you.

As I said, there is a way to increase the number of slots allocated for worker threads. If you need it, I can tell you how.

But it is fixed in the upcoming release.

- Barry
