Differentiating between a program thread and a cilk thread?

Is there a way for a function to determine if it is executing on a program thread or on a cilk thread?

Our application is already threaded, but we are investigating using cilk to add greater parallelism.  Our current system uses thread local storage to maintain a struct that contains the thread local portion of our memory manager.  We currently have a function that is called to return the thread local struct.  We'd like to use cilk to parallelize code that uses our memory manager, thus each strand would need access to one of these structs, so I'd like to extend our function to work when called on a strand.  To make this work, I think I need to be able to differentiate between one of our threads and a cilk thread. I'm ok with a strand executing on a program thread being treated either way.

I looked at Holders, but I don't think they will solve the problem completely.  In particular, I still need to access the thread local struct from multiple program threads, so I need to know if a thread is a program thread or cilk thread.  It is also not clear what happens when a holder is called from multiple program threads.  I also looked at __cilkrts_get_worker_number, but the behaviour when called from program threads does not seem to be useful in my situation.

Thanks

Darin


Darin,

  Hi, I just wanted to clarify your question, because I'm not sure I completely understand the problem. 
Suppose you have a user code, and one of the threads has a loop of n iterations, and assume each iteration is a serial strand.   Suppose strand i executes the function f(i).  

cilk_for(int i = 0; i < n; ++i) {
     f(i);
}

If each call to f(i) uses your memory manager, what information are you trying to figure out?   If you had 4 user/program threads, each executing their own cilk_for loop as above, what value do you want to figure out in each call to f(i) for each of the 4 cilk_for loops?

During the execution of iteration i, any call to look at thread-local storage will look at the storage for the thread currently executing iteration i.   That thread might be a "system worker" thread (what you call a Cilk thread) that the Cilk Plus runtime created, or it might be one of the original "user" threads that the program started executing on (what you are calling a program thread).    The Cilk Plus runtime creates CILK_NWORKERS-1 system worker threads for a program, and they have fixed worker numbers.   User threads I believe all get the same worker number.  But if you want to distinguish between the two, it seems like the worker number should be sufficient, so I'm guessing that's not what you are looking for?

The system workers can steal work from any of the user threads, but each user thread will only steal work from within its own "team", i.e., the work created with the subcomputation for that user thread.  In the example above, each of the 4 user threads is its own separate team, and thus each user thread will work on iterations only from the cilk_for loop that it started.

Do you want to know which team a system worker is currently executing as part of?

Jim

For a thread to use the memory manager, it needs to have exclusive access to a struct that stores the memory management data.  Currently we use thread local storage (pthread_getspecific) to handle this for our threads.  In your example, each user thread will have access to one of these structs via thread local storage.  I need to figure out how to manage the structs for the cilk system threads.  I think that if I can differentiate between a system and user thread, I can make this work.  However there may be a better way using Holders or something else, but I have not been able to figure anything else out.

If all user threads get the same worker number, then I could use that info to get what I want.  However it is not clear from the documentation that this is the case, from http://software.intel.com/sites/products/documentation/hpc/composerxe/en-us/2011Update/cpp/lin/index.htm#cref_cls/common/cilk_bk_using_cilk.htm

"If more than one user-created thread calls __cilkrts_get_worker_number, they may get identical results (since worker IDs are not unique across user threads)"

To use the worker ID to determine if the calling thread was user or system, I need the user threads to always get the same worker id (i.e., calling __cilkrts_get_worker_number from a user thread always returns 0 or something like that).   I also would be a little uncomfortable basing our implementation on undocumented behaviour.

Darin

Darin,

Your use of thread-local storage would seem to be agnostic to whether a thread was created by the user or by the Cilk scheduler.  All you care about is that each thread has its own copy.  For that purpose, pthread_getspecific() should be sufficient.  The Cilk worker threads will not return the same local variable as the user thread, so everything should be isolated as you would expect.  We call this use "worker local storage" because each worker (including the user thread, which is a special worker) has its own copy.
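A minimal sketch of this worker-local pattern, assuming the struct is created lazily on first use so that a thread the application did not create still gets its own copy. The names mm_state, mm_get_state, and mm_demo are hypothetical, and plain pthreads stand in for Cilk workers so that the example builds without a Cilk Plus compiler:

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the thread-local part of a memory manager. */
typedef struct mm_state {
    long allocations;   /* bumped only by the owning thread */
} mm_state;

static pthread_key_t  mm_key;
static pthread_once_t mm_once = PTHREAD_ONCE_INIT;

/* Runs at thread exit for every thread whose slot is non-NULL. */
static void mm_destroy(void *p) { free(p); }

static void mm_make_key(void) {
    int rc = pthread_key_create(&mm_key, mm_destroy);
    assert(rc == 0);
    (void)rc;
}

/* Return the calling thread's state, creating it on first use. */
mm_state *mm_get_state(void) {
    pthread_once(&mm_once, mm_make_key);
    mm_state *s = pthread_getspecific(mm_key);
    if (s == NULL) {
        s = calloc(1, sizeof *s);
        assert(s != NULL);
        pthread_setspecific(mm_key, s);
    }
    return s;
}

/* Demo worker: does *arg updates, then reports what its own state saw. */
static void *mm_demo_worker(void *arg) {
    long *n = arg;
    for (long i = 0; i < *n; ++i)
        mm_get_state()->allocations++;
    *n = mm_get_state()->allocations;
    return NULL;
}

/* Returns 1 if every thread saw only its own updates, else 0. */
int mm_demo(int nthreads, long per_thread) {
    pthread_t *tid = malloc(nthreads * sizeof *tid);
    long *count = malloc(nthreads * sizeof *count);
    int ok = 1;
    for (int i = 0; i < nthreads; ++i) {
        count[i] = per_thread;
        pthread_create(&tid[i], NULL, mm_demo_worker, &count[i]);
    }
    for (int i = 0; i < nthreads; ++i) {
        pthread_join(tid[i], NULL);
        ok = ok && (count[i] == per_thread);
    }
    free(tid);
    free(count);
    return ok;
}
```

Because the lookup never asks who created the thread, the same mm_get_state() call would serve user threads and runtime-created workers alike. The key's destructor only frees the struct when a thread actually exits.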

One caveat with the above is that you need to be careful with your memory manager's assumptions.  Does it correctly handle allocating memory from one thread and releasing it from another?  If not, then you must be careful to ensure that there are no parallelism control constructs (cilk_for, cilk_spawn, or cilk_sync) between an allocation of memory and its corresponding deallocation, since the Cilk runtime can change threads at any of those points.

A holder could be made to work for this purpose, but using it correctly in this case would be more complicated and probably less efficient than just using pthread's TLS.  In the case of a holder, the worker threads would not conflict with each other, but the user threads could, because the user threads all access the same view.  Thus, in the case of a holder, you would still need to disambiguate among user threads and thus distinguish between a user thread and a system worker.

I will refine something that Jim said.  __cilkrts_get_worker_number() will return zero for user-created threads, always.  You are correct that this is not documented behavior, but it is not likely to change.

Finally (or maybe firstly), you might want to reconsider the idea of maintaining your own memory manager.  The TBB scalable memory manager uses the same per-thread allocation principles that you describe and is very efficient for allocating memory in concurrent and parallel programs, including Cilk Plus programs. It handles allocation from one thread with deallocation from another correctly, is well supported and kept up-to-date.  It is also open source.  I encourage you to try the TBB scalable memory manager instead of reinventing the wheel.

I guess I over-interpreted the warning on the "General Interaction with OS Threads" page that says not to use thread specific data.   So it is ok to use the thread specific data if you understand that strands can migrate between system threads?  Is this true on Windows as well?  I suspect the answer to my next question is no, but is there some way to get user code executed on the system threads when they are created?  I'm thinking about how I am going to get the thread specific data initialized.  If not, do system threads ever get destroyed and re-created during normal cilk execution?  I'm thinking that I can use the cilk_spawn/cilk_sync mechanism to initialize the thread specific data on system threads before any of my actual code is executed, but that won't work if system threads get shut down and restarted.

We have a garbage collector (hence why we have our own memory manager), so we rarely free memory explicitly; however, it is safe to free memory from threads other than the allocating thread, as that does happen occasionally already.  After I get cilk working with our allocator, I'm also going to investigate getting cilk working with the gc, which will probably lead to a few more questions.

Thanks

Darin

As a technicality, we use the term "strand" to refer to "any sequence of instructions without any parallel control structures."   Thus, an individual strand itself never migrates between different threads.   But execution of a particular function / computation in Cilk Plus may switch threads at strand boundaries.    Both Windows and Unix-flavor Cilk Plus runtimes have similar behavior, so you can usually think of them as the same.

The warning about the use of thread-specific data is there in large part because using thread-specific data in a program often means the program depends on how the underlying scheduler executes work on threads, which should generally be avoided whenever possible.   But when writing system-level code like a memory manager, it can be difficult to avoid.

There is no documented interface for executing user code when the system threads are created.   I can think of a few hacks to try to do it in user code, (e.g., by creating an initial parallel loop that creates P iterations, where each iteration stalls on a global flag, to make sure that every iteration gets stolen), but none of them are particularly nice.

Currently, the runtime threads are never destroyed unless the user explicitly ends the runtime --- they just go to sleep.

Which version of the compiler and runtime are you using?   In principle, since the runtime sources (for Unix flavors) are open-source, one could also try to add the right hooks into the runtime, at the place where system threads are created.   I haven't really thought about what the interface might be though.

Cheers,

Jim

If you're using Windows, there *is* a documented way to execute code at the start of any thread.  Every DLL can define a DllMain entrypoint which will be called when the module is loaded or unloaded, or when a thread starts or stops.  See the documentation on DLL_THREAD_ATTACH and DLL_THREAD_DETACH for details.  Be warned that this function will be called by the OS while it holds the loader lock.  You need to be very careful about what you do in this context, since you can easily cause a deadlock.  The MSDN documentation has a list of dos and don'ts.

    - Barry

Ok, thanks for all the info, I think I have enough to get started.

We are using:

rdubsharma307:~/sandbox/main/4/mytest/x86_64/test> /opt/intel/bin/icc --version
icc (ICC) 13.0.1 20121010
Copyright (C) 1985-2012 Intel Corporation.  All rights reserved.

Thanks again,

Darin

Declare and initialize a thread local storage variable MemMgrID = -1;
Add a volatile int NextMemMgrID = 0;
Make the body of a get_MemMgrID function:

if(MemMgrID < 0) MemMgrID = XCHGADD(&NextMemMgrID, 1);
return MemMgrID;

The only issue with the above is that if you terminate a thread and create a new thread, the old ID number will not get reused.

The benefit of the above is that only the memory manager needs to know about MemMgrID and NextMemMgrID.
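The scheme above can be sketched in portable C11, assuming atomic_fetch_add as a stand-in for XCHGADD and _Thread_local for the thread local variable. The function mem_mgr_id_demo and the DEMO_MAX_THREADS limit are hypothetical, added only to exercise the idea with plain pthreads:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define DEMO_MAX_THREADS 64

/* NextMemMgrID in the post; static storage, so it starts at zero. */
static atomic_int next_mem_mgr_id;
/* MemMgrID in the post; -1 means "not assigned yet" for this thread. */
static _Thread_local int mem_mgr_id = -1;

/* Assign an ID on the calling thread's first call, then keep returning it. */
int get_mem_mgr_id(void) {
    if (mem_mgr_id < 0)
        mem_mgr_id = atomic_fetch_add(&next_mem_mgr_id, 1);
    return mem_mgr_id;
}

/* Demo worker: records this thread's ID and checks that it is stable. */
static void *id_worker(void *out) {
    int first = get_mem_mgr_id();
    assert(get_mem_mgr_id() == first);
    *(int *)out = first;
    return NULL;
}

/* Returns 1 if all threads received distinct IDs, else 0. */
int mem_mgr_id_demo(int nthreads) {
    pthread_t tid[DEMO_MAX_THREADS];
    int id[DEMO_MAX_THREADS];
    if (nthreads > DEMO_MAX_THREADS)
        nthreads = DEMO_MAX_THREADS;
    for (int i = 0; i < nthreads; ++i)
        pthread_create(&tid[i], NULL, id_worker, &id[i]);
    for (int i = 0; i < nthreads; ++i)
        pthread_join(tid[i], NULL);
    for (int i = 0; i < nthreads; ++i)
        for (int j = i + 1; j < nthreads; ++j)
            if (id[i] == id[j])
                return 0;
    return 1;
}
```

The returned ID could then index a table of per-thread structs owned by the memory manager. As noted above, IDs of terminated threads are never recycled in this sketch, so such a table would need to be sized for every thread ever created.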

Jim Dempsey

www.quickthreadprogramming.com
