    Steve Nuchia, June 22, 2009 3:42 PM PDT
    Need help understanding thread pool architecture

    I'm willing to bet this has been answered many times in many forms, but I couldn't find anything that helped me, either in the documentation or by searching this forum.

    The TBB docs are written from the perspective of a single-threaded program entering parallelizable sections (possibly nested) and emerging from them again.  There's language about the requirement that each thread entering a TBB parallel construct initialize a task_scheduler_init object, but nothing about what effect that has.

    I've got a couple of situations that don't exactly fit the paradigm.  Take the more general one: a library that may be called from a multithreaded program and wants to use TBB internally.  We may be called from a thread with an existing task scheduler but from outside any TBB task, we may be called from inside a tbb task, and we may be called on a thread that's never heard of TBB before.

    Further complicating matters, I'm working in Windows, where not all threads are created equal.  There's a fairly hideous matrix of things that have per-thread initialization and periodic maintenance obligations.

    I know, use the source, Luke.  What I'm hoping for here isn't so much an insight into the TBB mechanism as the phrase that whacks my head into alignment with the authors' heads.

    Specific issues:

    If two independent user threads call into a module that uses TBB internally, will the tasks created by the called entry points be scheduled against each other?  If so, is there any direct way to influence how they are scheduled?

    If there's any notion of worker thread initialization hooks, I didn't see it.  Should there be?  Is there an idiom for it?

    We're considering implementing a structure where we wrap the tbb::parallel_foo templates with versions that pass their parameters from whatever user thread they were invoked on into a TBB thread pool.  The task trees so created are meant to have arbitrarily overlapping lifetimes and no direct interaction with one another.  What gotchas, if any, do I need to look out for?

    thank you,
    -swn

    pvonkaenel, June 23, 2009 5:41 AM PDT


    I had similar questions about how to use task_scheduler_init in a DLL in this thread: http://software.intel.com/en-us/forums/showthread.php?t=65576.  I ended up creating a task_scheduler_init instance in DllMain() on process attach and terminating it on process detach.  Then, in each DLL function that uses TBB, I create a local task_scheduler_init instance (which is destroyed automatically at the end of the call) in case a background thread is calling it; this should be a very cheap call.
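
    One way to picture why the extra per-call instance can be cheap is a reference-counted handle: only the first handle builds the shared pool, and only the last one tears it down. The sketch below is a simplified model in standard C++, not TBB's implementation; `Pool`, `PoolHandle`, and `exported_call` are hypothetical names made up for illustration.

```cpp
#include <cassert>
#include <memory>
#include <mutex>

// Hypothetical stand-in for an expensive shared resource such as a worker pool.
struct Pool {
    static int constructions;   // counts how many times a pool was really built
    Pool() { ++constructions; }
};
int Pool::constructions = 0;

// Hypothetical RAII handle, loosely modeled on how task_scheduler_init is used.
// This is an assumption-laden model, not TBB's actual bookkeeping.
class PoolHandle {
public:
    PoolHandle() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (ref_count_++ == 0) {
            pool_.reset(new Pool);   // first handle: pay the full start-up cost
        }
    }
    ~PoolHandle() {
        std::lock_guard<std::mutex> lock(mutex_);
        if (--ref_count_ == 0) {
            pool_.reset();           // last handle: tear the pool down
        }
    }
    PoolHandle(const PoolHandle&) = delete;
    PoolHandle& operator=(const PoolHandle&) = delete;

    // Number of handles currently alive.
    static int live_handles() {
        std::lock_guard<std::mutex> lock(mutex_);
        return ref_count_;
    }

private:
    static std::mutex mutex_;
    static int ref_count_;
    static std::unique_ptr<Pool> pool_;
};
std::mutex PoolHandle::mutex_;
int PoolHandle::ref_count_ = 0;
std::unique_ptr<Pool> PoolHandle::pool_;

// Models an exported DLL function that takes its own short-lived handle.
inline int exported_call() {
    PoolHandle local;   // only bumps a counter if a longer-lived handle exists
    return PoolHandle::live_handles();
}
```

    In this model, a long-lived handle plays the role of the instance created on process attach, and every later `exported_call()` costs one lock and one increment rather than a pool start-up.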

    I have no idea how to control the scheduling of tasks dispatched from different threads that may be running concurrently.  On a 4-core machine, the first task_scheduler_init will create 3 worker threads.  If the main thread and a background thread each dispatch a block of tasks, they will compete for the 3 worker threads, probably based on who dispatched first, but the main and background threads will still have their own independent thread priorities.  So I guess you have some partial control through the dispatching thread's priority.
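
    The "4 cores, 3 workers" arithmetic can be written down as a tiny helper, assuming the default sizing where the calling thread counts as one of the hardware threads. `default_worker_count` and `workers_on_this_machine` are hypothetical names, not TBB functions.

```cpp
#include <cassert>
#include <thread>

// Hypothetical helper: number of extra worker threads for P hardware threads,
// assuming the calling (master) thread counts as one of the P.
inline unsigned default_worker_count(unsigned hardware_threads) {
    return hardware_threads > 1 ? hardware_threads - 1 : 0;
}

// std::thread::hardware_concurrency() may return 0 when the value is unknown,
// in which case this sketch reports no extra workers.
inline unsigned workers_on_this_machine() {
    return default_worker_count(std::thread::hardware_concurrency());
}
```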

     

    Peter



    Steve Nuchia, June 23, 2009 8:05 AM PDT


    Thank you, that's a big help.  Now I'm reading up on all the restrictions on what you can do in DllMain and it's pretty terrifying.  Can you point me to an example or pattern that "threads" the needle? (ha ha).

    Alexey Kukanov (Intel), June 23, 2009 11:33 AM PDT

    Quoting - Steve Nuchia
    Specific issues:
    ...


    Some information related to your questions:

    - I think I have explained a few times in the forum how task_scheduler_init works, and that initializing TBB a second time in a thread has low overhead. Thus the solution Peter suggested is what we recommend.

    - In the next version of TBB, there will be support for automatic initialization, so you will not need to create a task_scheduler_init on each call for the sake of threads that have not yet initialized TBB explicitly. I would still recommend keeping a global init object that covers the DLL lifetime, to ensure the TBB worker threads remain alive.

    - If two independent user threads (we call them "masters") use TBB concurrently, they will share the TBB workers. Whichever master publishes its tasks first will get the workers; but once a worker completes the piece of work it stole earlier, it will look for another piece to steal, and the second master will be considered. The masters will work on their own tasks most of the time; but if a master's task pool becomes empty while stolen pieces of its job are not yet completed, that master will also go and steal, possibly from another master. There is no direct way to influence stealing.

    - For hooks, look at task_scheduler_observer.

    - I am not sure what you want to achieve with the above-mentioned wrappers over TBB parallel algorithms. Could you elaborate a little?
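
    To make the "hooks" idea concrete: TBB's task_scheduler_observer lets you override on_scheduler_entry and on_scheduler_exit callbacks. The sketch below does not use TBB at all; it shows the same shape with plain std::thread, so that each worker runs an entry hook before any work and an exit hook before it ends. `HookedWorkers`, `t_thread_ready`, and `count_initialized_workers` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical minimal worker group: NOT TBB's API. Each worker runs `on_entry`
// once before doing any work and `on_exit` once before it terminates.
class HookedWorkers {
public:
    HookedWorkers(unsigned count,
                  std::function<void()> on_entry,
                  std::function<void()> on_exit,
                  std::function<void()> work) {
        for (unsigned i = 0; i < count; ++i) {
            threads_.emplace_back([on_entry, on_exit, work] {
                on_entry();   // e.g. per-thread COM initialization on Windows
                work();
                on_exit();    // e.g. the matching per-thread cleanup
            });
        }
    }
    ~HookedWorkers() {
        for (auto& t : threads_) {
            if (t.joinable()) t.join();
        }
    }

private:
    std::vector<std::thread> threads_;
};

// Hypothetical per-thread state that the entry hook sets up.
thread_local bool t_thread_ready = false;

// Starts `count` workers and returns how many saw their per-thread state
// already initialized by the time their work ran.
inline int count_initialized_workers(unsigned count) {
    std::atomic<int> ok{0};
    {
        HookedWorkers workers(
            count,
            [] { t_thread_ready = true; },
            [] { t_thread_ready = false; },
            [&ok] { if (t_thread_ready) ++ok; });
    }   // destructor joins all workers before `ok` is read
    return ok.load();
}
```

    The calling thread never runs the hooks here, which mirrors the distinction between a master and the workers it shares tasks with.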



    Steve Nuchia, June 23, 2009 12:34 PM PDT

    - I am not sure what you want to achieve with the above-mentioned wrappers over TBB parallel algorithms. Could you elaborate a little?

    Very helpful post, summarizing what I'd gleaned elsewhere and filling in some gaps.  Thank you!

    In Windows, as is probably true in most GUI frameworks, not all threads are created equal.  What I'm trying to achieve is, generically, segregation of work that a "master" can or must do from work that can or should be done by workers.

    Specifically: the master must continue to "pump messages" or the world stops working, if the master happens to be the main thread of the application.  Also, the RPC mechanisms underlying COM and its successors work only if you've gone through the proper initialization rituals on the thread making the call.

    Having the master act as foreman, sharing the tasks with the workers creates a lot of constraint and requirement conflicts.  Keeping them separate is one approach to resolving those conflicts.  Others are (using the "hook" concept) ensuring that all workers are qualified to use all the APIs and dynamically detecting whether we're on the master or an ordinary worker thread and somehow "doing the right thing" inside (every!) task's operator() function.

    Isn't legacy programming fun?

    Steve Nuchia, June 23, 2009 12:38 PM PDT


    Also, I'm still looking for a pattern that will allow code resident in a DLL to maintain a thread pool over its lifetime and safely clean up when the DLL is unloaded, regardless of which mechanism(s) are used by the host process to load and unload the library.  According to Microsoft's own documentation this is intractable in general so I guess my expectations are inherently limited here.

    pvonkaenel, June 23, 2009 12:51 PM PDT

    Quoting - Steve Nuchia

    Thank you, that's a big help.  Now I'm reading up on all the restrictions on what you can do in DllMain and it's pretty terrifying.  Can you point me to an example or pattern that "threads" the needle? (ha ha).

    I'm using a DllMain that looks like the following.  Note that you can probably skip the ippStaticInit() call unless you're statically linking with the IPP library.

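    #include <windows.h>
    #include <ipp.h>                      // only needed for ippStaticInit()
    #include "tbb/task_scheduler_init.h"

    // Deferred: construct the object now, but start the scheduler only in DllMain.
    tbb::task_scheduler_init g_tbbinit(tbb::task_scheduler_init::deferred);

    BOOL APIENTRY DllMain( HMODULE /*hModule*/,
                           DWORD  ul_reason_for_call,
                           LPVOID /*lpReserved*/ )
    {
        switch (ul_reason_for_call) {
            case DLL_PROCESS_ATTACH:
                ippStaticInit();          // skip unless IPP is statically linked
                g_tbbinit.initialize();   // global init object covering the DLL lifetime
                break;
            case DLL_THREAD_ATTACH:
            case DLL_THREAD_DETACH:
                break;
            case DLL_PROCESS_DETACH:
                g_tbbinit.terminate();    // release the scheduler when the DLL goes away
                break;
        }
        return TRUE;
    }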
    


    Alexey Kukanov (Intel), June 23, 2009 1:06 PM PDT

    Freeing the main application thread to do message pumping etc., and delegating all the heavy work to separate thread(s) that could in turn utilize TBB algorithms or whatever else - this makes perfect sense to me. If you just meant that, I have no further questions :)
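
    A minimal sketch of that delegation pattern in standard C++, assuming a single dedicated background thread: the caller queues a job, gets a future back immediately, and stays free to pump messages. `OffloadThread` and `sum_in_background` are hypothetical names; the place where a real job would invoke a TBB algorithm is marked in a comment.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <numeric>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical helper: one background thread that runs submitted jobs in order.
class OffloadThread {
public:
    OffloadThread() : worker_([this] { run(); }) {}

    ~OffloadThread() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
        }
        cv_.notify_one();
        worker_.join();   // jobs already queued are finished first
    }

    // Queue a job and return immediately; the caller can poll or wait on the future.
    template <class F>
    auto submit(F job) -> std::future<decltype(job())> {
        using R = decltype(job());
        auto task = std::make_shared<std::packaged_task<R()>>(std::move(job));
        std::future<R> result = task->get_future();
        {
            std::lock_guard<std::mutex> lock(mutex_);
            jobs_.emplace_back([task] { (*task)(); });
        }
        cv_.notify_one();
        return result;
    }

private:
    void run() {
        for (;;) {
            std::function<void()> job;
            {
                std::unique_lock<std::mutex> lock(mutex_);
                cv_.wait(lock, [this] { return stopping_ || !jobs_.empty(); });
                if (jobs_.empty()) return;   // stopping and nothing left to do
                job = std::move(jobs_.front());
                jobs_.pop_front();
            }
            job();   // a real job would call a TBB parallel algorithm in here
        }
    }

    std::mutex mutex_;
    std::condition_variable cv_;
    std::deque<std::function<void()>> jobs_;
    bool stopping_ = false;
    std::thread worker_;   // declared last so the other members exist before it starts
};

// Example use: sum a vector on the background thread.
inline long sum_in_background(const std::vector<int>& values) {
    OffloadThread offload;
    std::future<long> result = offload.submit([&values] {
        return std::accumulate(values.begin(), values.end(), 0L);
    });
    // A GUI thread would keep pumping messages here instead of blocking.
    return result.get();
}
```

    Because the background thread is an ordinary thread the library owns, any per-thread initialization rituals can be done once on it instead of inside every task.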
    Quoting - Steve Nuchia

    Also, I'm still looking for a pattern that will allow code resident in a DLL to maintain a thread pool over its lifetime and safely clean up when the DLL is unloaded, regardless of which mechanism(s) are used by the host process to load and unload the library.  According to Microsoft's own documentation this is intractable in general so I guess my expectations are inherently limited here.

    Right. And, as Peter's experience with dynamic loading and unloading of a TBB-dependent DLL suggests, we have some problems with correct thread shutdown in this scenario. I have heard an opinion (supported by a reference to an MS KB article, which I have unfortunately lost) that the safest way to do such cleanup on Windows is to signal the worker threads that they should complete their work, release all resources, etc., and park themselves in, e.g., an infinite loop; and after they signal back their completion, just kill them. This is not yet implemented in TBB, though we might eventually get there if nothing else works.
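
    The signal-and-acknowledge part of that protocol can be sketched in portable C++ as follows. This is not TBB code, and it swaps in a different final step: standard C++ has no way to kill a thread, so the worker simply returns after acknowledging and is joined, where the Windows approach described above would park it and then terminate it. `StoppableWorker` and `stop_within` are hypothetical names.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <future>
#include <thread>

// Hypothetical worker, NOT TBB code: loops until asked to stop, then releases
// its resources and reports back through a promise.
class StoppableWorker {
public:
    StoppableWorker()
        : done_future_(done_.get_future()),
          thread_([this] { run(); }) {}

    // Ask the worker to stop and wait up to `timeout` for its acknowledgement.
    // Returns true if the worker confirmed that it finished cleaning up.
    bool request_stop(std::chrono::milliseconds timeout) {
        stop_requested_.store(true);
        return done_future_.wait_for(timeout) == std::future_status::ready;
    }

    ~StoppableWorker() {
        stop_requested_.store(true);
        // Portable C++ cannot kill a thread, so this sketch joins here instead
        // of terminating a parked worker.
        thread_.join();
    }

private:
    void run() {
        while (!stop_requested_.load()) {
            // Stand-in for real work.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
        }
        // ... release locks, handles, and other resources here ...
        done_.set_value();   // signal back: this thread holds nothing any more
    }

    std::atomic<bool> stop_requested_{false};
    std::promise<void> done_;
    std::future<void> done_future_;
    std::thread thread_;   // declared last so it starts after the other members
};

// Starts a worker and reports whether it acknowledged a stop request in time.
inline bool stop_within(int milliseconds) {
    StoppableWorker worker;
    return worker.request_stop(std::chrono::milliseconds(milliseconds));
}
```

    The timeout on the acknowledgement matters: a caller that gets `false` back knows the worker may still hold resources and can decide what to do next instead of waiting forever.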

    Steve Nuchia, June 23, 2009 1:30 PM PDT

    Quoting - Alexey Kukanov (Intel)
    I have heard an opinion (supported by a reference to an MS KB article, which I have unfortunately lost) that the safest way to do such cleanup on Windows is to signal the worker threads that they should complete their work, release all resources, etc., and park themselves in, e.g., an infinite loop; and after they signal back their completion, just kill them. This is not yet implemented in TBB, though we might eventually get there if nothing else works.

    The document that lays that out can be downloaded from http://www.microsoft.com/whdc/driver/kernel/DLL_bestprac.mspx
    The relevant section is on page 7.  Well, it's pretty much all relevant, in the piecemeal Microsoft documentation tradition, but page 7 is the part you've lost track of.

    pvonkaenel, June 23, 2009 1:31 PM PDT

    Quoting - Steve Nuchia

    Also, I'm still looking for a pattern that will allow code resident in a DLL to maintain a thread pool over its lifetime and safely clean up when the DLL is unloaded, regardless of which mechanism(s) are used by the host process to load and unload the library.  According to Microsoft's own documentation this is intractable in general so I guess my expectations are inherently limited here.

    If you want to dynamically LoadLibrary()/FreeLibrary() the DLL that uses TBB, I bumped into a deadlock case which I was able to fix by modifying the Arena::terminate_workers() method in tasks.cpp (commercial-aligned open source version of TBB).  Look for the call to WaitForSingleObject(), replace INFINITE with some timeout (I use 300 ms), and handle the timeout case.

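                // Wait a bounded time (300 ms) instead of INFINITE for the worker to exit.
                DWORD status = WaitForSingleObject( w.thread_handle, 300 );
                if( status==WAIT_FAILED ) {
                    fprintf(stderr,"Arena::terminate_workers: WaitForSingleObject failed\n");
                    exit(1);
                } else if ( WAIT_TIMEOUT == status ) {
                    // The worker did not exit in time: kill it as a last resort.
                    TerminateThread(w.thread_handle, -1);
                }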
    


    Steve Nuchia, June 23, 2009 2:12 PM PDT

    Quoting - pvonkaenel
    If you want to dynamically LoadLibrary()/FreeLibrary() the DLL that uses TBB, I bumped into a deadlock case

    Thank you, that's very helpful too.  So far I'm using the precompiled binaries but if I have to I'll build from source and incorporate your suggested workaround.

    It's not that I "want to" dynamically load/unload anything.  I'm shipping (among other things) a library that may be called from other libraries that may be loaded dynamically.  It's out of my hands.

    Where I ran into the deadlock was with a call to the registerserver entry point leading to destruction of an initialized TBB pool from DllMain.  I could work around that particular case but it seems to be the tip of an iceberg.
