! arena_in_need() spinlocking and hogging 100% CPU

I have a process running at 100% CPU on 24 cores. Dumping the process reveals that 23 TBB threads are spinning on the lock inside arena_in_need():

    arena* arena_in_need () {
        spin_mutex::scoped_lock lock(my_arenas_list_mutex);
        return arena_in_need(my_arenas, my_next_arena);
    }

Typical callstack:

    1 tbb.dll!__TBB_machine_cmpswp1()
    2 tbb.dll!tbb::internal::market::arena_in_need()
    3 tbb.dll!tbb::internal::market::process(rml::job & j={...})
    4 tbb.dll!tbb::internal::rml::private_worker::run()
    5 tbb.dll!tbb::internal::rml::private_worker::thread_routine(void * arg=0x000000001f1be7d0)
    6 tbb.dll!_callthreadstartex()
    7 tbb.dll!_threadstartex(void * ptd=0x0000000000000000)

No other thread is running TBB code or tasks. It might be useful to mention that we use task priorities through the optional context parameter. Any ideas as to why it does this, and suggestions on how to fix it?

NOTE: we are using TBB 4.0 U2 (OSS278 of last December).

Yes, it can be related to task priorities. Could you provide us with a reproducer, or sketch the structure of your code here? Do you use task::enqueue (with priorities)? When you say 23 threads are busy in arena_in_need(), what is the 24th thread doing? Is it a master thread? How did it submit the work to TBB?

I spent a few hours trying to create a test case that reproduces this problem, without success. We are running a mix of parallel_for_each, parallel_for, parallel_sort and pipelines, either with high priority set on the task_group_context or in normal mode with the default context.

As for the 24th thread: I might be mistaken, but doesn't TBB create only P-1 threads, where P is the number of logical processors? As I mentioned in my original post, no code is running TBB algorithms at the time of the core dump.

Looking at the code, could it be that the list of arenas gets very long, so that iterating through it in arena_in_need() takes longer than expected while holding the spin mutex? I currently only have a minidump of the process that had the problem, so I can't check whether that is the case.
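To make the "long arena list" hypothesis concrete, here is a minimal sketch of a linear scan done while holding a spin lock. It is not TBB's implementation: `SpinLock`, `Arena`, `Market`, `needs_workers` and `last_scan_length` are hypothetical names made up for illustration, and the real arena_in_need() may well scan differently. It only shows that, under this assumption, the time the lock is held grows with the number of arenas that must be visited.

```cpp
#include <atomic>
#include <cstddef>
#include <list>

// Minimal test-and-set spin lock (a stand-in for tbb::spin_mutex, not TBB code).
class SpinLock {
    std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
public:
    void lock()   { while (flag_.test_and_set(std::memory_order_acquire)) { /* busy-wait */ } }
    void unlock() { flag_.clear(std::memory_order_release); }
};

// Hypothetical arena: only records whether it still wants worker threads.
struct Arena {
    bool needs_workers;
};

// Hypothetical market: owns the arena list and the lock that guards it.
struct Market {
    SpinLock lock;
    std::list<Arena> arenas;
    std::size_t last_scan_length = 0;  // arenas visited by the most recent scan

    // Linear scan under the lock: hold time is O(number of arenas visited).
    Arena* arena_in_need() {
        lock.lock();
        Arena* found = nullptr;
        std::size_t visited = 0;
        for (Arena& a : arenas) {
            ++visited;
            if (a.needs_workers) { found = &a; break; }
        }
        last_scan_length = visited;
        lock.unlock();
        return found;
    }
};
```

With 1000 idle arenas in the list, every call visits all 1000 entries before releasing the lock, and every other worker spins for that whole time. In a full dump, the length of the real arena list would be the thing to check.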

Is it possible that too many threads fighting for that mutex would make them spin uselessly?
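One way to get a feel for this outside of TBB is to count how often a test-and-set spin lock is found already taken when several threads hammer it. The sketch below uses only the C++ standard library; `ContentionStats` and `measure_contention` are hypothetical names, and the lock is a plain `std::atomic_flag`, not tbb::spin_mutex, so the numbers say nothing definitive about TBB itself.

```cpp
#include <atomic>
#include <cstdint>
#include <thread>
#include <vector>

struct ContentionStats {
    std::uint64_t acquisitions;     // successful lock acquisitions
    std::uint64_t failed_attempts;  // test_and_set calls that found the lock taken
};

// Hypothetical helper: each thread takes and releases one shared spin lock
// `iterations_per_thread` times and counts its wasted spins.
ContentionStats measure_contention(unsigned num_threads, unsigned iterations_per_thread) {
    std::atomic_flag lock = ATOMIC_FLAG_INIT;
    std::atomic<std::uint64_t> failed{0};
    std::uint64_t acquisitions = 0;  // only modified while holding `lock`

    auto worker = [&]() {
        for (unsigned i = 0; i < iterations_per_thread; ++i) {
            std::uint64_t local_failed = 0;
            while (lock.test_and_set(std::memory_order_acquire)) {
                ++local_failed;  // lock was held by someone else: a wasted spin
            }
            ++acquisitions;      // critical section
            lock.clear(std::memory_order_release);
            failed.fetch_add(local_failed, std::memory_order_relaxed);
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < num_threads; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();

    return ContentionStats{acquisitions, failed.load()};
}
```

Calling `measure_contention(23, 100000)` and comparing `failed_attempts` with `acquisitions` gives a rough idea of how much CPU goes into spinning rather than useful work. The ratio depends heavily on the machine and on how long the lock is held.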

I have a full user dump of a process showing the issue. I could certainly execute some commands in WinDbg and send you the results if that would help diagnose the issue.

Many thanks
