Is scalable_realloc a pool, or does it grab only the small chunk needed?
My application scales poorly, though each thread is completely independent and the work/overhead ratio is significant. It happens to allocate 250K tiny vectors (about 5 pointers in each) in a little under 2 seconds. 16 threads are doing this simultaneously. My test application that performs similar "work" but does arithmetic in place of all these allocations scales perfectly.
Does scalable_realloc etc. grab significant-size chunks and dole them out as needed (a la a pool), or are all these little allocations from different threads possibly competing via the OS? I am tempted to make my own pool, but I would lose lots of other TBB benefits. I do not see an appreciable change when switching from standard allocators to the scalable ones. Does that mean my scaling issues are elsewhere?
Thread profiler seems to think all threads are nearly 100% utilized. Could it perhaps be misinterpreting waiting on memory as "work"?
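The allocation pattern described above can be sketched roughly as follows, which may help isolate whether the allocator is the bottleneck. This is only a sketch under assumptions: the names allocate_tiny_vectors and time_workload are hypothetical, the real application's work is replaced by bare allocation, and the allocator is a template parameter that defaults to std::allocator.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <memory>
#include <thread>
#include <vector>

// Assumption: "about 5 pointers in each" is modeled as exactly 5.
constexpr std::size_t kPointersPerVector = 5;

// Hypothetical stand-in for one thread's work: build many tiny vectors,
// each of which costs one small heap allocation through Alloc.
template <typename Alloc = std::allocator<void*>>
std::size_t allocate_tiny_vectors(std::size_t vectors_per_thread) {
    using TinyVec = std::vector<void*, Alloc>;
    std::vector<TinyVec> owned;
    owned.reserve(vectors_per_thread);
    std::size_t total = 0;
    for (std::size_t i = 0; i < vectors_per_thread; ++i) {
        owned.emplace_back(kPointersPerVector);  // 5 null pointers
        total += owned.back().size();
    }
    return total;  // returned so the loop is not optimized away
}

// Runs the same independent workload on num_threads threads and returns
// the elapsed wall-clock time in seconds.
template <typename Alloc = std::allocator<void*>>
double time_workload(unsigned num_threads, std::size_t vectors_per_thread) {
    assert(num_threads > 0);
    std::vector<std::size_t> results(num_threads, 0);
    std::vector<std::thread> threads;
    const auto start = std::chrono::steady_clock::now();
    for (unsigned t = 0; t < num_threads; ++t) {
        threads.emplace_back([&results, t, vectors_per_thread] {
            results[t] = allocate_tiny_vectors<Alloc>(vectors_per_thread);
        });
    }
    for (auto& th : threads) th.join();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Comparing time_workload(1, 250000) with time_workload(16, 250000) on a machine with at least 16 cores gives a rough scaling check: if allocation scaled perfectly, the two times would be about equal. Passing tbb::scalable_allocator<void*> (from tbb/scalable_allocator.h) as the Alloc template argument should allow the same comparison with the scalable allocator, assuming TBB is available.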