I am seeing excessive memory consumption when using the scalable_malloc/scalable_free C routines from TBB 4.1 (shipped as part of Parallel Studio). I do not see this when using malloc()/free() or the MKL memory allocation routines.
In a loop, I create and destroy threads that each make many calls to scalable_malloc and scalable_free. No memory is allocated in one thread and freed in another ("across threads"), and the main thread makes no scalable_ calls at all. The calls are balanced, so no allocated memory is left dangling.
With each iteration of the loop, memory consumption increases, as if some thread-specific buffers are not released when the threads are destroyed.
MKL has a function, MKL_Thread_Free_Buffers, that I can call at the end of each thread just before it exits. Does TBB need a similar call?