Memory consumption using scalable_* and multiple threads

vasci_'s picture

I am seeing excessive memory consumption when using the scalable_malloc/scalable_free "C" routines from TBB 4.1 (as part of Parallel Studio) that I do not see when using malloc()/free() or the MKL memory allocation routines.

In a loop, I create and destroy threads that make many calls into scalable_malloc and scalable_free. There are no scalable_ calls "across threads" or from the main thread. These calls are all balanced, so no allocated memory is being left dangling.

Each time through the loop, memory consumption seems to increase, as if some thread-specific buffers are not being returned when the threads are destroyed.

MKL has a function MKL_Free_Thread_Buffers that I can call at the end of a thread, just before it dies. Does TBB need a similar call?
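For anyone who wants to try this, the pattern described above can be sketched roughly as follows. This is only an illustration, not the original poster's code: `worker`, `run_iterations`, the `USE_TBBMALLOC` macro and the block counts are all made-up names and values, and std::thread stands in for whatever threading library is actually used. Without `USE_TBBMALLOC` it falls back to malloc/free, so the same loop can be compared under both allocators.

```cpp
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Define USE_TBBMALLOC and link against tbbmalloc to exercise the scalable
// allocator; otherwise the sketch falls back to malloc/free for comparison.
#ifdef USE_TBBMALLOC
#include <tbb/scalable_allocator.h>
#define ALLOC(n) scalable_malloc(n)
#define RELEASE(p) scalable_free(p)
#else
#define ALLOC(n) std::malloc(n)
#define RELEASE(p) std::free(p)
#endif

// Hypothetical worker: every block it allocates is freed by the same thread.
// Returns the number of blocks still outstanding at exit (0 when balanced).
inline std::size_t worker(std::size_t blocks, std::size_t bytes) {
    std::vector<void*> held;
    held.reserve(blocks);
    for (std::size_t i = 0; i < blocks; ++i) {
        void* p = ALLOC(bytes);
        if (p) held.push_back(p);
    }
    std::size_t outstanding = held.size();
    for (void* p : held) {
        RELEASE(p);
        --outstanding;
    }
    return outstanding;
}

// Creates and joins fresh threads on every iteration, as in the post.
// Returns the total number of unfreed blocks across all threads.
inline std::size_t run_iterations(int iterations, std::size_t threads,
                                  std::size_t blocks, std::size_t bytes) {
    std::size_t leaked = 0;
    for (int it = 0; it < iterations; ++it) {
        std::vector<std::size_t> result(threads);
        std::vector<std::thread> spawned;
        for (std::size_t t = 0; t < threads; ++t)
            spawned.emplace_back([&result, t, blocks, bytes] {
                result[t] = worker(blocks, bytes);
            });
        for (auto& th : spawned) th.join();
        for (std::size_t r : result) leaked += r;
        // Sample process memory here with an OS tool to see the trend.
    }
    return leaked;
}
```

The return value only confirms that the calls are balanced; the growth itself has to be observed from outside the process, for example in a task manager, while the loop runs.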

Vladimir Polin (Intel)'s picture

Hello,

Intel TBB 4.2 introduced the scalable_allocation_command() function, which cleans up either the calling thread's buffers or all buffers.

More details are here: http://software.intel.com/en-us/node/468118
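A minimal sketch of how that call might be used from a worker thread, assuming TBB 4.2 or later and linking against tbbmalloc. The wrapper name `release_thread_buffers` and the `USE_TBBMALLOC` macro are made up for this example; without the macro the wrapper compiles to a no-op so the snippet still builds where TBB is not installed.

```cpp
#ifdef USE_TBBMALLOC
#include <tbb/scalable_allocator.h>
#endif

// Hypothetical helper: asks the allocator to release the calling thread's
// cached buffers. Returns true if the allocator reported that it freed
// something, false if there was nothing to free or TBB is not compiled in.
inline bool release_thread_buffers() {
#ifdef USE_TBBMALLOC
    // The second argument is reserved and must be null.
    return scalable_allocation_command(TBBMALLOC_CLEAN_THREAD_BUFFERS,
                                       nullptr) == TBBMALLOC_OK;
#else
    return false;  // no TBB allocator in this build, so nothing to release
#endif
}
```

A thread would call `release_thread_buffers()` as the last thing it does before returning, in the same place one would call MKL_Free_Thread_Buffers. Passing TBBMALLOC_CLEAN_ALL_BUFFERS instead asks for the buffers of all threads to be cleaned.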

--Vladimir

Alexandr Konovalov (Intel)'s picture

As Vladimir mentioned, there is a call similar to MKL_Free_Thread_Buffers(), but there should be no need for it at thread’s termination time, as all per-thread buffers are supposed to be released automatically. Is the sequence of allocations different between iterations of your outer loop? (We need to understand whether this is memory fragmentation or a memory leak.) How big is the regression in memory consumption compared to the system allocator?

I’d love to see a reproducer if the regression is big.

vasci_'s picture

I am seeing many megabytes of extra memory usage when using the scalable allocators.

I can try to come up with a reproducer. But it seems 4.2 addresses this issue.

vasci_'s picture

but there is no need for it at thread’s termination time, as all per-thread buffers have to be released automatically.

How is that possible? How can the TBB memory allocators "know" that a particular thread has died and that its buffers can be released? I am using a non-TBB threading library (boost::thread) on Windows.

Interestingly, as a side note, I was previously using OMP threading and this was not an issue. That's because OMP starts up a thread pool and reuses the same threads during program execution, so threads are not being repeatedly created and destroyed...

Alexandr Konovalov (Intel)'s picture

How is that possible? How can TBB memory allocators "know" a particular thread has died and that particular thread's buffers can be released?

Under Windows, DllMain is called with the DLL_THREAD_DETACH argument on thread termination for each DLL.

Your observation about OpenMP is important. It is interesting that there have been no known issues (and therefore no fixes) related to memory leaks during thread termination.
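To illustrate the general idea of a thread-exit hook in portable code, here is a small analogue that uses a C++11 thread_local object whose destructor runs when its thread exits. It is not how tbbmalloc is implemented and it does not use DllMain; `ThreadCache`, `cached_alloc`, `cached_free` and the other names are invented for this sketch.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Counts how many per-thread caches have been torn down (illustration only).
inline std::atomic<int>& caches_released() {
    static std::atomic<int> n{0};
    return n;
}

// Hypothetical per-thread cache of freed blocks. Its destructor runs when the
// owning thread exits, which is the hook used here to return the memory.
struct ThreadCache {
    std::vector<void*> free_blocks;
    ~ThreadCache() {
        for (void* p : free_blocks) std::free(p);
        ++caches_released();
    }
};

inline ThreadCache& local_cache() {
    thread_local ThreadCache cache;  // constructed on first use in each thread
    return cache;
}

// Toy allocator with a single block size to keep the sketch short:
// reuse a cached block if one is available, otherwise call malloc.
constexpr std::size_t kBlockSize = 256;

inline void* cached_alloc() {
    auto& blocks = local_cache().free_blocks;
    if (blocks.empty()) return std::malloc(kBlockSize);
    void* p = blocks.back();
    blocks.pop_back();
    return p;
}

// "Freeing" parks the block in the calling thread's cache.
inline void cached_free(void* p) { local_cache().free_blocks.push_back(p); }

// Spawns one short-lived thread that uses the cache, then joins it.
// Returns how many caches were released while that thread ran and exited.
inline int run_one_thread() {
    const int before = caches_released();
    std::thread t([] {
        void* p = cached_alloc();
        cached_free(p);  // the block now sits in this thread's cache
    });
    t.join();  // thread_local destructors have run once join returns
    return caches_released() - before;
}
```

The point is only that a library can get a callback at thread exit without the application calling anything, whichever threading library created the thread.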

jimdempseyatthecove's picture

Can you encapsulate your use of boost create thread/exit thread such that it uses a pool?

YourCreateThread :: if(ThreadAvailableInPool) takeFromPool else createThread

YourEndThread :: returnThreadContextToYourPool
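A rough sketch of that idea, written with std::thread rather than boost so that it is self-contained. The names `ThreadPool`, `submit` and `run_jobs` are placeholders; the pool keeps a fixed number of workers alive and hands them tasks instead of creating a thread per task.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Minimal fixed-size pool: workers are created once and reused for every task.
class ThreadPool {
public:
    explicit ThreadPool(std::size_t n) {
        for (std::size_t i = 0; i < n; ++i)
            workers_.emplace_back([this] { run(); });
    }

    // Lets the workers drain the queue, then joins them.
    ~ThreadPool() {
        {
            std::lock_guard<std::mutex> lock(m_);
            stopping_ = true;
        }
        cv_.notify_all();
        for (auto& w : workers_) w.join();
    }

    ThreadPool(const ThreadPool&) = delete;
    ThreadPool& operator=(const ThreadPool&) = delete;

    // Stands in for "YourCreateThread": hand the work to an existing thread.
    void submit(std::function<void()> task) {
        {
            std::lock_guard<std::mutex> lock(m_);
            tasks_.push(std::move(task));
        }
        cv_.notify_one();
    }

private:
    void run() {
        for (;;) {
            std::function<void()> task;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
                if (tasks_.empty()) return;  // stopping and nothing left to do
                task = std::move(tasks_.front());
                tasks_.pop();
            }
            task();  // returning here stands in for "YourEndThread"
        }
    }

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<std::function<void()>> tasks_;
    bool stopping_ = false;
    std::vector<std::thread> workers_;
};

// Runs `tasks` jobs on a pool of `threads` workers and returns how many ran.
inline int run_jobs(std::size_t threads, int tasks) {
    std::atomic<int> done{0};
    {
        ThreadPool pool(threads);
        for (int i = 0; i < tasks; ++i) pool.submit([&done] { ++done; });
    }  // the destructor returns only after every queued task has finished
    return done.load();
}
```

Because the workers live for the lifetime of the pool, any per-thread allocator buffers are created once per worker rather than once per task.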

Jim Dempsey

www.quickthreadprogramming.com
