I have obtained good speedups using the scalable allocator on various occasions in the past. However, in producer-consumer systems where allocations and deallocations are performed by separate task invocations (and so quite likely by different threads), it can be frustrating to use because of the allocator's apparent propensity for retaining freed memory internally. It certainly doesn't technically leak the memory, as subsequent allocations from TBB threads can draw on it; however, in complex systems where TBB is used in only part of a processing pipeline, I often want to reclaim the store for other purposes.
Anyway, I got quite excited when I saw TBB 4.2 seemed to have a scalable_allocation_command(TBBMALLOC_CLEAN_ALL_BUFFERS,0) which looked like it might do something about this.
Attached (tbbmem.cpp) is some minimal test code.
It was compiled with g++ 4.8.2 on amd64 Debian sid using
g++ -std=c++11 -o tbbmem -march=native -O3 -g tbbmem.cpp -ltbb -ltbbmalloc
and running it prints:
Pre-allocation: 0.0264 GByte process size
After parallel allocation: 1.87 GByte process size
After parallel deallocation: 1.87 GByte process size
After tbbmalloc clean: 1.87 GByte process size
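Figures like the ones above can be obtained on Linux by reading /proc/self/statm. The sketch below is an assumption about how one might measure it, not necessarily what tbbmem.cpp does: the helper name process_size_gbyte is hypothetical, and it takes "process size" to mean total virtual size (the first field of statm) rather than resident size (the second field).

```cpp
#include <fstream>
#include <unistd.h>

// Hypothetical helper: total virtual size of the current process in GByte,
// read from the first field of /proc/self/statm (Linux only).
// Returns a negative value if the file cannot be read.
double process_size_gbyte() {
    std::ifstream statm("/proc/self/statm");
    unsigned long pages = 0;
    if (!(statm >> pages)) return -1.0;
    const double page_bytes = static_cast<double>(sysconf(_SC_PAGESIZE));
    return static_cast<double>(pages) * page_bytes / (1024.0 * 1024.0 * 1024.0);
}
```

Calling this before allocation, after allocation, after deallocation and after the clean command gives the four checkpoints reported above.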
Now I was hoping this newfangled clean command would somehow magically shrink the process size back down to where it started, but clearly not.
So: what does it actually do? And more importantly, in the sort of situation the attached code models, is there anything I can do to release the deallocated memory back to where it's available for other purposes?