What is tbbmalloc_proxy?

I noticed that in the latest build of TBB there is a new library called tbbmalloc_proxy, and off the top of my head I couldn't imagine what its purpose is.

After spending 3 minutes looking, I couldn't find any documentation about the new module. I guess it's documented somewhere, but it's not dead easy to find, so I figure posting here might help others when they notice it and google it or come to the forums.

So what is it? What benefit does it have, and when should I use it?

Thanks!

The tbbmalloc_proxy binaries can be used to replace the memory allocation routines with the TBB scalable allocator across the whole process.
On Linux, one should use LD_PRELOAD to load the library into the process before any other shared module. It will then redirect every call to malloc, free, etc. to the corresponding TBB scalable_* function.
On Windows, the effect is the same but the usage model is different; basically, the tbbmalloc_proxy.lib import library should be linked into the main application module. See include/tbb/tbbmalloc_proxy.h under the install directory for instructions.
This functionality is not yet supported for other operating systems.

Added: I fixed the misprint in the path above.
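
As a rough illustration of the Linux usage described above, the point is that the application code does not change: it keeps calling plain malloc and free, and the preloaded proxy redirects those calls. The sketch below contains nothing TBB-specific; the function names churn and churn_parallel are hypothetical and exist only for this example.

```cpp
// Ordinary allocation-heavy code. Nothing here mentions TBB; when the proxy
// library is preloaded, these malloc/free calls are what gets redirected.
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <thread>
#include <vector>

// Allocate and free many small blocks through plain malloc/free.
// Returns the number of allocations that succeeded.
std::size_t churn(std::size_t iterations) {
    std::size_t ok = 0;
    for (std::size_t i = 0; i < iterations; ++i) {
        const std::size_t size = 16 + (i % 8) * 32;  // sizes from 16 to 240 bytes
        void* p = std::malloc(size);
        if (p != nullptr) {
            std::memset(p, 0xAB, size);  // touch the block so it is really used
            std::free(p);
            ++ok;
        }
    }
    return ok;
}

// Run churn() on several threads at once and return the total count.
std::size_t churn_parallel(unsigned threads, std::size_t iterations) {
    std::vector<std::size_t> counts(threads, 0);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        // Each thread writes only its own slot, so no locking is needed.
        pool.emplace_back([&counts, t, iterations] { counts[t] = churn(iterations); });
    }
    for (auto& th : pool) th.join();
    std::size_t total = 0;
    for (std::size_t c : counts) total += c;
    return total;
}
```

If a program calls churn_parallel from main (built with -pthread where the toolchain requires it), the same unmodified binary can be started once normally and once with LD_PRELOAD pointing at the tbbmalloc_proxy shared library to compare timings. The exact library path depends on the TBB package layout.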

Thanks! By the way, that's a great feature; I'd kind of been hoping for something like this. Now the allocator should be as easy to use as Hoard or its like.

I am trying to use the new LD_PRELOAD feature of TBB under Linux and I am trying to figure something out. I am using the TBB 2.1 update 3 Commercial Aligned Release (tbb21_015oss). I am only trying to replace the standard memory allocator with the TBB scalable allocator. I am setting LD_PRELOAD to ${TBB_PATH}/lib/libtbbmalloc_proxy.so.2:${TBB_PATH}/lib/libtbbmalloc.so.2 and I believe that I am getting the scalable allocator. However, I noticed that I get significantly better performance if I create an instance of tbb::task_scheduler_init than if I do not. What else is going on here? Tracing through the code, I find that initializing TBB does eventually call initialize_cache_aligned_allocator(). Does that give me a different version of the scalable allocator? Is there a way to get that without creating an instance of tbb::task_scheduler_init and including libtbb.so?

Frankly, I have no idea why just initializing TBB might improve performance, and knowing nothing about the application, it is hard to make guesses. TBB initialization won't give you a different version of the allocator, and it won't give you any malloc replacement.

Added: tbb::cache_aligned_allocator and tbb::tbb_allocator are C++ classes that can be used instead of std::allocator, and provide some additional properties as described in the documentation. Internally, both use scalable_malloc & co if available, and regular malloc otherwise; the initialization function you mentioned performs the selection. Since you preload libtbbmalloc.so.2, scalable_malloc is available to use from TBB.

Also: by default, TBB packages do not have a lib directory, so I assume you somehow ensured that ${TBB_PATH}/lib/libtbbmalloc_proxy.so.2 really points to the desired binary.
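
One way to check which binary actually supplies malloc at run time is to ask the dynamic loader. This is only a sketch, assuming Linux with glibc (dlsym with RTLD_DEFAULT and dladdr are glibc extensions, and older glibc versions need linking with -ldl); provider_of is a hypothetical helper name, not part of TBB.

```cpp
// RTLD_DEFAULT and dladdr need _GNU_SOURCE; g++ on Linux normally defines it.
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <dlfcn.h>
#include <string>

// Return the path of the shared object that provides the named symbol in the
// default lookup order (preloaded libraries are searched first), or an empty
// string if the symbol or its owner cannot be determined.
std::string provider_of(const char* symbol) {
    void* addr = dlsym(RTLD_DEFAULT, symbol);
    if (addr == nullptr) {
        return std::string();
    }
    Dl_info info;
    if (dladdr(addr, &info) == 0 || info.dli_fname == nullptr) {
        return std::string();
    }
    return std::string(info.dli_fname);
}
```

Printing provider_of("malloc") early in main should show the C library's path in a normal run; if the preload took effect, a different path is expected, which is a quick sanity check that LD_PRELOAD points at the intended file.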

The same functionality is now also available on Windows, in the 20090511 development release.
