Scalable Memory Pools: community preview feature

By Anton Malakhov,

Published: 12/19/2011   Last Updated: 12/19/2011

In TBB 4.0, we introduced a new community preview feature (CPF): the scalable memory pools. See the TBB Reference Manual (D.4) for a formal and detailed description. In this blog, we will present them less formally and discuss what changes can be made.


We had vague requests from customers to implement a memory pool (Wikipedia calls it a region) or some of its properties in the TBB scalable memory allocator. We summarized these requests and general information on memory pools from the Internet into the following compilation of major properties and abilities:

  • Memory pools basically do the same job as standard memory allocators but additionally group memory objects under the umbrella of a specific pool instance, which enables:
    • fast deallocation of all the memory at once, either on pool destruction or for the sake of further reuse
    • less memory fragmentation and less related synchronization between independent groups
  • Memory pools allow more control over acquisition and release of memory resources, and may have user-specific sources of memory:
    • redirection to a specific memory provider, e.g. a standard or custom implementation of malloc, big memory pages, memory tied to a specific NUMA node, or IPC shared-memory regions
    • a memory chunk/buffer of a fixed size

To squeeze out more performance and to fight memory fragmentation, some specific implementations allocate objects of a fixed size only (so-called object pools, e.g. boost::pool; Wikipedia calls this a memory pool) or are unable to deallocate an individual object ("arena allocator"). In our implementation, we tried to provide more general functionality in a thread-safe and scalable way. For that purpose, the implementation of the memory pools is based on the TBB scalable memory allocator and so has similar speed and memory consumption properties. Later we may address more specific use cases, based on your feedback.


Our memory pools API consists of two classes for thread-safe memory management: tbb::fixed_pool and tbb::memory_pool. The first one is for the simple case when an already allocated memory block is used for allocation of smaller objects. The second one utilizes a user-specified memory provider to obtain big chunks of memory in which smaller objects reside. As opposed to fixed_pool, memory_pool is able to grow on demand and relinquish unused chunks back to the provider.

Both classes provide familiar methods for allocation and deallocation:

void *ptr = my_pool.malloc( (size_t)10 ); // allocate 10 bytes
ptr = my_pool.realloc( ptr, (size_t)12 ); // extend the allocation to 12 bytes
my_pool.free( ptr ); // deallocate it

Additionally, there is a method which deallocates all the memory at once, i.e. it is a faster equivalent to a series of calls to my_pool.free() for each pointer obtained from this pool by previous calls to my_pool.malloc():

my_pool.recycle(); // Frees all the memory in the pool for reuse

Please note that it is not thread-safe to call it concurrently with other methods on the same instance (similarly to the clear() method in containers).
We also provide an STL-compliant allocator class (almost compliant: it lacks a default constructor) to enable pools inside STL containers:

typedef tbb::memory_pool_allocator<int> pool_allocator_t;
std::list<int, pool_allocator_t> my_list( (pool_allocator_t( my_pool )) );

Now, the only thing that holds us back from a first experiment with this new TBB feature is the question of how to create ‘my_pool’. First, we need to enable this feature and include the header:

#define TBB_PREVIEW_MEMORY_POOLS 1 // community preview features must be enabled explicitly
#include "tbb/memory_pool.h"

If you want to create a memory pool on top of your own memory block, specify its address and size in bytes in the constructor of the tbb::fixed_pool class, as in the following excerpt:

char buffer[1024*1024];
// The casts below are just to show the types of arguments.
tbb::fixed_pool my_pool( (void*)buffer, (size_t)1024*1024*sizeof(char) );

The maximal amount of memory which can be allocated from the pool declared above is limited by the size of the buffer minus some space for control structures. If you want to avoid this limitation, use the tbb::memory_pool template class, specifying a memory provider (which will be discussed later) as its template argument:

tbb::memory_pool< std::allocator<char> > my_pool(/*optionally: allocator instance*/);

You can specify any STL-compatible allocator as the memory provider (though this is subject to change). It will provide (big) memory chunks for my_pool when necessary. The destructor of the memory_pool class releases all the memory chunks back to the memory provider.

Let’s consolidate our knowledge in one artificial example:

// Link this with the tbbmalloc library
#define TBB_PREVIEW_MEMORY_POOLS 1
#include "tbb/memory_pool.h"
#include <list>
#include <stdio.h>

int main() {
    static char buf[1024*1024*4]; // buffer for interim data
    tbb::fixed_pool interim_pool(buf, sizeof(buf)); // pool for temporary objects
    tbb::memory_pool< std::allocator<char> > result_pool; // pool to store the results

    typedef tbb::memory_pool_allocator<int> result_allocator_t; // interface to STL containers
    std::list<int, result_allocator_t> result_list( (result_allocator_t( result_pool )) );

    for(int result = 0, i = 0; i < 100; i++, result = 0) {
        for(int j = 0; j < 1000000; j++) {
            int *p = (int*)interim_pool.malloc(4);
            if( p ) result++; // really dummy :)
            // in a real application, the allocated objects would be processed here
        }
        result_list.push_back(result); // no memory fragmentation here - separate pool
        interim_pool.recycle(); // free all the interim objects
        printf("%d\n", result); // should be the same number on each iteration
    }
    return 0;
} // all the memory is released back implicitly

The simple part is done, and I hope that you are interested enough to proceed with more complex questions and to tell us what you think about it.

Someone may want to know whether it is possible to construct a pool in memory allocated from another pool. It is possible, but one should take care to destroy the inner pool prior to destruction of the outer pool or a call to its recycle(). Do you know a good reason to enable such nesting?

Memory provider interface

From an API designer's perspective, the memory provider is the most questionable part of the scalable pools API. And since it is still a community preview feature, you are welcome to influence its design. Curious readers might want to ask questions like the following:

  • what are the requirements for the template argument?
  • why is std::allocator used as a memory provider?
  • why is the type used with std::allocator in the examples above “char”?

The template argument of tbb::memory_pool accepts a memory provider class which satisfies the minimal requirements of an STL-compatible allocator according to the C++11 standard: allocate and deallocate methods, and a value_type definition.

Using std::allocator and compatible classes is perhaps the most straightforward way to enable memory_pool anywhere. However, from an efficiency standpoint it probably does not make much sense, because such allocators are designed for rather small objects, while a memory provider should operate with megabytes. For users who don't care what the memory provider is, we could instead provide a default one which would map to the system-default memory-mapping mechanism.

And finally, TBB memory pools don't really need the type of allocation (i.e. char in the declaration of tbb::memory_pool<std::allocator<char>>); rather, they need to know the granularity of requests to the memory provider. This is not only a specification of the argument types for allocate and deallocate: this information is used in our implementation to determine the size of memory requests to the memory provider. For example, consider big pages, which can be mapped only in chunks of megabytes:

// A custom memory provider for memory_pool
#include <sys/mman.h>

class big_pages {
public:
    // value_type defines the granularity: one 2 MB big page
    typedef char value_type[2*1024*1024];
    void *allocate(size_t pages) {
        return mmap(NULL, pages*2*1024*1024, PROT_READ | PROT_WRITE,
                    MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    }
    // the pointer type requirement is also actually relaxed
    void deallocate(void *ptr, size_t pages) {
        munmap(ptr, pages*2*1024*1024);
    }
};
// usage:
tbb::memory_pool<big_pages> my_pool;

Some food for thought

The way granularity is specified via the value_type typedef in the example above is not straightforward and can be viewed as confusing. This is the price of the STL-compliant interface of the memory provider, and we are not sure whether it has more pros than cons:

  • STL compatibility is supposed to enable reuse of widely available memory allocator implementations.
    • On the other hand, these allocators are usually intended for small allocation sizes, while a pool needs memory chunks of at least hundreds of kilobytes.
  • In theory, it allows easy nesting of memory pools using our memory_pool_allocator class.
    • But we found that in some other implementations, nesting a pool does not mean reusing the memory allocated by the parent pool, but rather building a hierarchy of pool objects.
    • And such nesting is not yet supported anyway.
  • It is easier to remember requirements that are based on a well-known standard interface.
  • Granularity is a property of the memory provider and so must be passed along with it.

As an alternative interface, we are considering making the granularity explicitly specified, but in a separate traits class which would be specialized only for memory providers with an allocation granularity greater than 1. It is even possible to keep STL compatibility using metaprogramming magic, e.g. defining the granularity as sizeof(value_type) if value_type is defined.

Another question is how to introduce alignment into the interface of memory pools. Basically, it can be either aligned_malloc() and aligned_realloc() methods, or an optional argument for the malloc() and realloc() methods.

Also, are the suggested class names good, or do we need to find better ones (for instance, "fixed_region" and "dynamic_region" to align with Wikipedia's terminology)?

Feedback is very welcome

We are very eager to hear what you think about the above and how it can be used in your projects.
