Variable-size memory requests make TBBmalloc 3.0.x Win32 run out of memory and crash (demo code included)

Dear Intel TBB Developers:

My colleagues and I have been using TBBmalloc extensively since
mid-2010 as a drop-in replacement for the heap managers included in Windows. So
far, we have been very happy with the performance of TBBmalloc in our natively
multithreaded, heavy-duty data analysis application.

Recently, we encountered a few datasets that caused unexpected
out-of-memory conditions in the 32-bit version of our application; without
TBBmalloc, memory use stayed below 300 MB. I believe we understand the reason
for TBBmalloc's failure. Therefore, we would like to bring to your attention the
flaw in TBBmalloc that we have exposed, as well as propose an idea for its possible
resolution.

Please find below a piece of stand-alone, single-threaded C++ code
that I created to reproduce and illustrate the problem we uncovered.

// Standalone demo for a flaw in TBBMALLOC
// Confirmed in TBB 3.0 updates 3, 6 and 8 with MS Visual C++ 2005 Win32/x64
// Provided by Alexandre Telnov, Ph.D.

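#include <iostream>  // cout, endl
#include <vector>    // std::vector
#include <cstdlib>   // rand, RAND_MAX
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC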
using std::vector;
using std::cout;
using std::endl;

// Recommended values to try on Win32:
//
// NVEC=128, MAXVECSIZE=2048: will use about 133 MB with Windows memory allocator. 
// With TBB, it will hit 2 GB and crash after ~138 million allocations (~1 minute)
//
// NVEC=192, MAXVECSIZE=2048: will use about 200 MB with Windows memory allocator. 
// With TBB, it will hit 2 GB and crash after ~15 million allocations (~6 sec)
//
// NVEC=256, MAXVECSIZE=2048: will use about 270 MB with Windows memory allocator. 
// With TBB, it will hit 2 GB and crash after ~3 million allocations (~1 sec)

#define NVEC 192 // number of vectors in a cyclical buffer
#define MAXVECSIZE 2048 // maximum size of vector to be created (in kilobytes)

int main(int argc, char* argv[])
{
  size_t avgSizeMB = NVEC * ( MAXVECSIZE / 2 ) / 1024;
  cout << "Repeatedly reallocate " << NVEC << " vectors of random sizes varying from 0 to " 
    << MAXVECSIZE << " kB." << endl;
  cout << "Memory usage is naively expected to hover slightly above " << avgSizeMB 
    << " MB.\n" << endl;

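  vector<int> *ptrs[NVEC];
  for (int n=0; n<NVEC; n++) ptrs[n] = 0; // null pointers, so that the first delete is a no-op

  int i = 0; // allocation counter; declared here because the catch block reports it
  clock_t start(clock());
  try
  {
    for (i=0; true; i++) // loop until an allocation throws
    {
      int j = i % NVEC; // slot in the cyclical buffer
      delete ptrs[j];
      ptrs[j] = new vector<int>;
      size_t size = MAXVECSIZE*(1024ull/sizeof(int))*rand()/(RAND_MAX+1);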
      
      //  *** choose one of: reserve, resize, or push_back ***
      //
      //     reserve(): with TBB, blows the memory fast
      ptrs[j]->reserve(size); 
      // -- or
      //     resize(): with TBB, blows the memory after the same number of allocations
      //     as reserve() - but about x100 slower than reserve() because resize() 
      //     initializes the allocated memory
      // ptrs[j]->resize(size);
      // -- or
      //     push_back(): memory use with TBB only slightly greater than with the 
      //     Windows allocator  because the memory requests involved in vector 
      //     autogrowth are limited to certain sizes.
      // for (unsigned int k=0; k<size; k++) ptrs[j]->push_back(k);
    }
  }
  catch(...)
  {
    clock_t end(clock());
    double mcsec = (end-start)/(CLOCKS_PER_SEC*1e-6);
    cout << "Memory exhausted after " << i << " allocations, " << mcsec*1e-6 << " sec," << endl; 
    cout << mcsec/i << " microseconds per allocation." << endl;
    cout << "This is the end." << endl;
  }
} // end of program

Please note that the demo code above illustrates bona fide needs
of a broad variety of large-scale data analysis applications. As terabytes of
data stream through our application, we organize and temporarily store some of
the data in STL vectors that can range in size from ~100 kB to ~10 MB and
beyond, with no guaranteed maximum, and need to be repeatedly created and destroyed.
The size of each vector becomes known just before its creation; thus, for
performance, to avoid the memory reallocation and memcpy overhead involved in
STL vectors' automatic growth, one would want to call vector::reserve(size_t) or vector::resize(size_t) before
the vector is filled.

Possible reason TBBmalloc runs out of memory:
TBBmalloc's strategy for allocating thread- and block size-specific
memory pools makes it extremely efficient at handling repeated requests for
blocks of memory of the sizes that have been requested before and also allows
it to avoid the curse of heap memory fragmentation. When vector::reserve(size_t) is not
used, Microsoft's implementation of std::vector
requests memory in blocks of sizeof(T) * (1, 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, 141, 211, 316,
474, 711,...) bytes as it grows [sizeof(T) * powers
of 2 in the case of GNU C++]. However, when vector::reserve(size_t) is employed,
user-specified amounts of memory are explicitly requested from the heap.
Repeated requests for random amounts of memory make TBBmalloc v3.0.x allocate
and hold onto too many memory pools, run out of memory, and crash.
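
The growth sequence quoted above can be reproduced with a few lines of arithmetic. The following is only an illustrative model, assuming a vector that grows by half of its current capacity (rounded down, but by at least one element) starting from a capacity of 1; the function name growth_sequence is hypothetical and the code is not taken from any STL implementation.

```cpp
#include <cstddef>
#include <vector>

// Capacities reached by a vector that grows by half of its current capacity
// (rounded down, but always by at least one element), starting from 1.
// Illustrative model only; not code from any STL implementation.
std::vector<std::size_t> growth_sequence(std::size_t steps)
{
    std::vector<std::size_t> caps;
    std::size_t cap = 1;
    for (std::size_t n = 0; n < steps; ++n) {
        caps.push_back(cap);
        std::size_t grown = cap + cap / 2;      // 1.5x growth, rounded down
        cap = (grown > cap) ? grown : cap + 1;  // grow by at least one element
    }
    return caps;
}
```

Calling growth_sequence(17) yields 1, 2, 3, 4, 6, 9, 13, 19, 28, 42, 63, 94, 141, 211, 316, 474, 711, so automatic growth only ever asks the allocator for a small, fixed set of element counts, whereas reserve() with arbitrary sizes can request any value.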

Possible solution: when the
Windows heap manager refuses TBBmalloc's request for yet another memory pool (or
when a high rate of hard page faults is detected in the system, which would be
especially important for 64-bit systems with a limited amount of RAM), TBBmalloc
should identify the memory pools that have been unused the longest and release them
back to Windows.

Sincerely,

Alexandre Telnov, Ph.D.

Maybe you can experiment with different values of largeBlockCacheStep or cacheCleanupFreq?

I am suspicious of the |= operator used in ExtMemoryPool::doLOCacheCleanup(), though: is the short-circuiting intentional or not? If it is unintentional, replacing "res |= [...];" with "res = [...] || res;" might provide relief. You could try that as well.

These are only superficial observations, though: closer study is probably required to resolve the problem.

Thank you for the report!

The situation that you describe is definitely possible with prior versions of TBB malloc. I hope that we resolved this in 4.0, i.e. internal caches are cleaned up in an out-of-memory situation, exactly as you suggest.

I don't have VS2005 locally right now, but with VS2010 and TBB 4.0 I don't see memory exhaustion. May I ask you to try TBB 4.0 and report the result? The allocator internals were changed significantly, and we are interested in both performance and memory consumption comparisons.

Nice to hear that the problem may have already been solved. I tend to just assume that other people also use the latest release to verify things before reporting, especially if they have a Ph.D. :-)

Still, can you assure that (all) compilers will compute "res |= x;" as "res = res | x;", i.e., a bitfield operation, without resorting to short-circuiting as a way to optimise the boolean expression, because a bool is supposed to be equivalent to a single bit and normally the compiler takes care of appropriately converting incoming values, so it would make sense to treat | as || with boolean operands? It seems a vulnerable assumption (although I only have recollection of evidence supporting it, and no inclination to investigate further at this time), which if not assured would seem to lead to only partial cache cleanup.

Quoting Raf Schietekat:
>>Still, can you assure that (all) compilers will compute "res |= x;" as "res = res | x;", i.e., a bitfield operation, without resorting to short-circuiting as a way to optimise the boolean expression, because a bool is supposed to be equivalent to a single bit and normally the compiler takes care of appropriately converting incoming values, so it would make sense to treat | as || with boolean operands? It seems a vulnerable assumption (although I only have recollection of evidence supporting it, and no inclination to investigate further at this time), which if not assured would seem to lead to only partial cache cleanup.

Yes, good point. Thanks!
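
The question about "res |= x;" can be checked with a small experiment. In standard C++ the built-in | and |= operators always evaluate both operands; only || and && short-circuit. The sketch below counts how often a stand-in cleanup function runs under each form. The names cleanup_step and calls_with_* are hypothetical and have nothing to do with the actual TBB sources.

```cpp
static int g_calls = 0;

// Hypothetical stand-in for one cache-cleanup step: counts its invocations
// and reports that it released something.
bool cleanup_step()
{
    ++g_calls;
    return true;
}

// "res |= step()": the right-hand side is evaluated on every iteration.
int calls_with_bitwise_or(int steps)
{
    g_calls = 0;
    bool res = false;
    for (int n = 0; n < steps; ++n)
        res |= cleanup_step();
    return g_calls;
}

// "res = res || step()": once res is true, step() is no longer called.
int calls_with_logical_or(int steps)
{
    g_calls = 0;
    bool res = false;
    for (int n = 0; n < steps; ++n)
        res = res || cleanup_step();
    return g_calls;
}

// "res = step() || res": step() comes first, so it is always called.
int calls_with_reordered_or(int steps)
{
    g_calls = 0;
    bool res = false;
    for (int n = 0; n < steps; ++n)
        res = cleanup_step() || res;
    return g_calls;
}
```

With five iterations, the |= form and the reordered || form call the step five times, while "res = res || step()" calls it once. A partial cleanup would therefore be expected only if the code had been written with || in the short-circuiting order.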

Alexandre, regarding your allocation problem:

Your problem is not necessarily a fault with the TBB scalable memory allocator. It was designed to provide fast multi-threaded allocation/deallocation of frequently used sizes of memory blocks. Your application (and test program) does not behave in this manner. The question is: should TBB fix this or should the programmer fix this?

My opinion is the programmer needs to fix this.

What I would suggest you do is to reduce the number of different sizes of your allocations.
You could use any such scheme; one may be better than another for you.

size_t size = yourSizeComputation;
size = yourChunkingFunction(size);

An example might be to allocate in 1 KB chunk sizes
or on some log scale.

The idea is to reduce the number of different allocation sizes,
whatever range of sizes you need.
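
Two possible chunking functions of the kind suggested above are sketched here: one rounds a request up to a multiple of a fixed chunk (for example 1 KB), the other rounds up to the next power of two. The names round_up_to_chunk and round_up_to_pow2 are hypothetical, and which scheme suits a given application depends on its size distribution.

```cpp
#include <cstddef>

// Round size up to the next multiple of chunk (chunk must be non-zero).
std::size_t round_up_to_chunk(std::size_t size, std::size_t chunk)
{
    return (size + chunk - 1) / chunk * chunk;
}

// Round size up to the next power of two (returns 1 for size 0 or 1).
// Sizes above the largest representable power of two are returned unchanged.
std::size_t round_up_to_pow2(std::size_t size)
{
    const std::size_t top = ~static_cast<std::size_t>(0) / 2 + 1;
    if (size > top)
        return size;  // cannot round up without overflow
    std::size_t p = 1;
    while (p < size)
        p *= 2;
    return p;
}
```

A call such as vec.reserve(round_up_to_pow2(size)) limits the requests to a few dozen distinct sizes at the cost of up to 2x unused capacity per vector; fixed 1 KB chunks waste less memory but still allow thousands of distinct sizes in the 0 to 2 MB range.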

BTW

I assume that your application will work fine when all allocated vectors are at max size.

If this is not the case, then the suggestion is to construct your own pool of pools of chunk sizes, preallocated.
Then your code will never have an allocation failure. If (when) you have a stream of adjacent very large allocations, your large pool may be empty, and therefore your thread is coded to stall until a node becomes available. While you could have it task steal, it could steal a task that requires another very large node from an exhausted pool; this could repeat indefinitely, causing stack overflow. Stalling under this circumstance is the lesser of two evils.

Jim Dempsey

www.quickthreadprogramming.com

"My opinion is the programmer needs to fix this."
The goal should still be to have the TBB scalable memory allocator be a panacea if at all possible.

Maybe the behaviour with large sizes (speed, essential overhead, fragmentation) could be part of an updated comparison between different algorithms?

>>The goal should still be to have the TBB scalable memory allocator be a panacea if at all possible

TBB scalable memory allocator is fast because it is a pool-of-pools within/across slabs allocator. These types of allocators are subject to larger memory consumption. While TBB could add reconsolidation of freed memory (on allocation failure), these techniques usually are ineffective or not effective enough.

If one node within a pool is allocated you cannot return a pool (read no consolidation of adjacent "empty" pools).
If one pool within a slab has one node allocated you cannot return a slab (read no consolidation of adjacent "empty" slabs).

Relying on completely freeing all nodes within a pool and slab is a "working by chance" solution as opposed to a "working by design" solution.

Alex can resolve this by:

Using the standard C heap (optionally the Low Fragmentation Heap), which consolidates adjacent nodes without regard to node size.

Using the TBB scalable allocator AND coding in allocation strategies that reduce the number of different sized nodes while not running out of memory due to excessive unused memory at the tail end of a node.

Alex could perform a statistical analysis of the node size and frequency, then determine how to partition the sizes. This can be done by guess, table, or formula. Alex will still have a potential for a problem should the data stream receive a long series of his largest allocations. In this case, the number of allocations for the number of working threads may exceed the memory allocation capacity of his system. He should consider adding defensive code to handle this situation. This is why I suggested he add his own pooling system, at least for the large(st) of his allocation sizes. This pool, if allocated once, will never cause allocation failure from the heap (program crash), but may cause allocation failure from his pool. When this occurs, threads starve for memory (stall or steal). I suggest he be careful of stealing, because stealing may result in the next task attempting to allocate from the same empty pool (ad nauseam until crash). Running out of memory resources is a problem best avoided.

Jim Dempsey

www.quickthreadprogramming.com
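
The pre-allocated pool with stalling described in the post above could look roughly like the following. This is a minimal sketch under stated assumptions: a C++11 compiler (for std::mutex and std::condition_variable), a single buffer size, and a hypothetical class name LargeBufferPool. It is not part of TBB and not the poster's own code.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical fixed pool of equally sized buffers, allocated once up front.
// After construction it never asks the heap for more memory.
class LargeBufferPool
{
public:
    typedef std::vector<char> Buffer;

    LargeBufferPool(std::size_t count, std::size_t bytes)
        : storage_(count, Buffer(bytes))
    {
        // storage_ is never resized, so these pointers stay valid.
        for (std::size_t n = 0; n < storage_.size(); ++n)
            free_.push_back(&storage_[n]);
    }

    // Non-blocking: returns a null pointer when the pool is exhausted.
    Buffer* try_acquire()
    {
        std::lock_guard<std::mutex> lock(mutex_);
        if (free_.empty())
            return nullptr;
        Buffer* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Blocking: the calling thread stalls until some buffer is released.
    Buffer* acquire()
    {
        std::unique_lock<std::mutex> lock(mutex_);
        available_.wait(lock, [this] { return !free_.empty(); });
        Buffer* buf = free_.back();
        free_.pop_back();
        return buf;
    }

    // Hand a buffer back and wake one stalled thread, if any.
    void release(Buffer* buf)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            free_.push_back(buf);
        }
        available_.notify_one();
    }

private:
    std::vector<Buffer> storage_;   // owns the memory
    std::vector<Buffer*> free_;     // buffers currently available
    std::mutex mutex_;
    std::condition_variable available_;
};
```

A worker would call acquire() before filling a large vector-like buffer and release() when done. Exhaustion then shows up as a stalled thread rather than as a failed heap allocation, which is the trade-off argued for above.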

Alexandr:

Thank you very much for suggesting I take a look at TBB 4.0. I have retrieved a stand-alone copy of TBB 4.0 update 1 and confirm that it does indeed handle the reported out-of-memory condition gracefully - both in the demo code from my original post and in our data-analysis app.

For the benefit of the members of this community, I attach the memory-use vs time stripchart of the demo code (with the vector::resize() block enabled instead of the vector::reserve() block to stretch the time axis). The vertical-axis range on this plot is [0, 2.2] GB. On the horizontal axis, one pixel equals 10 seconds.

We shall now proceed with full regression testing of our code to verify that TBB 4.0u1 performs well in all circumstances.

Best regards,

Alex Telnov

#7 "TBB scalable memory allocator is fast because it is a pool-of-pools within/across slabs allocator."
And yet it provides instructions to retarget new/delete, or even malloc/free altogether. I haven't studied the new code for big allocations (yet), but probably the exercise was mostly to reduce overhead (which could be prohibitive in a certain range), as performance of big allocations is likely to be amortised better than for small allocations. I'm curious exactly how well it behaves relative to the previous situation (delegate to malloc with "some" alignment overhead to be able to find some administrative data), relative to plain malloc, relative to a scaled-up version of the code for small allocations (which I don't presume it is), etc. Apparently it suits Alex' purposes, which is very nice to hear!

Alex,

What happens with the new code when using reserve(size)?

This was the original problem statement.

Jim Dempsey

www.quickthreadprogramming.com

Jim:

In fact, we have already considered and/or attempted each of the workarounds you suggest:

- Allocating and reusing vectors of a certain MAX capacity is not an option because the maximum size of vectors in our input data is, in general, unconstrained. We'd run out of memory if we were to try this approach. This would be akin to storing the text of a book as an array/vector of char paragraph[1000000] just because somewhere someone might have a 999,999 character-long paragraph.

- As a zeroth order workaround, we commented out all calls to vector::reserve(size) and vector::resize(size) that are repeated within the "outer" loops of our application - except the cases where size is an app-wide constant.

- In our case, objects of class/struct T are rather heavy-weight. In particular, they include various STL containers as members. Thus, the copy constructors (which are involved in vector::push_back()) are slow. To overcome the hit on performance that resulted from eliminating reserve()/resize() calls, I extensively refactored our data-storage classes with the use of reference-counting smart pointers and other tricks that minimize the need for copying complex objects.

- Additionally, I introduced custom memory pools into our code. In this approach, vector<T> is replaced with vector<T*>, and T objects are created in the custom memory pool via placement new(). This actually led to a slight decrease in performance because TBB's memory pools are already very efficient. Yes, custom memory pools would have overcome the TBB 3.0 out-of-memory condition. However, the added complexity of the code would have made it difficult to maintain. We therefore chose to roll back these changes.

- Just before I made my original post, we were indeed weighing the possibility of reintroducing vector::reserve(size) in a way that would limit the number of possible values of size to some log scale (either a la MS STL or powers of 2). We may indeed choose to do it.

I would still argue that the forced cleanup of unused memory pools on an out-of-memory condition that was introduced in TBB 4.0 is a good thing. There is nothing inherently wrong about coding with vector::reserve(). In fact, it has always been considered a good coding practice because it's good both for performance and (when used with a conventional heap manager) for the application's memory footprint. TBBmalloc changes this wisdom. Perhaps this should be reflected in TBB documentation.

BTW, I do not believe the STL standard specifies that vec.capacity() == size should be true after vector<T> vec; vec.reserve(size); (or vec.resize(size)). In a given STL implementation, vector's capacity can be size or greater. As pooling memory allocators such as TBBmalloc become more widespread, perhaps STL developers would implement log-scaling of vector capacities in a way that is transparent to the user.

Alexandre Telnov

Jim:

With TBBmalloc 4.0u1, the demo code in my original posting (with vector::reserve()) runs fine as well: TBB 4.0u1 is able to release unused pools/slabs as app's memory size comes dangerously close to 2 GB. Please see the screenshot below. Here, 1 pixel along the time axis is 0.5 seconds.

The only difference between reserve() and resize() in the demo app is that resize() takes about x100 more time as it needs to initialize the allocated vector elements. The plot for the vector::resize() case attached to my earlier posting used a 10 sec/pixel scale.

Alexandre Telnov

"BTW, I do not believe the STL standard specifies that vec.capacity() == size should be true after vector vec; vec.reserve(size); (or vec.resize(size)). In a given STL implementation, vector's capacity can be size or greater.
As pooling memory allocators such as TBBmalloc become more widespread, perhaps STL developers would implement log-scaling of vector capacities in a way that is transparent to the user."
At least in version 2003, and N3242 of C++0x (the latest I've looked at), it would indeed require only that vec.capacity() >= size. Vector insertion operators have complexity linear in the sum of number of elements inserted and number of elements past the insertion point, which probably means amortised linear-time complexity, and I thought that this is only possible for implicit reallocation to exponentially growing capacities at logarithmically growing occasions (right?), so an implementation would have to be very deliberate about deviating from that for explicit calls to reserve(). Still, it appears that some do deviate, requiring the user to be careful that using reserve() repeatedly on the same vector does not decrease the performance of a program rather than potentially increasing it, by explicitly providing exponentially growing values. A strange and unfortunate situation when attempting to write programs whose algorithmic complexity is also portable.

>>...
>>// With TBB, it will hit 2 GB and crash after ~138 million allocations (~1 minute)
>>...
>>// With TBB, it will hit 2 GB and crash after ~15 million allocations (~6 sec)
>>...
>>// With TBB, it will hit 2 GB and crash after ~3 million allocations (~1 sec)
>>...

Hi Alex,

On a test computer with 32-bit Windows XP I was never able to allocate more than ~2.18 GB of memory from the heap, even if Virtual Memory has an initial size of 2,048 MB and a maximum size of 3,072 MB.

When I do stress testing of a Strassen Heap Based algorithm for matrix multiplication ( Test-Case: 2,048x2,048 \ Threshold is 64x64 \ Number of Partitions is 19,608 ) the application always crashes when it reaches the ~2.18 GB limit. This is a hardware limitation and Dell clearly confirmed it.

I also use MS Visual Studio 2005, and attempts to change Linker Settings ( Heap Reserved \ Heap Commit \ Stack Reserved \ Stack Commit \ Enable Large Addresses ) failed.

In my case the only solution is a new computer with more memory. A 64-bit version of Windows is also under consideration...

Best regards,
Sergey

Take a look at the enclosed JPG files:

[Image: Task Manager just before the application crashes]

[Image: MS Visual Studio 2005 Linker Settings]

Note: 2nd post ( complete duplicate of previous) removed. Sorry about this.
Best regards,
Sergey

Hi Alex,

Here are preliminary results of my investigation ( I've already spent more than 6 hours... ). A more detailed report will be submitted later.

Best regards,
Sergey

Alexandre,

Thanks for your comment and for posting the chart.

In your test program you had NVEC=128, 192, and 256, meaning worst-case slab requirements of 128, 192 and 256 slabs. As the pre-u1 version of the allocator handled your test program's allocations, the number of slabs grew (due to larger numbers of different sized allocations through the history of the run of the application). With the u1 modification, slab recovery can occur, but it cannot reduce the count to fewer than the slabs required to pool the in-use vectors. This means that, should your real application have NVEC=4096, you could potentially have a problem. Some defensive code may still be required on your part to avoid a nasty surprise.

The u1 changes appear to be a great improvement. Thank you for your suggestion.

Jim Dempsey

www.quickthreadprogramming.com

Sergey:

Your posting touches upon an entirely different issue: the address space available to 32-bit applications depending on the Windows version and the linker flags used.

- If linked with /LARGEADDRESSAWARE:NO, a 32-bit application and the DLLs it loads can address only 2 GB, even if it is run on 64-bit Windows.

- If linked with /LARGEADDRESSAWARE, a lot depends on the Windows version and - especially in the case of 32-bit XP/Vista/Win7 - on the hardware configuration.

= On 64-bit Windows, 4 GB can be addressed.

= On 32-bit Windows, /LARGEADDRESSAWARE has no effect unless Windows is started with the /3GB option.

In 32-bit XP/Vista/Win7, depending on the amount of RAM installed, presence of shared video memory, etc., /3GB may give your /LARGEADDRESSAWARE application somewhat more than 2 GB to work with. The rarely reached theoretical limit is 3 GB - this requires a machine with 4 GB of RAM and few other applications running.

Our case is different: we only use Windows Server 200x Enterprise Edition. The /3GB version does not really work with Windows Server 32-bit because the system quickly becomes acutely starved for resources and hangs or crashes. Even for 32-bit applications, 64-bit Windows is a much better Windows!

Alexandre

Alex,

Did you have a chance to test the code from your 1st post in Release configuration?

In Debug configuration it is always slower and there is an extra impact from a debug version of the operator 'new'.

For example, in case of using Microsoft's debug 'new' a call to '_malloc_dbg' will be made and 36 extra bytes will be added. These 36 bytes are needed for Memory Leaks Detection.

Look at MSDN's article "Memory Management and the Debug Heap" if interested.

In case of a pure-STL application I didn't have any problems with your recommended NVEC & MAXVECSIZE values. It worked, worked and worked...

If NVEC was greater than 1024 the application exited as soon as ~2.2 - ~2.4 GB of memory was allocated, and this was expected.

A test in which 'push_back' was used ran almost ~15x slower.

Also, if there is a need to detect the moment when no more memory is available for the 'new' operator, a 'set_new_handler' function could be used.
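
As a rough illustration of the 'set_new_handler' suggestion, the sketch below installs a handler through the standard std::set_new_handler from <new> and records whether it was invoked. The names out_of_memory_handler and handler_runs_for are hypothetical, and the sketch assumes the default global operator new, which calls the installed handler when a request cannot be satisfied. Behaviour with a replaced operator new, or with the compilers discussed in this thread, has not been verified here.

```cpp
#include <cstddef>
#include <new>

static bool g_handler_called = false;

// Called by the default operator new when it cannot satisfy a request.
// This handler records the event and gives up by throwing std::bad_alloc;
// a real handler could instead release cached memory and return to retry.
void out_of_memory_handler()
{
    g_handler_called = true;
    throw std::bad_alloc();
}

// Requests `bytes` from the global operator new and reports whether the
// handler ran. The previously installed handler is restored afterwards.
bool handler_runs_for(std::size_t bytes)
{
    g_handler_called = false;
    std::new_handler previous = std::set_new_handler(out_of_memory_handler);
    try {
        void* p = ::operator new(bytes);
        ::operator delete(p);
    } catch (const std::bad_alloc&) {
        // The allocation failed; g_handler_called tells us why we got here.
    }
    std::set_new_handler(previous);
    return g_handler_called;
}
```

In the demo program, installing such a handler before the allocation loop would make the moment of exhaustion observable separately from the catch(...) block, for example to log memory statistics before the exception propagates.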

I've created two more test-cases based on your original. A verification for different types of data ( short & float ) also could be done.

Sub-Test 1 is your original test with small modifications; Sub-Tests 2 & 3 are a little bit different from 1. Of course, only one Sub-Test has to be used.

It's not clear why 'resize' was executed so slowly during your tests. In my tests it was ~1.4x slower and I wouldn't complain about the CRT's 'memset' function.

///////////////////////////////////////////////////////////////////////////////
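// Repeatedly reallocates STL-based vectors of random sizes

#include <iostream>  // cout, endl
#include <vector>    // std::vector
#include <cstdlib>   // rand, srand, RAND_MAX
#include <ctime>     // clock, clock_t, CLOCKS_PER_SEC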

using std::vector;
using std::cout;
using std::endl;

// Recommended values to try on Win32:
//
// NVEC=128, MAXVECSIZE=2048: will use about 133 MB with Windows memory allocator.
// With TBB, it will hit 2 GB and crash after ~138 million allocations (~1 minute)
//
// NVEC=192, MAXVECSIZE=2048: will use about 200 MB with Windows memory allocator.
// With TBB, it will hit 2 GB and crash after ~15 million allocations (~6 sec)
//
// NVEC=256, MAXVECSIZE=2048: will use about 270 MB with Windows memory allocator.
// With TBB, it will hit 2 GB and crash after ~3 million allocations (~1 sec)
//
// NVEC=1024, MAXVECSIZE=2048: SergeyK - greater test-case values
//
// SergeyK's statistics:
// for 'vector.reserve'   ~28 - ~38 secs to allocate ~2.42 GB of memory
// for 'vector.resize'    ~40 - ~45 secs to allocate ~2.42 GB of memory
// for 'vector.push_back' ~430 secs to allocate ~2.42 GB of memory

#define NVEC 1024 // Number of vectors in a cyclical buffer
#define MAXVECSIZE 2048 // Maximum size of vector to be created ( in kilobytes )

//#define _RTTYPE short
#define _RTTYPE int
//#define _RTTYPE float

void RunTest( void )
{
srand( ( unsigned int )clock() );

size_t avgSizeMB = ( NVEC * ( MAXVECSIZE / 2 ) / 1024 );

cout << "Repeatedly reallocate " << NVEC << " vectors of random sizes varying from 0 to "
<< MAXVECSIZE << " kB." << endl;
cout << "Memory usage is naively expected to hover slightly above " << avgSizeMB
<< " MB.\n" << endl;

int i;

// Sub-Test 1 - 'new' operator is inside of Infinite loop
{
/*
vector< _RTTYPE > *ptrs[ NVEC ] = { 0x0 };

clock_t start( clock() );

try
{
for( i = 0; true; i++ )// Infinite loop
{
int j = i % NVEC;// 0 <= n < NVEC

delete ptrs[j];
ptrs[j] = new vector< _RTTYPE >;

size_t size = MAXVECSIZE * ( 1024 / sizeof( _RTTYPE ) ) * rand() / ( RAND_MAX + 1 );

// *** choose one of: reserve, resize, or push_back ***
ptrs[j]->reserve( size );
//ptrs[j]->resize( size );
//for( unsigned int k = 0; k < size; k++ ) ptrs[j]->push_back( k );
}
}
catch( ... )
{
clock_t end( clock() );
double mcsec = ( end - start ) / ( CLOCKS_PER_SEC * 1e-6 );
cout << "Memory exhausted after " << i << " allocations, " << mcsec * 1e-6 << " sec," << endl;
cout << mcsec/i << " microseconds per allocation." << endl;
cout << "This is the end." << endl;
}
*/
}

// Sub-Test 2 - 'new' operator is outside of Infinite loop
{
/*
vector< _RTTYPE > *ptrs[ NVEC ] = { 0x0 };

for( i = 0; i < NVEC; i++ )
{
ptrs[i] = new vector< _RTTYPE >;
ptrs[i]->reserve( MAXVECSIZE );
}

clock_t start( clock() );

try
{
for( i = 0; true; i++ )// Infinite loop
{
int j = i % NVEC;// 0 <= n < NVEC

size_t size = MAXVECSIZE * ( 1024 / sizeof( _RTTYPE ) ) * rand() / ( RAND_MAX + 1 );

// *** choose one of: reserve, resize, or push_back ***
//ptrs[j]->reserve( size );
ptrs[j]->resize( size );
//for( unsigned int k = 0; k < size; k++ ) ptrs[j]->push_back( k );
}
}
catch( ... )
{
clock_t end( clock() );
double mcsec = ( end - start ) / ( CLOCKS_PER_SEC * 1e-6 );
cout << "Memory exhausted after " << i << " allocations, " << mcsec * 1e-6 << " sec," << endl;
cout << mcsec/i << " microseconds per allocation." << endl;
cout << "This is the end." << endl;
}
*/
}

// Sub-Test 3 - array of vector objects created on the stack outside of Infinite loop
{
///*
vector< _RTTYPE > trs[ NVEC ];

for( i = 0; i < NVEC; i++ )
trs[i].reserve( MAXVECSIZE );

clock_t start( clock() );

try
{
for( i = 0; true; i++ )// Infinite loop
{
int j = i % NVEC;// 0 <= n < NVEC

size_t size = MAXVECSIZE * ( 1024 / sizeof( _RTTYPE ) ) * rand() / ( RAND_MAX + 1 );

// *** choose one of: reserve, resize, or push_back ***
//trs[j].reserve( size );
trs[j].resize( size );
//for( unsigned int k = 0; k < size; k++ ) trs[j].push_back( k );
}
}
catch( ... )
{
clock_t end( clock() );
double mcsec = ( end - start ) / ( CLOCKS_PER_SEC * 1e-6 );
cout << "Memory exhausted after " << i << " allocations, " << mcsec * 1e-6 << " sec," << endl;
cout << mcsec/i << " microseconds per allocation." << endl;
cout << "This is the end." << endl;
}
//*/
}
}

>>...= On 32-bit Windows, /LARGEADDRESSAWARE has no effect unless Windows is started with the /3GB option...

Here is a link to Microsoft's Technet article:

http://technet.microsoft.com/en-us/library/bb124810(EXCHG.65).aspx

...

The /3GB switch is supported only on the following operating systems:

Windows 2000 Advanced Server
Windows 2000 Datacenter Server
Windows Server 2003 Standard Edition
Windows Server 2003 Enterprise Edition
Windows Server 2003 Datacenter Edition
...

Unfortunately, Windows XP is not on the list.

Hello Alex,

We have posted the tbb40_20111109oss development release with the fix on the OSS site. Could you check that it resolves the problem, so that we can include the fix in the stable release?

Thanks,
Vladimir
