malloc & free memory problem when OpenMP is activated on the Win32 platform

I have a program that uses some large memory chunks (on the order of gigabytes).
The weird thing is that when I use more than one OpenMP thread, the large chunks of memory are not freed cleanly (they are not returned to the system). With only one thread, free works without any problem.

Another problem: the Windows version of the compiler documents the /Qopt-malloc-options option. I have tried it in vain; I wonder whether it's a fake option that can only be used on Linux or Mac OS.

The major source of these problems is programmer error.

Check not only your deallocations but also your allocations. A particularly nasty case is multiple threads allocating into a pointer when the storage location for the pointer is a shared variable (misplaced {}'s).

Try performing your allocations within a struct/class object whose destructor releases the allocated memory (like the string class), and be sure to place the object within the correct scope.

Jim Dempsey

www.quickthreadprogramming.com
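
A minimal sketch of the scoped-object suggestion above, assuming C++11. `ScopedArray` and `sum_of_per_thread_work` are hypothetical names chosen for illustration, not code from the poster's program; `std::vector<double>` would serve the same purpose.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical scoped buffer: owns a heap array and frees it in the destructor.
class ScopedArray {
public:
    explicit ScopedArray(std::size_t count)
        : data_(new double[count]()), count_(count) {}
    ~ScopedArray() { delete[] data_; }

    // Non-copyable, so two objects can never free the same memory.
    ScopedArray(const ScopedArray&) = delete;
    ScopedArray& operator=(const ScopedArray&) = delete;

    double& operator[](std::size_t i) {
        assert(i < count_);
        return data_[i];
    }
    std::size_t size() const { return count_; }

private:
    double* data_;
    std::size_t count_;
};

// Each thread builds its own buffer inside the region, so no pointer is shared.
// The pragma is simply ignored when the file is compiled without OpenMP.
double sum_of_per_thread_work(std::size_t count) {
    double total = 0.0;
    #pragma omp parallel reduction(+ : total)
    {
        ScopedArray scratch(count);  // declared inside the region: one per thread
        for (std::size_t i = 0; i < count; ++i) scratch[i] = 1.0;
        double local = 0.0;
        for (std::size_t i = 0; i < count; ++i) local += scratch[i];
        total += local;
    }   // scratch is destroyed here, once per thread
    return total;  // count multiplied by the number of threads that ran
}
```

Because the buffer is declared inside the braces of the parallel region, each thread gets its own object and the destructor runs when that thread leaves the block, so there is no shared pointer to overwrite and no matching delete to forget.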

First of all, thank you very much for your response!

What do you mean by allocating from multiple threads? I am quite sure that I do not use any dynamic allocation in an OpenMP region. In fact, my program has very few OpenMP regions. Should I call omp_set_num_threads(1) every time I allocate or free memory?

Another clue: I don't have this malloc/free problem on the Linux platform, where both single-threaded and multi-threaded runs work correctly. Moreover, I know exactly which memory chunks are not returned to the system, because such big chunks are very rarely used in the program.

When you want each thread to have its own array:

double* array = 0; // *** bad: pointer in wrong scope
                   // OK to do this only when shared(array) is intended on the pragma
#pragma omp parallel
{
    array = new double[count]; // *** bad: all threads share the same pointer
                               // *** 2nd and later threads overwrite the pointer
    ...
    delete [] array; // *** bad: 2nd and later threads return the same memory
}

------------------------------------

#pragma omp parallel
{
    double* array = 0;
    array = new double[count]; // *** good when you want each thread to have a separate copy
    ...
    delete [] array; // *** good: each thread returns its own copy
}

--------------------

double* array = 0; // OK because of private(array) on the pragma
#pragma omp parallel private(array)
{
    array = new double[count]; // *** good when you want each thread to have a separate copy
    ...
    delete [] array; // *** good: each thread returns its own copy
}

--------------------

double* array = 0;
#pragma omp parallel private(array)
{
    array = new double[count]; // *** good when you want each thread to have a separate copy
    ...
}
delete [] array; // *** bad: runs once, outside the region, so the per-thread copies are never freed

There is nothing wrong with new/delete inside parallel regions; in fact, it may be required when you want each thread to have separate data (e.g. for temporary arrays).

Jim Dempsey

www.quickthreadprogramming.com

Thanks for the clarification. I don't use any private dynamic memory in parallel regions; I only use shared memory there, plus very few small stack variables.

The problem I observed comes from two big arrays allocated in a non-parallel region (they are never used in any parallel region). free() does not return these memory chunks to the system immediately (or the heap becomes very fragmented), so the program later stops for lack of memory when I try to allocate another big array.

I experienced a very similar problem before on the AIX platform, and it was suggested that I use the disclaim() function to "declare" and "return" these memory chunks to the system. On Linux, one can use the mallopt() function to change some memory allocation parameters and reproduce the same symptoms.
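
For comparison, here is a hedged sketch of the Linux-side knobs mentioned above. It assumes glibc, whose `mallopt()` accepts `M_MMAP_THRESHOLD` and `M_TRIM_THRESHOLD`; the function name `prefer_returning_memory_to_os` and the 256 KB value used in the usage line are illustrative assumptions, and none of this applies to the Win32 heap.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#if defined(__GLIBC__)
#include <malloc.h>   // mallopt, M_MMAP_THRESHOLD, M_TRIM_THRESHOLD (glibc only)
#endif

// Hypothetical helper: asks glibc to serve blocks of at least threshold_bytes
// with mmap (unmapped again on free) and to trim the top of the heap once that
// much memory is free there. Returns true only if both settings were accepted;
// always returns false on platforms without glibc.
bool prefer_returning_memory_to_os(std::size_t threshold_bytes) {
#if defined(__GLIBC__)
    const int value = static_cast<int>(threshold_bytes);
    const int ok_mmap = mallopt(M_MMAP_THRESHOLD, value);
    const int ok_trim = mallopt(M_TRIM_THRESHOLD, value);
    return ok_mmap == 1 && ok_trim == 1;  // mallopt returns 1 on success
#else
    (void)threshold_bytes;
    return false;
#endif
}
```

A call such as `prefer_returning_memory_to_os(256 * 1024)` would be made once, early in `main`. With glibc, blocks above the mmap threshold are unmapped as soon as they are freed, which is probably why gigabyte-sized arrays do not show the problem on Linux with default settings.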

When you have a single-threaded application that is tight on memory (as you suggest yours is), then when you use OpenMP or any threading tool, each thread is instantiated with its own stack. The low end of these stack sizes can be a few MB, but you can set the stack size smaller or larger. The default is the stack size of the main thread. You might want to experiment with adjusting the stack size for the non-main threads.

Jim Dempsey

www.quickthreadprogramming.com

This may be what I should try. Can you please tell me how to do it?
Thanks a lot!

Thread stack size hasn't been standardized in OpenMP. Intel OpenMP controls it by the KMP_STACKSIZE environment variable, or by the corresponding library function calls. Defaults vary with target OS.

Thanks! I've just googled that, and it seems that the default size is 2M.
I'm going to try it...

I've just tried it (with 2M and 2 OpenMP threads); the problem persists.

Yes, if the default is 2M (normal for 32-bit), setting the same value should change nothing.

Another potential problem is at what point in your application the OpenMP thread pool is established.

If the thread pool is established on the first entry into the first parallel region, and if that region is deep in your code, the new stack spaces might be allocated at some midpoint in your allocations, potentially causing undesired fragmentation of your heap. An easy way to fix this is to insert a parallel region just after entry to main that does something which does not get optimized out:

#pragma omp parallel
{
    if (omp_get_thread_num() < 0) exit(1); // never true; only keeps the region from being optimized out
}

You may need to trace your allocations/deallocations to find the problem and/or insert some well-crafted _ASSERTs:

YourAllocatorAssumesSerial(...)
{
    _ASSERT(omp_in_parallel() == 0); // must not be called from a parallel region
    // now allocate...
    ...
}

And you may need code to check for leaks and/or allocations made when not required:

// static pointer
double* array = NULL;

...
YourAllocationRoutine()
{
    _ASSERT(array==NULL); // fails if the array is already allocated
    array = new double[yourSize];
    ...
}

If each thread mistakenly called the allocation routine, you would have a leak.

See what you can do to reduce your footprint (maybe optimize for reduced size).

Jim Dempsey

www.quickthreadprogramming.com
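
The two checks above can be combined into one portable sketch. It swaps the MSVC-specific `_ASSERT` for the standard `assert` macro (which is also compiled out when `NDEBUG` is defined) and guards the OpenMP call so the file still builds without OpenMP. The names `in_parallel_region`, `g_big_array`, `allocate_big_array`, and `free_big_array` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: true when called from inside an active parallel region.
// Without OpenMP there are no parallel regions, so it always returns false.
inline bool in_parallel_region() {
#ifdef _OPENMP
    return omp_in_parallel() != 0;
#else
    return false;
#endif
}

// Static pointer that owns the one big array.
static double* g_big_array = nullptr;

// Allocates the big array; intended to be called from serial code only.
double* allocate_big_array(std::size_t count) {
    assert(!in_parallel_region());   // catches calls made inside a parallel region
    assert(g_big_array == nullptr);  // catches a second allocation without a free
    g_big_array = new double[count];
    return g_big_array;
}

// Releases the big array and resets the pointer so it can be allocated again.
void free_big_array() {
    assert(!in_parallel_region());
    delete[] g_big_array;
    g_big_array = nullptr;
}
```

If several threads ever reached `allocate_big_array`, a Debug build would stop at the first assertion instead of silently leaking all but the last allocation.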

Tell me if I am mistaken: with 2 threads each using 2M of stack, doesn't that mean 4M of stack in total? That is almost nothing compared to the amount of memory I am missing.
Thanks.

I can run the 2nd test, even though I'm quite sure that all of my memory allocations are outside parallel regions.
The 3rd test could be interesting: I can always keep a few static pointers for the big arrays, which may change something.
The first test is just to confirm the abnormal behavior, isn't it?

Thanks again for the valuable advice.

Could you please provide us with a test case so that we can look into your problem if it is not resolved?

Om

Lanzors,

By using _ASSERT(expression), the code only expands in a Debug build, so your Release build has no overhead.
However, as much as you try to keep your allocations under control, when you hand this code off to someone else to support, they may not be as careful as you are. The _ASSERT is there to catch these types of potential errors (now or in the future). You should get in the habit of using _ASSERT throughout your code to test for all kinds of errors: principally argument checking, but in some places results checking or testing for convergence problems.

Also, you might try setting "Low Fragmentation Heap"

See MS C++ help on

heap functions | HeapSetInformation

Then once you read that, follow link to "Low Fragmentation Heap"

From MS C++ Help

The following example shows you how to enable the low-fragmentation heap.

#include <windows.h>
#include <stdio.h>

int main()
{
    ULONG  HeapFragValue = 2; // 2 selects the low-fragmentation heap

    if(HeapSetInformation(GetProcessHeap(),
                       HeapCompatibilityInformation,
                       &HeapFragValue,
                       sizeof(HeapFragValue))
    )
    {
        printf("Success!\n");
    }
    else printf ("Failure (%lu)\n", GetLastError());
    return 0;
}


Jim Dempsey

www.quickthreadprogramming.com

Thanks for everybody's suggestions!

Jim, I totally agree with you about the purpose of assertions, and thanks again for your new advice about the MS heap functions: I will look into them for a solution.

Om, I have already written a very small test, but it does not reproduce the problem; I suspect it depends on the complexity of the allocation scheme in the program. Anyway, I will try again if I cannot find a solution.

Because I have other work to do for now, I have just fixed this problem with a workaround: before allocating a big array, do a simple estimation with a loop of malloc/free... But I will surely come back to this problem and keep everybody informed.
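
The workaround described above might look roughly like the sketch below. The poster did not show the actual loop, so the halving strategy and the name `probe_largest_block` are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical helper: tries start_bytes, then start_bytes/2, and so on, and
// returns the first size for which malloc succeeds. The block is freed again
// immediately. Returns 0 if no size of at least min_bytes could be allocated.
std::size_t probe_largest_block(std::size_t start_bytes, std::size_t min_bytes) {
    if (min_bytes == 0) min_bytes = 1;  // guarantees that the loop terminates
    for (std::size_t size = start_bytes; size >= min_bytes; size /= 2) {
        void* p = std::malloc(size);
        if (p != nullptr) {
            std::free(p);               // this was only a probe
            return size;
        }
    }
    return 0;
}
```

A caller could probe before a large allocation and fall back to a smaller working set when the returned size is below what it needs. A successful probe is only a hint, since another allocation can change the heap between the probe and the real request.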
