memory issues - large arrays, delete, stacksize

Hi,

I updated my Composer from composer_xe_2011_sp1.7 to sp1.9, and my working code stopped working due to memory issues.

The first errors occurred when deleting large arrays (for CRS-stored matrices). The delete[] statement caused the error:
if (values) delete []values; values = NULL;
(I always NULL my deleted pointers/arrays.)

Playing around with ulimit (stack size) and KMP_STACKSIZE did not help; it only moved the error from my own routine to an MKL subroutine:

0x00002aaaafc560a4 in mkl_spblas_lp64_dcsr0tg__c__mvout_par () from /opt/common/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64/libmkl_mc3.so

Unfortunately, I cannot provide a "minimal working example" of this problem.
Any ideas? Or shall I switch back to sp1.7? Btw, sp1.8 does not work either. Every time I try a new version, I get new problems, usually somehow related to PARDISO...

Somewhere I read that in a threaded environment, releasing (shared) memory is sometimes a problem. I removed ALL OpenMP clauses, the "omp.h" include and the corresponding compiler flags. No change.

Intel compiler version 1210, build date 20120212, compatible with GNU compiler version 4.5.2
Intel Math Kernel Library Version 10.3.9 Product Build 20120131 for Intel 64 architecture applications
AVX-optimizations : enabled.
Processor optimization : Intel Core i7 Processor

Any idea is appreciated!

What you are observing is typical of the corruption of heap allocation header(s) of the array "values" (and/or the objects/arrays deleted in dtors in the array of objects in "values"). You do not delete stack variables.

This assumes values was properly allocated (as opposed to uninitialized junk in the pointer).

Try compiling with subscript-out-of-bounds (and uninitialized-variable) runtime checks enabled. If that doesn't expose anything, then try valgrind or something equivalent.

Jim Dempsey

www.quickthreadprogramming.com
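
As a rough illustration of the bounds-checking advice above: a write one element past the end of a heap array can overwrite allocator bookkeeping and only show up later, at delete[] time. The sketch below is not the poster's code; `storeValue` and `writeIsRejected` are hypothetical names, and it assumes the array can be held in a `std::vector` so that `at()` rejects a bad index immediately.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical checked store: at() throws std::out_of_range on a bad index
// instead of silently writing past the end of the allocation.
void storeValue(std::vector<double>& values, std::size_t index, double v) {
    values.at(index) = v;
}

// Returns true if a write to `index` was rejected as out of range.
bool writeIsRejected(std::vector<double>& values, std::size_t index) {
    try {
        storeValue(values, index, 0.0);
    } catch (const std::out_of_range&) {
        return true;
    }
    return false;
}
```

With a four-element vector, index 3 is accepted and index 4 is rejected, whereas the same write through a raw pointer would go unnoticed until much later.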

Quoting fabi.k...
The delete[]-command caused the error:

if (values) delete []values; values = NULL;

Any idea is appreciated!

Two possible reasons are as follows:

1. The variable/member 'values' has already been released
2. A memory corruption happened earlier (I agree with Jim)

A release like this is better:

...
if( pSomeData != NULL )
{
delete [] pSomeData;
pSomeData = NULL;
}
...
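
The release pattern above can also be wrapped in a small helper. This is only a sketch: `releaseArray` is a hypothetical name, and it relies on the fact that `delete[]` on a null pointer does nothing, so the explicit check is optional. It does not protect against heap corruption or against a second copy of the same pointer being deleted elsewhere.

```cpp
#include <cstddef>

// Hypothetical helper: frees an array obtained from new[] and nulls the pointer.
// The pointer is taken by reference so the caller's variable is reset.
template <typename T>
void releaseArray(T*& p) {
    delete[] p;    // deleting a null pointer is a no-op, so no if-check is needed
    p = nullptr;   // a later releaseArray(p) on the same variable is then harmless
}
```

Calling `releaseArray(values);` twice in a row on the same variable is safe; a crash at the first call would still point to earlier corruption or an invalid pointer value.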

Hello,

as Jim and Sergey already mentioned, this most likely is a dangling pointer issue that might have been there for quite some time. A small change in the build system finally unveiled it.

I'm not excluding other root causes but it's better to analyze invalid pointers first.
Hence I'd recommend to use Intel Inspector XE 2011 and start a memory analysis. Afterwards, or alternatively, you can manually debug into this problem using Intel Debugger (IDB) or GDB.
In other cases it also helps to reduce the problem to a smaller reproducer.

Best regards,

Georg Zitzlsberger

Thank you very much for your suggestions and help, I will try and report here later.

My compiler warnings settings are VERY pedantic, in fact I enabled almost everything possible... in some older versions I even got warnings in your own MKL-headers ;-)

CFLAGS_ICPC12_WARNINGS = -w2 -Wall -Wcheck -Wabi -Wcomment -Wdeprecated -Wformat -Wformat-security -Wmain -Wmissing-declarations -Wmissing-prototypes -Wnon-virtual-dtor -Wpointer-arith -Wremarks -Wreturn-type -Wreorder -Wshadow -Wstrict-aliasing -Wstrict-prototypes -Wsign-compare -Wtrigraphs -Wuninitialized -Wunused-function -Wunused-variable -Wwrite-strings -std=c++0x

Usually I'm very disciplined about uninitialized pointers and such, and valgrind has not found any "related" memory leaks so far. I will check out this Intel Inspector XE 2011 thing, but I cannot imagine it will show more than valgrind.

Quoting fabi.k...
My compiler warnings settings are VERY pedantic, in fact I enabled almost everything possible... in some older versions I even got warnings in your own MKL-headers ;-)
...

Even if many warnings are enabled it doesn't eliminate or detect a logical error and, as a result, a crash in an application.

>>I always NULL my deletes pointers/arrays.

And what about your uninitialized pointers/arrays?

And, when the pointers/arrays are not NULL, are you making an incorrect assumption about the size of the allocation(s)?

Jim Dempsey

www.quickthreadprogramming.com
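
One way to test the allocation-size question above is to check the CRS arrays for internal consistency before handing them to a solver. The following is a minimal sketch under stated assumptions: zero-based indexing, the three-array variant (a row pointer array with dim+1 entries), and a square matrix. `crsIsConsistent` is a hypothetical name, not an MKL function.

```cpp
#include <cstddef>
#include <vector>

// Checks a zero-based, three-array CRS structure of a square matrix.
// rowPtr must have dim+1 entries; columns must have rowPtr[dim] entries.
bool crsIsConsistent(const std::vector<int>& rowPtr,
                     const std::vector<int>& columns,
                     int dim) {
    if (dim < 0 || rowPtr.size() != static_cast<std::size_t>(dim) + 1) return false;
    if (rowPtr.front() != 0) return false;            // first row starts at 0
    for (int i = 0; i < dim; ++i) {
        if (rowPtr[i] > rowPtr[i + 1]) return false;  // row starts must not decrease
    }
    // The last row pointer is the number of stored entries.
    if (static_cast<std::size_t>(rowPtr.back()) != columns.size()) return false;
    for (int c : columns) {
        if (c < 0 || c >= dim) return false;          // column index outside the matrix
    }
    return true;
}
```

A row pointer array whose last entry is larger than the number of allocated values is exactly the kind of mismatch that lets a routine read or write past the end of an array.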

@Sergey: thx, but I'm aware of that.

Uninitialized pointers in the code are - imho - not the problem. Things go wrong when I start using PARDISO for the second time. Without this, everything is fine.

The sequence leading to the error is as follows:

- (huge) memory allocation (pointer=new...)
- (huge) memory release (delete[] and pointer=NULL)
- (huge) memory allocation (pointer=new...)
- (huge) memory release (delete[] and pointer=NULL)
...
- (huge) memory allocation (pointer=new...)
- PARDISO
- (huge) memory release (delete[]... *error*)
7ffff6adf000-7ffff6cdf000 ---p 00d03000 00:18 61539763

/opt/common/intel/composer_xe_2011_sp1.9.293/mkl/lib/intel64/libmkl_intel_thread.so

The pointers are private members of another class, which is NOT connected to PARDISO at all - or at least should not be. Of course I'm aware there could be logical errors, but I wouldn't ask here if I had not already spent days trying to resolve them.

@Jim:
What about that heap allocation thing?
This memory allocation/release happens in a method and is repeated a couple of times before PARDISO starts. It is just a method call; no objects are deleted at this time.

The problem seems to be related to MKL 10.3.9, since both the g++ compiler and the Intel compiler (version 1210, build date 20120212) fail with it. Using MKL 10.3.7 (instead of 10.3.9 or 10.3.8), everything is fine.

Btw, the error does not occur (with any of MKL 10.3.7, 10.3.8 or 10.3.9) if I use

mkl_set_num_threads(1);

at the beginning.
OpenMP is not used (at least not by me, but I guess it is somehow used inside the MKL).

Doesn't that support my idea that something might be wrong in the MKL?

Hello,

yes, things seem to point towards MKL.
Would it be possible to provide a small reproducer? I'm aware that it means some (big) work on your side, but otherwise we're searching for a needle in a haystack. I highly appreciate your efforts!

Thank you & best regards,

Georg Zitzlsberger

Hello Georg,

this is really a lot of work - and a first cut&paste code implementing the idea from above does not reproduce the error. I don't think I can provide a small reproducer; it would take days or not be small, and I don't want to give away our code.

I'm switching back to 10.3.7 and hope this works for me.

Btw, it would be really nice to have PARDISO like pardiso(pt, .... blah blah..., const pointerE, const values, const input, output).
"const" is a really helpful tool to avoid logical errors.

Best regards,
Fabian
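
Until a library offers const parameters itself, a thin wrapper can restore const-correctness on the caller's side. The sketch below does not use the PARDISO interface: `legacySum` is a made-up stand-in for any C-style routine with non-const pointer parameters, and `sumValues` is a hypothetical wrapper. The `const_cast` is only legitimate if the wrapped routine really never writes through the pointer, which has to be confirmed from the library documentation.

```cpp
#include <vector>

// Stand-in for a C-style routine with non-const pointer parameters.
// This is NOT the PARDISO interface; it only sums the stored values.
double legacySum(int* n, double* values) {
    double s = 0.0;
    for (int i = 0; i < *n; ++i) s += values[i];
    return s;
}

// Const-correct wrapper: callers see a read-only interface.
double sumValues(const std::vector<double>& values) {
    int n = static_cast<int>(values.size());
    // Safe only because legacySum never writes through the pointer (assumption
    // that must hold for whatever routine is wrapped this way).
    return legacySum(&n, const_cast<double*>(values.data()));
}
```

The wrapper does not prevent the library from writing to the array; it only documents the intent and stops the calling code from modifying the inputs by accident.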

Quoting fabi.k...
The first errors occurred when deleting large arrays (for CRS-stored matrices). The delete[] statement caused the error:

if (values) delete [] values; values = NULL;

...

Did you try to comment out the 'delete [] ...' part(s) of your code? If yes, did you still get any errors?

Best regards,
Sergey

void MatrixCRS::reallocateMemory(const int _newDim, const int _newNonZeros) {
  /* if (values!=NULL) delete[] values; values = NULL;
  if (columns!=NULL) delete[] columns; columns = NULL;
  if (pointerB3!=NULL) delete[] pointerB3; pointerB3 = NULL;
  if (pointerE!=NULL) delete[] pointerE; pointerE = NULL;*/
  const long needed = (sizeof(REAL)+sizeof(MKL_INT))*_newNonZeros + 2*sizeof(MKL_INT)*_newDim;
  if (memcheck && !System::checkRAM(needed)) { cout << _MEMORYFEHLER << " name=" << getName() << endl; exit(EXIT_FAILURE);}
  try {
    values = new REAL[_newNonZeros];
    columns = new MKL_INT[_newNonZeros];
    pointerB3 = new MKL_INT[_newDim+1];
    pointerE = new MKL_INT[_newDim];
  }
  catch (exception& e) { cout << _CATCHIT(e) << "name=" << getName() << ", _newNonZeros=" << _newNonZeros << ", _newDim=" << _newDim << endl; throw; }
  nonZeros = _newNonZeros;
}

Like that?

First of all, it works (or at least the error has not occurred yet).

But: ??? Isn't NOT freeing allocated memory one of the DON'TS of C++? Of course I can try to minimize reallocation, for performance reasons. But shouldn't the example above work with the deletes? And with ALL versions of the MKL, not only those before 10.3.8?

I'm not implementing vital ISS software, but it would be nice to know that the code block above does not affect other parts of my program - or is itself affected by some spacy MKL/OMP subroutines newer than 10.3.7...

Thx for helping me out here.

(btw: Ubuntu 11.04, 24x Xeon X5660, 48 GB mem, 10% of mem in usage during typical computation)
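
For comparison, the reallocation routine in this post could let standard containers own the four arrays, so that no manual delete[] is needed at all. This is a sketch, not the poster's class: `CrsStorage`, `Real` and `Index` are hypothetical names standing in for MatrixCRS, REAL and MKL_INT. It would not hide a real out-of-bounds write by other code; it only removes the manual release as a possible source of errors.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the poster's REAL and MKL_INT typedefs.
typedef double Real;
typedef int Index;

struct CrsStorage {
    std::vector<Real> values;     // newNonZeros entries
    std::vector<Index> columns;   // newNonZeros entries
    std::vector<Index> pointerB;  // newDim + 1 entries
    std::vector<Index> pointerE;  // newDim entries

    // Resizes all four arrays and zero-fills them. The vectors manage their
    // own buffers, so there is no explicit delete[] anywhere.
    void reallocate(std::size_t newDim, std::size_t newNonZeros) {
        values.assign(newNonZeros, Real(0));
        columns.assign(newNonZeros, Index(0));
        pointerB.assign(newDim + 1, Index(0));
        pointerE.assign(newDim, Index(0));
    }
};
```

Raw pointers for a C interface are still available through `values.data()` and so on, and they stay valid until the next call to `reallocate`.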

Hello Fabian,

even though you might be aware of this already, I'd like to mention it here for completeness:

Intel MKL Memory Management Software
Intel MKL has memory management software that controls memory buffers for the use by the library functions. New buffers that the library allocates when your application calls Intel MKL are not deallocated until the program ends. To get the amount of memory allocated by the memory management software, call the mkl_mem_stat() function. If your program needs to free memory, call mkl_free_buffers(). If another call is made to a library function that needs a memory buffer, the memory manager again allocates the buffers and they again remain allocated until either the program ends or the program deallocates the memory. This behavior facilitates better performance. However, some tools may report this behavior as a memory leak.
The memory management software is turned on by default. To turn it off, set the MKL_DISABLE_FAST_MM environment variable to any value or call the mkl_disable_fast_mm() function. Be aware that this change may negatively impact performance of some Intel MKL routines, especially for small problem sizes.

(from the Intel Math Kernel Library for Linux* OS User's Guide for 10.3.9)

Does it make sense for your example to call "mkl_free_buffers()" before deleting the arrays? Also, just for testing, do you see a change when setting $MKL_DISABLE_FAST_MM?

Best regards,

Georg Zitzlsberger

>>...
>>Like that?

Yes, and the purpose of that test is to verify that there are no problems in other parts of your code.

>>First of all, it works (or at least the error has not occurred yet).

It seems to me that as soon as these pointers are passed to MKL functions you are no longer
responsible for releasing them. Almost the same approach is used in COM programming.

>>But: ??? Isn't NOT freeing allocated memory one of the DON'TS of C++?

No. Of course the memory must be released. The question is who is responsible for this.

>>But shouldn't the example above work with the deletes?

Yes, it should work if you don't use any MKL functions and don't pass any pointers with already allocated
memory to any MKL functions.

Disabling the Intel MM via mkl_disable_fast_mm() does not help, but moves the error to an MKL multiplication routine. Thanks for that hint, anyway! Good to know, for debugging.

>> It seems to me that as soon as these pointers are passed to MKL functions you are no longer
>> responsible for releasing them. Almost the same approach is used in COM programming.

So, once I have used my pointerE/pointerB3/etc. arrays in any (or some) MKL functions, somebody takes care of releasing MY memory, but does not ask me WHEN this should happen? Did I get that right? (If yes, is there any documentation about that? Or is it that snippet about mkl_disable_fast_mm()?)

Btw, the error (or its pseudo-random behaviour) is NOT restricted to the machine I'm using, but it is restricted to MKL 10.3.8 and 10.3.9.
