heap_arrays and slow code

Hello everyone,

I have a Fortran DLL, called by a vb.net program, in which I perform some matrix calculations. I had to use the /heap-arrays flag because I got a stack overflow at run time. Now the code runs fine, but very, very slowly. I added the flag /check:arg_temp_created to see whether any non-contiguous array was involved, and it tells me that there's an array Kred which is "temporary". This array has been declared as:

real*8,allocatable,dimension(:,:) :: Kred

and, after it has been allocated and filled, it is passed to the MKL routine DSYSV:

CALL DSYSV('Lower', red, 1, -Kred, red, pivot, Fred, red, WORK, LWORK,IERR)

The question is: why does this warning appear only for the MKL call? Is it just because the array is passed in a call?
And second: how can I speed up the code? Basically, I cannot avoid some of the array copies inside the code.

The DLL is compiled in "Release"; I'm using the Intel compiler XE in VS 2008, a 32-bit project, on 64-bit Windows 7.




But you aren't passing the array Kred - you're passing the expression -Kred, which requires a temporary array to be created and passed. That is what the warning is about; it isn't just for non-contiguous arrays.

If you really did mean to pass the expression, there's no way to speed that up - the work has to be done.

Steve - Intel Developer Support
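
One way to avoid that temporary is to negate the array in place before the call. A minimal sketch, assuming Kred is not needed after the solve (DSYSV overwrites its matrix argument in any case) and reusing the argument names from the original call:

```fortran
! Instead of passing the expression -Kred, which forces the compiler
! to create and pass a temporary copy, negate the array in place.
! This is safe here because DSYSV overwrites its matrix argument with
! the factorization anyway.
Kred = -Kred
CALL DSYSV('Lower', red, 1, Kred, red, pivot, Fred, red, WORK, LWORK, IERR)
```

The in-place negation still touches every element once, but it avoids allocating and copying a full-size temporary on each call.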

Thank you - it was pretty obvious, but I didn't understand what /check:arg_temp_created really meant. By adding the line:


before the call to DSYSV, there are no warnings any more.

I read in other topics on this forum (like this one: http://software.intel.com/en-us/forums/topic/276430) that /heap-arrays will slow down the code. I'm experiencing a serious slowness of execution, and it is becoming a problem for me: as far as I know, Fortran should generate faster code than other languages, but in this case some matrix calculations seem to run faster in vb.net than in Fortran.

In my code I use the intrinsics TRANSPOSE, PACK and RESHAPE many times, and they seem to be the real bottleneck. Can I do something to speed up the code?


In my experience, TRANSPOSE is quite efficient (for small enough cases) in ifort and other Fortran compilers. It has evident problems with cache as the operand size grows beyond what a single thread can handle efficiently. Compilers attempt to optimize cases where TRANSPOSE is combined with MATMUL; you can do this yourself by calling ?gemm and the like, using their internal transpose. When MKL needs a transpose in ?gemm, it appears to perform it efficiently. I haven't tested how combinations of TRANSPOSE and MATMUL interact with the opt-matmul option, but that option has improved in the recent versions of ifort which implement it (I assume you use such a version). If you have MATMUL cases large enough to benefit from MKL threading, you should be using opt-matmul or calling ?gemm directly. For smaller cases with MATMUL expanded inline, -O3 is important, as is the best setting of /arch:.
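
As a sketch of the ?gemm approach above: a product like C = MATMUL(TRANSPOSE(A), B) can be handed to DGEMM with its TRANSA argument set to 'T', so the transpose is done internally and no temporary for TRANSPOSE(A) is ever materialized. Dimensions and names here are illustrative, not taken from the original code:

```fortran
! C(m,k) = TRANSPOSE(A) * B, where A is n-by-m and B is n-by-k.
! DGEMM('T', ...) reads A through its internal transpose, so the
! explicit TRANSPOSE(A) temporary is never created.
real*8 :: A(n,m), B(n,k), C(m,k)
CALL DGEMM('T', 'N', m, k, n, 1.0d0, A, n, B, n, 0.0d0, C, m)
```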

In my tests, current ifort implements PACK as efficiently as any alternative, but it's not a particularly fast operation.

I've avoided RESHAPE, even resorting to legacy tricks such as EQUIVALENCE. The same thing can often be accomplished with Fortran pointers, if that makes you feel better about it. Without an example, I don't know how much assistance you could get. If your usage of RESHAPE causes the compiler to miss obvious optimizations, it's all the more important to present an example so the Intel compiler team has an opportunity to evaluate it.
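
To illustrate the pointer alternative: Fortran 2008 pointer rank remapping lets you view the same contiguous storage with a different shape, where RESHAPE would build a copy. A minimal sketch, assuming a compiler that supports this feature (the names flat and mat are made up for the example):

```fortran
! View a 1-D buffer as an n-by-m matrix without copying.
! RESHAPE(flat, [n, m]) would typically create a new array;
! the pointer remapping below only aliases the existing storage.
real*8, target  :: flat(n*m)
real*8, pointer :: mat(:,:)
mat(1:n, 1:m) => flat   ! no data movement, unlike RESHAPE
```

This only works when the source is contiguous and the new shape covers the same element count, which is exactly the common RESHAPE use case.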

I don't agree that /heap-arrays slows down code in any meaningful way in the vast majority of applications, and it makes the difference between running and not running.

Steve - Intel Developer Support

Yes, I finally have to agree with Steve, because thanks to TimP my code has sped up after switching to DGEMM.

Just one last question: when even the /heap-arrays option is not enough and there is insufficient memory to proceed, what is the best way to reduce the memory occupied by the matrices I use, which are all symmetric and banded? Does MKL offer such possibilities?

Thanks to all,


MKL has fairly comprehensive support for banded and sparse storage of symmetric matrices. You would get a big boost from using these where they apply. A search for "MKL banded matrix" or "MKL symmetric matrix" will tell you more than can be covered here.
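
To give a flavor of the band storage mentioned above, here is a sketch (all names and dimensions illustrative, not from the original code): a symmetric n-by-n matrix with kd sub-diagonals shrinks from n*n to (kd+1)*n elements in LAPACK's lower symmetric band layout, AB(1+i-j, j) = K(i, j). If the matrix is also positive definite, DPBSV solves directly in this storage; for an indefinite symmetric band matrix the general band solver DGBSV (with its own storage layout) is the fallback:

```fortran
! Pack the lower band of a symmetric matrix K into LAPACK band
! storage AB(kd+1, n), then solve K*x = Fred with DPBSV.
! Assumes K is positive definite; otherwise use DGBSV instead.
real*8, allocatable :: AB(:,:)
allocate(AB(kd+1, n))
do j = 1, n
   do i = j, min(n, j + kd)
      AB(1 + i - j, j) = K(i, j)   ! lower-band layout
   end do
end do
CALL DPBSV('L', n, kd, 1, AB, kd+1, Fred, n, IERR)
```

For a stiffness-type matrix with a narrow band, kd << n, this cuts memory from O(n^2) to O(kd*n).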