xxmr2d:out of memory even with 64-bit libraries


I am writing a small Fortran 90 code to do (amongst other things)
memory-distributed matrix-matrix multiplication A*B on a computational cluster.

For this, I use ScaLAPACK + BLACS from the MKL libraries.
The code works fast and fine for matrices <2GB (<16384x16384).
When I try to use the 64-bit (ILP64) libraries for larger matrices, something goes wrong.

The code compiles just fine with

MKLPATH = /site/VERSIONS/intel-11.1u7/mkl/lib/em64t

$(MKLPATH)/libmkl_scalapack_ilp64.a -Wl,--start-group $(MKLPATH)/libmkl_intel_ilp64.a $(MKLPATH)/libmkl_sequential.a $(MKLPATH)/libmkl_core.a $(MKLPATH)/libmkl_blacs_openmpi_ilp64.a -Wl,--end-group

and the ifort mpi compile wrapper


with the extra flags

-mcmodel=medium -i-dynamic -i8

During runtime I get an error from the pdgemr2d routine:

>>> xxmr2d:out of memory

Naturally, the same happens if I use the pdgeadd routine for
the task of block-cyclic distribution.

When browsing other forums I found this solution:
You need to modify the file REDIST/SRC/pgemraux.c and change
void *
unsigned int n;
to
void *
unsigned long int n;

I.e., the size argument is too narrow, so a large enough workspace isn't allocated.
However, I would have guessed that avoiding exactly this is the whole point of the ILP64 libraries?!

Could someone point me in the right direction or perhaps let me know what I am
doing wrong? I would really appreciate the help :) Oh, and the multiplication routine
pdgemm works fine for large (>2GB) matrices, so the program really does use the
64-bit libraries.


! create the root-node context where the entire A and B matrices
! reside in memory, called gloA and gloB
call sl_init (rootNodeContext, 1, 1)
! prep the descriptors for A B and C
! the C descriptor is used later for moving
! the resulting C sub arrays back to the root node

if (Iam==0) then
   nr_gloA_row = numroc( m, m, myrow, 0, nprow )
   nr_gloB_row = numroc( k, k, myrow, 0, nprow )
   nr_gloC_row = numroc( m, m, myrow, 0, nprow )
   call descinit( desc_gloA, m, k, m, k, 0, 0, &
                  rootNodeContext, max(1, nr_gloA_row), info)
   call descinit( desc_gloB, k, n, k, n, 0, 0, &
                  rootNodeContext, max(1, nr_gloB_row), info)
   call descinit( desc_gloC, m, n, m, n, 0, 0, &
                  rootNodeContext, max(1, nr_gloC_row), info)

   desc_gloA(1:9) = 0
   desc_gloB(1:9) = 0
   desc_gloC(1:9) = 0
   desc_gloA(2) = -1
   desc_gloB(2) = -1
   desc_gloC(2) = -1
end if

call pdgemr2d( m, k, gloA, one, one, desc_gloA, locA, &
               one, one, desc_locA, desc_locA( 2 ))
call pdgemr2d( k, n, gloB, one, one, desc_gloB, locB, &
               one, one, desc_locB, desc_locB( 2 ))

All the best,



Andreas, it looks like a defect in the ILP64 implementation. We will check and let you know. --Gennady

Hi Gennady,

I was just wondering whether you managed to reproduce my problem and, if so, whether you found a fix for it?

All the best,



I could not reproduce the described problem. Everything works fine with 17000x17000. Could you please provide a test case?


Which MPI library is this?

32-bit integer or 64-bit integer? OpenMPI, Intel MPI, MPICH2?

It sounds related to the "Out of memory error with Cpzgemr2d" topic 509048.

Best Regards

Thomas Kjaergaard





I still have problems distributing a 43496x43496 matrix to the slaves using pdgemr2d. It looks to be related to





Will this be fixed soon?




