integer overflow in dcopy



I ran into a problem using the MKL dcopy subroutine. When I compile the attached small test with gfortran and Intel MKL version 11 and run it, it works fine on my Xeon machines but fails on the Opteron machines. Both the integer*4 (LP64) and integer*8 (ILP64) variants of the 64-bit MKL libraries seem to have this problem. ACML has the same issue, but only in its integer*4 64-bit library. The test program works fine with the older version of MKL.

greetings, Steven

Attachment: test-dcopy.f (684 bytes)

Hi Steven,
Which version of MKL are you using?
I guess this is Linux?
How did you link the example?

I noticed the task size is very big:


Do you have enough RAM on the systems where you see the problem?


Hi Gennady,

This is on GNU/Linux (kernel 3.2.0-23, x86_64 SMP, GNU C Library Ubuntu EGLIBC 2.15-0ubuntu10) with 512 GB of RAM.
The test fails with MKL from composer_xe_2011_sp1.9.293 and composer_xe_2013.0.079, and only on Opteron processors.

I know the test is quite large; the actual dcopy call comes from a quantum chemistry code. However, even a 32-bit integer dcopy should handle 1.2 billion elements: the element count fits in a 32-bit integer, and since it's a 64-bit library, size_t can hold the number of bytes (about 9.6 GB) without a problem, as it did in the older version. Since it only fails on Opteron, I guess the bug is somewhere in an architecture-specific routine.
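Steven's argument can be checked with plain arithmetic. The Python sketch below uses the approximate 1.2 billion element count from his post and 8-byte doubles. The `wrap_int32` helper is a hypothetical illustration of what a byte count would turn into if some code path computed it in signed 32-bit arithmetic; it does not claim that MKL or ACML actually do this.

```python
INT32_MAX = 2**31 - 1
BYTES_PER_DOUBLE = 8  # REAL*8 / DOUBLE PRECISION


def wrap_int32(value: int) -> int:
    """Hypothetical helper: emulate two's-complement wraparound of a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


n = 1_200_000_000                # approximate element count quoted in the thread
nbytes = n * BYTES_PER_DOUBLE    # exact byte count (Python ints never overflow)

print("n fits in int32:     ", n <= INT32_MAX)       # True
print("bytes fit in int32:  ", nbytes <= INT32_MAX)  # False
print("exact byte count:    ", nbytes)               # 9600000000
print("if computed in int32:", wrap_int32(nbytes))   # 1010065408
```

The wrapped result is still positive, so an overflow of this kind would not necessarily crash; it could instead silently address the wrong memory.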

Thanks, Steven. I'm asking only so we know what to check on our side. Yes, 1.2 × 10^9 should be handled by a 32-bit integer, since it is below 2^31 - 1 = 2,147,483,647.
One more question: did you link with libmkl_sequential.a or libmkl_gnu_thread.a?

I linked with libmkl_sequential.a. I just tried libmkl_gnu_thread.a as well, and it fails too.

Just for info: I still couldn't find an Opteron with that much memory.
The test passed on a Xeon with 32 GB of RAM.

Thanks for the info.
This is exactly what I get on a Xeon E5630 as well:


And this is what I get on an Opteron 6276:

wrong, I = 146738497
X(I) = 3.141589835286140E-002
Y(I) = 1.12300002574921

Could you run the Opteron-specific code on a Xeon, or is that impossible?

FYI, here is the discussion of the same issue with ACML:

If necessary, I could try to arrange access to one of our machines for you.

Thanks for the suggestion, but it's not necessary: I have already reproduced the same results on an AMD Opteron(tm) Processor 6282 SE with 32 GB of RAM.
We will check what's wrong.


You wrote that this problem occurs on AMD machines with both MKL and ACML. Could you check this test with the Netlib reference BLAS?

-- Victor

Hi Victor,
With my system's BLAS (Ubuntu 12.04) it works fine, and I think that one is based on the Netlib implementation. Our own program's BLAS is also based on Netlib, and that works fine too (as do the older MKL 10.1 and ACML 4.2.0). But if necessary, I can compile the Netlib dcopy and test it.
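For a Netlib comparison it helps to pin down what the reference DCOPY is supposed to do. The Python sketch below transcribes the strided branch of the Netlib reference DCOPY using 0-based indices; the Fortran reference also has a separate loop unrolled by 7 for incx = incy = 1, which is omitted here because it produces the same result. The function name `dcopy_ref` is a hypothetical label for this illustration, not part of any library.

```python
from typing import MutableSequence, Sequence


def dcopy_ref(n: int, dx: Sequence[float], incx: int,
              dy: MutableSequence[float], incy: int) -> None:
    """Copy n elements of dx into dy with the strided semantics of the
    Netlib reference DCOPY (0-based here instead of Fortran's 1-based)."""
    if n <= 0:
        return  # the reference routine returns immediately for n <= 0
    # For a negative increment, traversal starts at the far end of the vector.
    ix = (-n + 1) * incx if incx < 0 else 0
    iy = (-n + 1) * incy if incy < 0 else 0
    for _ in range(n):
        dy[iy] = dx[ix]
        ix += incx
        iy += incy
```

For example, `dcopy_ref(3, [1.0, 2.0, 3.0], -1, y, 1)` leaves `y` as `[3.0, 2.0, 1.0]`. Python integers cannot overflow, so this sketch cannot reproduce an indexing bug; in a compiled implementation, the byte offsets derived from `ix` and `iy` would have to be computed in 64-bit arithmetic to stay correct for arrays as large as the one in this thread.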

The problem reproduces even when only DCOPY is used.
I commented out SQRT and DDOT and the problem still exists:
wrong, I = 146738497
X(I) = 3.141589835286140E-002
Y(I) = 1.12300002574921

The problem has been escalated.
We will let you know as soon as there is any update.
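Because the reduced reproducer exercises only DCOPY, its pass/fail check can be a plain element-by-element comparison of the source and destination arrays. The test-dcopy.f source is not reproduced in this thread, so the following is a hypothetical Python sketch of such a check. It reports the first mismatch in the same format as the output posted above, using a Fortran-style 1-based index. Exact equality is appropriate because DCOPY copies values unchanged, assuming the data contain no NaNs (NaN never compares equal to itself).

```python
from typing import Optional, Sequence


def first_mismatch(x: Sequence[float], y: Sequence[float]) -> Optional[int]:
    """Return the 1-based index of the first element where y differs from x,
    or None if they match. Assumes equal lengths and no NaNs."""
    for i, (xi, yi) in enumerate(zip(x, y), start=1):
        if xi != yi:
            return i
    return None


def report(x: Sequence[float], y: Sequence[float]) -> str:
    """Format the result like the output posted in this thread."""
    i = first_mismatch(x, y)
    if i is None:
        return "ok"
    return f"wrong, I = {i}\nX(I) = {x[i - 1]!r}\nY(I) = {y[i - 1]!r}"
```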

Hello Steven,

Would you please check the latest 11.0 Update 2? The problem has been fixed there.



I've installed version 11.0 Update 2 of MKL and compiled my test program, linking with either -lmkl_sequential or -lmkl_gnu_thread, but in both cases the program segfaults; the gdb output is attached.


Attachment: gbd-out.txt (1.98 KB)

Hi Steve,

Looking at the gdb backtrace you attached, I notice that you are using the MKL ILP64 interface library. When using the ILP64 interface library, the integers declared in the source program must be 64-bit integers. With gfortran, the compiler flag that makes default integers 64 bits long is -fdefault-integer-8; for the Intel Fortran compiler, the corresponding option is -i8.

Can you confirm that you are using -fdefault-integer-8 when compiling your program with gfortran and linking against the MKL gfortran ilp64 interface library?
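Why a missing -fdefault-integer-8 can crash: Fortran passes arguments by reference, so an ILP64 routine reads 8 bytes at the address of each integer argument. If the caller only stored a 4-byte INTEGER there, the upper half of the value comes from whatever happens to sit next to it in memory. The Python sketch below is a simplified model of that mismatch (little-endian layout as on x86-64, with a hypothetical `following_word` standing in for the neighbouring memory); it does not reproduce MKL's internals, and in a real program the neighbouring bytes depend on the compiler and stack layout.

```python
import struct


def misread_as_ilp64(n32: int, following_word: int) -> int:
    """Store n32 as a 4-byte integer followed by another 4-byte word, then
    read the first 8 bytes back as one 8-byte integer, as a routine that
    expects INTEGER*8 would."""
    # '<' = little-endian (as on x86-64); 'i' = 4-byte int; 'q' = 8-byte int.
    raw = struct.pack("<ii", n32, following_word)
    (n64,) = struct.unpack("<q", raw)
    return n64


# With zero in the neighbouring word, the value happens to survive...
print(misread_as_ilp64(1_200_000_000, 0))  # 1200000000
# ...but any nonzero neighbour turns it into a huge, wrong length.
print(misread_as_ilp64(1_200_000_000, 1))  # 5494967296
```

A length that is silently inflated like this would likely make dcopy run far past the end of the arrays, which is consistent with a segfault, while a lucky zero neighbour can make the same mistake go unnoticed.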


Hi Vamsi,

Thanks for catching that; I was too quick to test and did indeed forget that flag. Everything seems to work fine now, and in production the results also match those from the Xeon machines. Thanks to everyone for your help.


