Bug in MKL 11.0 LAPACK function 'gels'

I am attaching a minimal Fortran program that links to the Intel MKL to make use of the LAPACK function gels, which solves a linear least squares problem.

When I compile and run this program with Intel Fortran Composer XE 2013, the output is totally wrong. Compiling and running with Intel Fortran Composer XE 2011 gives the correct results. The output from both is attached.

The very strange thing is that the results are correct on both compilers if you change line 3 of the code to nt = 513 or any smaller number, whereas they stay incorrect for any nt >= 514. This is not due to weird values - the values being read in are all real, finite numbers.

The output also shows the compilation and linking lines (I used the most recent MKL link line helper). There you see the exact version numbers of the compilers. I should note that the Composer XE 2013 runs on Ubuntu 12.04, whereas the Composer XE 2011 runs on CentOS 6 (and used to run on Ubuntu 10.10, also giving the correct results). So it could be an issue with Ubuntu 12.04.

Let me know if I can be of any help.

EDIT: If you call gelsy instead of gels, the results are correct. I updated main.f90: it now has a logical flag 'use_gelsy_instead', and it also prints the info flag returned by the gels call.
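When two solvers disagree, as gels and gelsy do here, an independent reference solution helps decide which one to trust. The following is a minimal sketch, not the code in main.f90: it assumes a straight-line model y = b0 + b1 * x (the thread does not state the actual model) and solves the 2x2 normal equations directly. The names `fit_line` and `solutions_agree` and the sample data are hypothetical. Normal equations are less numerically robust than the orthogonal factorizations LAPACK uses, so this is only meant as a cross-check on small, well-conditioned problems.

```python
import math


def fit_line(x, y):
    """Least-squares fit of y = b0 + b1 * x via the 2x2 normal equations."""
    n = len(x)
    sx = math.fsum(x)
    sy = math.fsum(y)
    sxx = math.fsum(v * v for v in x)
    sxy = math.fsum(a * b for a, b in zip(x, y))
    det = n * sxx - sx * sx
    if det == 0.0:
        raise ValueError("x values are all identical; slope is undefined")
    b1 = (n * sxy - sx * sy) / det
    b0 = (sy - b1 * sx) / n
    return b0, b1


def solutions_agree(a, b, rel_tol=1e-8, abs_tol=1e-10):
    """Compare two coefficient vectors entry by entry with a tolerance."""
    return len(a) == len(b) and all(
        math.isclose(p, q, rel_tol=rel_tol, abs_tol=abs_tol)
        for p, q in zip(a, b)
    )


# Hypothetical data lying exactly on y = 3 + 0.5 * x (not series.txt).
x = [float(i) for i in range(1, 11)]
y = [3.0 + 0.5 * v for v in x]

reference = fit_line(x, y)
print(solutions_agree(reference, (3.0, 0.5)))
```

In practice the second argument to `solutions_agree` would be the coefficient vector returned by the library call under test.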



Attachments:
output-wrong.txt (text/plain, 780 bytes)
main.f90 (application/octet-stream, 990 bytes)
series.txt (text/plain, 48.83 KB)
output-correct.txt (text/plain, 916 bytes)

I see the same results on 64-bit Windows with the latest MKL 11.0 (bundled with Composer 2013).
It is not clear to me how you decided that this result is wrong (-39507.9652423625 -20829.7946097958)?

Over the last 2 years I have run this code on various machines and architectures, so I can see immediately that the result is wrong. Here is an intuitive way to see it: series.txt contains data points that lie strictly between 0.0 and 20.0 (if I remember the upper bound correctly). The result can be interpreted as the coefficients of a statistical least squares regression (correct me if I am wrong). The coefficients cannot all be negative and this large in absolute value if both the dependent and independent variables lie in (0.0, 20.0), because then the estimated/forecast y-values would always be negative.
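The argument above can be turned into a quick automated check. A least-squares minimizer can never fit worse than the all-zero coefficient vector, whose residual sum of squares is simply the sum of the squared y-values. The sketch below tests a claimed solution against that bound. It assumes a model y = b0 + b1 * x and uses made-up data in (0, 20); neither the model nor the data come from main.f90 or series.txt, and the function names are hypothetical.

```python
def residual_sum_of_squares(rows, y, beta):
    """Return sum_i (y_i - rows_i . beta)^2 for a row-major design matrix."""
    total = 0.0
    for row, y_i in zip(rows, y):
        prediction = sum(a * b for a, b in zip(row, beta))
        total += (y_i - prediction) ** 2
    return total


def is_plausible_solution(rows, y, beta, slack=1e-9):
    """A true minimizer must fit at least as well as beta = 0.

    With beta = 0 every prediction is 0, so the residual sum of squares
    is sum(y_i^2). A claimed solution that does worse cannot be optimal.
    """
    rss_zero = sum(y_i * y_i for y_i in y)
    return residual_sum_of_squares(rows, y, beta) <= rss_zero * (1.0 + slack)


# Hypothetical data in (0, 20); this is NOT the content of series.txt.
x = [1.0, 4.0, 7.5, 12.0, 18.0]
y = [2.0, 5.5, 8.0, 13.5, 19.0]
rows = [[1.0, x_i] for x_i in x]  # assumed model: y = b0 + b1 * x

suspicious = [-39507.9652423625, -20829.7946097958]  # values quoted in the thread
reasonable = [1.0, 1.0]                               # rough hand guess

print(is_plausible_solution(rows, y, suspicious))  # False
print(is_plausible_solution(rows, y, reasonable))  # True
```

The check is only a necessary condition: it flags grossly wrong output like the values quoted above, but passing it does not prove that a solution is the true minimizer.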

Ok, thanks for clarification.
We will check the code and let you know the results.

Daniel, yes, this is a problem.
We have confirmed the issue and escalated it. We will let you know of any updates.

I can reproduce the errors on our cluster (running CentOS 6), where I can load Intel Composer XE 2012 or 2013, each with corresponding MKL. So this is not an issue with Ubuntu or with my machine.


Yes, please let me know about any updates; I will then test as quickly as possible.

MKL 11.0 update 1 just came out. Is this issue fixed? I have to know before I request to update our cluster... If it is not fixed, could you provide me with a way to track the bug?

No, the fix for this problem is not available in update 1. It is targeted for the next update (11.0 update 2).

As I just found out, the problem does not appear when I link against a 'parallel' (threaded) version. Specifically, in my original program (not the toy program posted here), the following link options produce the error:

-lmkl_intel_lp64 -lmkl_sequential -lpthread -lm -lmkl_core -L/usr/local/intel/composer_xe_2013.0.079/mkl/lib/intel64 /usr/local/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_lapack95_lp64.a

The next link line seems to work fine; the error described above does not occur:

-lmkl_intel_lp64 -lmkl_intel_thread -liomp5 -lpthread -lm -lmkl_core -L/usr/local/intel/composer_xe_2013.0.079/mkl/lib/intel64 /usr/local/intel/composer_xe_2013.0.079/mkl/lib/intel64/libmkl_lapack95_lp64.a

That's correct: the issue happens only with sequential code (when the libmkl_sequential.a library is used). The threaded version is OK.

Update 2 just came out, but I can't find this issue on the list of fixes. Is it fixed? I want to know before I push our support team to install it on the cluster. Thanks.

Yes, the problem has been fixed in 11.0 update 2. Please check and let us know the result.
