zgesvd in multithreaded MKL 10.2.5.035 produces wrong results

Hello,

I have probably hit a bug in one of the MKL (10.2.5.035)
subroutines, ZGESVD. I am linking statically against the multithreaded version
of MKL. ZGESVD gives me wrong results when I use 32 threads and correct
results when I use one thread. I have a simple non-OpenMP program that
loads a matrix and carries out the SVD, and it reproduces this error
consistently. The test program, makefile and data file are all in the attachment.

The test program shows that the first SVD call produces the correct result. The second SVD call only queries the optimal work size. The third call produces a wrong result. This may indicate that the work size is causing the problem. However, for other matrices wrong results are produced even by the first SVD call.
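One way to judge whether a given SVD result is wrong without relying on a reference run is to check the reconstruction residual and the orthonormality of the singular vectors. The following is only a minimal sketch in plain Python, not the attached test program: the function names (svd_residual, orthonormality_error) are hypothetical, matrices are assumed to be stored as lists of rows, and the factors are assumed to come from a thin SVD (U is m x k, s has length k, V^H is k x n).

```python
def conj_transpose(m):
    """Return the conjugate transpose of a matrix stored as a list of rows."""
    return [[complex(m[i][j]).conjugate() for i in range(len(m))]
            for j in range(len(m[0]))]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def max_abs_diff(a, b):
    """Largest absolute entrywise difference between two equal-shaped matrices."""
    return max(abs(a[i][j] - b[i][j])
               for i in range(len(a)) for j in range(len(a[0])))


def svd_residual(a, u, s, vt):
    """Largest entry of |A - U * diag(s) * V^H|."""
    # Scale column k of U by the k-th singular value, then multiply by V^H.
    us = [[u[i][k] * s[k] for k in range(len(s))] for i in range(len(u))]
    return max_abs_diff(a, matmul(us, vt))


def orthonormality_error(u):
    """Largest entry of |U^H U - I|; near zero when the columns are orthonormal."""
    g = matmul(conj_transpose(u), u)
    n = len(g)
    ident = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return max_abs_diff(g, ident)
```

For a correct result both quantities should be on the order of machine precision times the matrix norm; a residual that is large for 32 threads but small for one thread would point at the threaded code path rather than at the input data.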

When I use fewer threads, for example 16, 8 or 1, the result is correct for the attached matrix.

This test was run on a Linux node with 32 cores.
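The comparison across thread counts can be scripted. The sketch below is an assumption-laden illustration, not part of the attached test case: it relies on MKL's MKL_NUM_THREADS environment variable to set the thread count at run time, and the helper names (run_with_threads, compare_thread_counts) and the ./svd_test binary mentioned afterwards are hypothetical.

```python
import os
import subprocess


def run_with_threads(cmd, num_threads):
    """Run cmd with MKL_NUM_THREADS set and return its stdout as text."""
    # Copy the current environment and override only the MKL thread count.
    env = dict(os.environ, MKL_NUM_THREADS=str(num_threads))
    done = subprocess.run(cmd, env=env, capture_output=True,
                          text=True, check=True)
    return done.stdout


def compare_thread_counts(cmd, counts, reference=1):
    """Return the thread counts whose output differs from the reference run."""
    expected = run_with_threads(cmd, reference)
    return [n for n in counts if run_with_threads(cmd, n) != expected]
```

A call such as compare_thread_counts(["./svd_test"], [8, 16, 32]) would list the thread counts whose printed output differs from the single-threaded run. Exact text comparison is a simplification: legitimate round-off differences between thread counts can change the last printed digits, so comparing the numbers with a tolerance is likely the better choice in practice.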

Any suggestions?

Thanks & Regards,

Xin

Attachment: SVD_error.zip (application/zip, 784.68 KB)

Xin, could you please check whether the problem persists with version 10.2 Update 7?
--Gennady

Gennady,

Thanks for your reply!

I need to get our system administrator's assistance in order to try a different version of MKL. 10.2.5.035 is the latest one installed for now. They are planning to install 10.3 some time this week.

By the way, I downloaded an evaluation version of MKL 10.3 for Linux and tried to install it as a regular user (not root). The installation seems to stall after the EULA appears and I type 'accept' and press Enter. Any ideas about this problem? I can try 10.3 right away if I can install it successfully.

Regards,
Xin

Xin, I have no idea about the installation issue :-(. I will ask the install engineer to help you with this problem.
--Gennady

Hello Xin,
We reproduced the problem with all the latest versions, including 10.3.x. The issue is caused by internal threading. We will update you when the fix is available.
Regards, Gennady

Hello Xin,

The installation issue you have described looks like an activation problem. The installer reads all Intel license keys registered on your system to determine your current activation level. If your system (or a shared location) holds a significant number of licenses, this process can take some time.

Could you kindly start the installation one more time and let it scan your system for a long period, say 30-40 minutes? If it still freezes, please interrupt it and, if possible, send us the log files /tmp/*.issa*.log and /tmp/*.pset*.log (please select the correct ones by sorting by modification time).

We will investigate the issue and return to you with instructions.

As a temporary workaround, you could try backing up and then cleaning out the folders /opt/intel/licenses and $HOME/intel/licenses (please ask for root assistance if you do not have sufficient permissions) and then restarting the installation.

Waiting for your reply.
Thank you,
- Nikolay

Hello Nikolay,

Thanks for your reply! I started the installation yesterday and it is still frozen. Since Gennady has tested with all the new versions of MKL, I am not going to test it again myself for now. Our system administrator is going to install the latest MKL, so I will just wait for that.

However, I am interested in knowing what the problem is in case I need to install again. Attached are the log files. There are a few similar log files, possibly from my other attempts.

Gennady, Thanks for your info! Hope the fix is a simple one and comes soon.

Thanks all!

Xin

Hello Xin,

Thank you for the information.
The log file shows the freeze at the activation checking, so the initial presumption was correct.

Could you kindly run three commands on your system and send the output, please?

1) ldd "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic"

2) ls -ls /share/apps/intel/ict/Compiler/11.1/072/licenses

3) export LD_LIBRARY_PATH=/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e; "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic" -f"MKernL" -f"MKern" -p"i86_r" -p"i86_re" -p"it64_lr" -p"it64_re" -p"amd64_re" -c"/share/apps/intel/ict/Compiler/11.1/072/licenses"

Thank you very much for your time,

- Nikolay

Hello, Nikolay,

Here is the output:

[xxu2@dlxlogin2 ~]$ ldd "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic"
libpthread.so.0 => /lib64/libpthread.so.0 (0x0000003734a00000)
libm.so.6 => /lib64/libm.so.6 (0x0000003734200000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x0000003740c00000)
libc.so.6 => /lib64/libc.so.6 (0x0000003733e00000)
libdl.so.2 => /lib64/libdl.so.2 (0x0000003734600000)
/lib64/ld-linux-x86-64.so.2 (0x0000003733a00000)
[xxu2@dlxlogin2 ~]$ ls -ls /share/apps/intel/ict/Compiler/11.1/072/licenses
16 -rw-r--r-- 1 root root 551 Jul 21 2010 /share/apps/intel/ict/Compiler/11.1/072/licenses
[xxu2@dlxlogin2 ~]$ export LD_LIBRARY_PATH=/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e; "/home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic/32e/chklic" -f"MKernL" -f"MKern" -p"i86_r" -p"i86_re" -p"it64_lr" -p"it64_re" -p"amd64_re" -c"/share/apps/intel/ict/Compiler/11.1/072/licenses"
-bash: /home/xxu2/src/mfd/MFD3/MFD/Make/test/l_mkl_10.3.2.137_intel64/./pset/32e/../chklic/32e/chklic/32e/chklic: Not a directory
[xxu2@dlxlogin2 ~]$

Regards,
Xin

Hello Gennady,

I am wondering whether there is more information about this bug related to internal threading. My main concern is whether the same threading bug is present in other subroutines such as matrix multiply, QR, LU and matrix inverse. My work relies on these libraries, and I do observe strange results when I use SVD with one thread and more threads for the other routines.

I would like to know whether I should avoid using the threaded library for now, or what the known safe number of threads is for the MKL library.

Any information would be appreciated!

Regards,

Xin

Hi Xin,
We don't expect the problem in any routines other than the *svd routines you have already reported. If you run into other problems, please let us know.
--Gennady

Hi Xin,
Could you please check whether this problem persists with the latest 10.3 Update 3 and let us know if there are any further problems? 10.3.3 was released yesterday and is available at the Intel Registration Center.
/gf

Hello, Gennady,

Thanks for the info. I will try it as soon as it is installed on our cluster. It would have been much easier if I could install an evaluation version in my own directory. Unfortunately, the installation issue I brought up in the last few messages has not been solved.

Regards
Xin

Hello Xin,

Unfortunately, we are still trying to figure out the root cause of the activation problem.

Did you try this workaround?
As a temporary workaround, you could try backing up and then cleaning out the folders /opt/intel/licenses and $HOME/intel/licenses (please ask for root assistance if you do not have sufficient permissions) and then restarting the installation.

If that is also unsuccessful, please try the following steps:
1) Go to /rpms
2) Invoke command: #> rpm -ivh --nodeps --ignorearch --prefix "location for installation" *.rpm

I'm monitoring this topic, so please contact me if you have any questions.

Thank you,
- Nikolay

Hello, Nikolay,

Thanks for your reply!

I went to check the folder /opt/intel/..., but there is no intel folder under /opt. The folder $HOME/intel/licenses is empty.

I tried the second way by invoking the 'rpm ...' command and got the following message:

error: can't create transaction lock on /var/lib/rpm/__db.000

I guess it is a permission issue. I am forwarding your suggestions to our system administrator.

Thanks!
Xin

Hello Gennady,

MKL 10.3 Update 3 produced correct results for the small test case that I posted here. I will run some more cases with it. Hope it works well!

Thanks very much!

Xin
