Extended Eigenvalue Solver hanging

Hello,

I'm trying to solve N independent generalized eigenvalue problems. The attached code hangs on my machine when launched with mpirun -np N ./a.out for 1 < N < 9. I don't need the MPI version of the EES (and I know you don't support it as of update 2), just an SMP version that works independently on any given MPI process (I don't have this problem with PARDISO, for example). Can you reproduce this error? Is there a way to fix it?

Thank you for your help.

Attachment: ees-hang.tar.gz (332.39 KB)

Dear Customer,

Which link line are you using? Is there a specific machine you are trying to run this program on? I'll try to reproduce the issue and get back to you soon.

Thanks,

Sridevi

Sridevi Allam
Technical consulting engineer - Intel MKL

Thanks for your help. Here are my specs: Debian 3.2.35-2 x86_64 GNU/Linux, icpc version 13.1.0.146 Build 20130121, MPICH2 version 1.4.1. The compile line is: icpc FEAST_hang.cpp -I/usr/include/mpich2 -lmpi -lmkl_rt -lmkl_intel_thread -lmkl_mc  -lmkl_intel_lp64 -lmkl_core -liomp5 -lifcore -limf

Hello,

Can you reproduce the error?

Would it be possible to know why nobody is answering, please?

Hello,

I'm using Composer XE 2013 update 2 and the latest MPICH2, version 3.0.3, and ran the following command:

 icpc FEAST_hang.cpp -I/project/sallam1/mpich2/include -L/usr/lib64/openmpi/lib/libmpi.so.0 -lmkl_rt -lmkl_intel_thread -lmkl_mc -lmkl_intel_lp64 -lmkl_core -liomp5

It gave a warning: 

<<<

FEAST_hang.cpp(18): warning #592: variable "rank" is used before its value is set

oss << (int)rank;
>>>

When I ran -bash-4.1$ mpirun -np 2 ./a.out, this was the output:

GO ! 0
GO ! 0
Extended Eigensolvers: double precision driver
Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
Extended Eigensolvers: double precision driver
Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
Extended Eigensolvers: fpm(1)=1
Extended Eigensolvers: fpm(6)=1
Extended Eigensolvers: fpm(1)=1
Extended Eigensolvers: fpm(6)=1
Search interval [0.000000000000000e+00;4.000000000000000e-01]
Search interval [0.000000000000000e+00;4.000000000000000e-01]
Extended Eigensolvers: Size subspace 100
Extended Eigensolvers: Size subspace 100
#Loop | #Eig | Trace | Error-Trace | Max-Residual
#Loop | #Eig | Trace | Error-Trace | Max-Residual
0,26,5.344662201251952e+00,1.000000000000000e+00,4.611205669551193e-07
0,26,5.344662201251954e+00,1.000000000000000e+00,4.649064852068096e-07
1,26,5.344662201251753e+00,4.996003610813204e-13,1.378371549897360e-13
Extended Eigensolvers has successfully converged (to desired tolerance)
DONE ! 0
1,26,5.344662201251755e+00,4.973799150320701e-13,1.384700328307968e-13
Extended Eigensolvers has successfully converged (to desired tolerance)
DONE ! 0

I don't see a hang here. Maybe the versions of my builds are causing the difference?

Thanks,

Sridevi

Sridevi Allam
Technical consulting engineer - Intel MKL

Hello,

Thanks for your answer. The value of the variable "rank" is set on line 15, so there is definitely a problem with your compiler output (icpc version 13.1.0.146 build 20130121 does not produce such a warning). Moreover, at execution, it should read "GO ! 0", "GO ! 1", ..., "GO ! size - 1", not "GO ! 0" repeated by every process. In my case, I only see "DONE ! 0" at the end; all ranks other than the root hang.

Thanks in advance for your help.

Hello, yes, you are right. The test case did hang for me too. Here is the output:

-bash-4.1$ mpirun -n 2 ./a.out
 GO ! 1
 GO ! 0
Extended Eigensolvers: double precision driver
Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default
Extended Eigensolvers: fpm(1)=1
Extended Eigensolvers: fpm(6)=1
Search interval [0.000000000000000e+00;4.000000000000000e-01]
Extended Eigensolvers: Size subspace 100
#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual
0,26,5.344662201251952e+00,1.000000000000000e+00,4.605558650066731e-07
1,26,5.344662201251757e+00,4.884981308350689e-13,1.379730242389445e-13
Extended Eigensolvers has successfully converged (to desired tolerance)
 DONE ! 0

It hangs after DONE ! 0.

I'm escalating this issue to our engineering team by submitting a ticket. I'll keep you updated on the status.

Thank you,

Sridevi

Sridevi Allam
Technical consulting engineer - Intel MKL

Hi,

I reproduced the hang with an SMP version of your code, so it does not depend on the MPI processes.

I found that the reason for the hang is the incorrect CSR format of matrix B from the file B_1.txt. If we look at B_1.txt, we can see that the array ib contains equal consecutive values:

“1 1 2 4 4 4 6 6 …”

The Extended Eigenvalue Solver uses the same 3-array variation of the CSR format as PARDISO (please see the Intel MKL manual, appendix A). According to that format, the ib array is incorrect. Could you change the format of the B matrix and report the results?
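For readers following along, here is a minimal sketch of how a row-pointer array like ib can be inspected. In the 3-array CSR layout, row i stores ib[i+1] - ib[i] entries, so two equal consecutive values mean that row stores nothing. The names RowPointerReport and inspect_row_pointer are made up for this illustration (they are not MKL functions), and one-based indexing is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Result of inspecting a one-based CSR row-pointer array (hypothetical helper type).
struct RowPointerReport {
    bool well_formed;                     // starts at 1 and never decreases
    std::vector<std::size_t> empty_rows;  // zero-based rows with no stored entry
};

// ib has n + 1 entries for an n-row matrix; row i holds ib[i + 1] - ib[i] entries.
RowPointerReport inspect_row_pointer(const std::vector<int>& ib) {
    assert(!ib.empty());  // a row pointer always has at least one entry
    RowPointerReport report{ib.front() == 1, {}};
    for (std::size_t i = 0; i + 1 < ib.size(); ++i) {
        if (ib[i + 1] < ib[i]) {
            report.well_formed = false;       // decreasing pointer: invalid CSR
        } else if (ib[i + 1] == ib[i]) {
            report.empty_rows.push_back(i);   // equal neighbours: row stores nothing
        }
    }
    return report;
}
```

Applied to the values quoted from B_1.txt (treated here as if they were a complete row pointer, which is an assumption since the excerpt is truncated), this flags rows 0, 3, 4 and 6 as storing no entries.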

Thanks,

Vitaly

Hello Vitaly,

Thanks to your remark, I just noticed that MKL does not seem to support empty rows in symmetric CSR. I guess I have to add dummy zeros to the arrays, then. Just out of curiosity, why?

Hi,

I think there is a misunderstanding. MKL does support empty rows in the symmetric CSR format. But for the generalized problem Ax = λBx, B must be a real symmetric positive definite matrix (please see the Intel MKL manual, Extended Eigensolver Functionality). A real symmetric positive definite matrix cannot have zeros on its diagonal (zero rows, in your case). If possible, change the input matrices B_i so that they become positive definite.
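A quick pre-check along these lines can be sketched as follows. It tests only a necessary condition for positive definiteness (every diagonal entry strictly positive), not a sufficient one. CsrMatrix and rows_breaking_spd are hypothetical names for this illustration, and the code assumes one-based indices with the upper triangle stored, as in the PARDISO-style symmetric format discussed above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Symmetric matrix in one-based 3-array CSR, upper triangle stored (assumed layout).
struct CsrMatrix {
    std::vector<int> ia;    // row pointer, n + 1 entries
    std::vector<int> ja;    // column indices
    std::vector<double> a;  // values
};

// Returns the zero-based rows whose diagonal entry is missing or not > 0.
// A non-empty result proves the matrix is not positive definite;
// an empty result proves nothing (the condition is only necessary).
std::vector<std::size_t> rows_breaking_spd(const CsrMatrix& m) {
    assert(!m.ia.empty() && m.ja.size() == m.a.size());
    std::vector<std::size_t> bad;
    const std::size_t n = m.ia.size() - 1;
    for (std::size_t i = 0; i < n; ++i) {
        double diag = 0.0;
        const std::size_t begin = static_cast<std::size_t>(m.ia[i] - 1);
        const std::size_t end = static_cast<std::size_t>(m.ia[i + 1] - 1);
        for (std::size_t k = begin; k < end; ++k) {
            if (m.ja[k] == static_cast<int>(i) + 1) diag += m.a[k];  // diagonal entry
        }
        if (!(diag > 0.0)) bad.push_back(i);
    }
    return bad;
}
```

For a 3x3 matrix with ia = {1, 2, 2, 3}, ja = {1, 3}, a = {2.0, 3.0}, the middle row is empty, so the function reports row 1.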

 

Thanks,

Vitaly

Hello,

Thanks again for your answer. I'm almost positive I successfully solved a symmetric indefinite generalized eigenvalue problem with FEAST RCI, even if it is not covered by the theory, because it only needs to factorize (zB - A), which in this case is SPD. Moreover, if you look at B_0.txt, there are also numerous empty rows, so it is not SPD either, yet the method still converges. Any ideas?

Hi,

Thanks a lot for the good discussion about our EE solver. First of all, I should say that, theoretically, matrix B corresponds to an energy norm, and the EE solver's algorithm is implemented based on this norm. That is the main reason why this matrix needs to be positive definite; in that case MKL works correctly. We know that there is an issue with the EE solver hanging for specific indefinite matrices B and are working to resolve it. In any case, if matrix B is positive definite (per the MKL manual, the user must supply a positive definite B), we do not see any hang.
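The energy-norm argument can be spelled out in symbols. This is only a short sketch; the notation (the B-inner product and the unit vector e_i) is introduced here for illustration and is not taken from the MKL manual.

```latex
\langle x, y \rangle_B = x^{T} B y, \qquad \|x\|_B = \sqrt{x^{T} B x}
% This defines a norm only if x^T B x > 0 for every x \neq 0, i.e. B is SPD.
% If B_{ii} = 0 for some i, take the unit vector e_i:
e_i^{T} B e_i = B_{ii} = 0 \quad \text{with} \quad e_i \neq 0
% so \|e_i\|_B = 0 for a nonzero vector, and \|\cdot\|_B is not a norm.
```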

With best regards,

Vitaly Lukinov.
