issue using FEAST with high-dimensional manifolds and OMP


Gagan

whats good guys.

problem: i am giving an N x N sparse matrix as input to FEAST and I am trying to solve for d-dimensional manifolds.

Note that this dimensionality, d, is independent of N.

Now when d isn't too large, say < 3000, FEAST works completely fine.

However, if I "up" the value of d to say, 9000, i get errors such as:

Intel MKL Extended Eigensolvers: Size subspace 9001

#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual

OMP: Error #34: System unable to allocate necessary resources for OMP thread:

OMP: System error #35: Resource temporarily unavailable

OMP: Hint: Try decreasing the value of OMP_NUM_THREADS.

Abort trap: 6

I have tried setting export OMP_STACKSIZE=1024m, and I have also tried using the -stack_size argument in clang++ to specify a pretty large stack size. Neither of these solutions worked. I have also looked at how much memory is being consumed when I run FEAST as above with either setting, and the limit was roughly the same. It suggests to me that I'm not setting OMP_STACKSIZE right, or maybe OSX does this differently?

Using clang++ with c++11 threading, and also the MKL for all the math. any assistance on this situation would be great.

"Natural knowledge has not forgone emotion. It has simply taken for itself new ground of emotion, under impulsion from and in sacrifice to that one of its 'values', Truth." - Sir Charles Sherrington

Hi Gagan,

Could it be because of too much nested threading in your application? Could you please try export OMP_NUM_THREADS=1 (or 2, 4, etc.) and export KMP_AFFINITY=verbose, and see how many OpenMP threads are created?

Best Regards,
Ying

Gagan

hi Ying,

i've tried both of your suggestions, however neither solved the problem.

when I set the KMP_AFFINITY=verbose, it didn't seem to print additional output. 

i'll look in the mkl libraries to see if there is a way to set this flag internally, but setting OMP_NUM_THREADS did not solve the issue.

hope this is helpful

Hi Gagan,

Do you have some details about your hardware and OS, including how you link MKL?

Or a test case that reproduces the problem may be helpful.

Best Regards,

Ying

 

Gagan

hey man,

sure i'll dump the matrix and row/column indices to three separate files and write a testcase.

hopefully i'll get around to it later today and have it done by the night.

thanks again for your prompt assistance,

gagan

Gagan

thought there might be an issue with attachment size on a forum post, but _of course_ intel's forum is equipped for larger attachments :P

haha, awesome... BUT I DIGRESS.....

attached is a testcase. i compiled it on a Mac Pro using clang (cuz icpc isn't ready for 10.9 yet) with:

clang++ -g -stdlib=libc++ -std=c++11 -O3 -fimf-arch-consistency=true -vec-guard-write -no-ftz -opt-mem-layout-trans=3 -ansi-alias -fPIC -funroll-all-loops -ipo -mtune=native -o testCase test.c -L/usr/lib -L/usr/local/lib -L/opt/intel/composerxe/mkl/lib -lmkl_intel_lp64 -lmkl_core -lpthread -lznz -lm -liomp5 -lmkl_intel_thread -lz

to execute:

success: ./testCase 900 (works up to 2000ish).

failure: ./testCase 9000 

this issue arises for any data of the form given above; i am hopeful this will help you guys.

I am using FEAST to provide a quantum mechanical solution to a deterministic theory involving geometry; in *theory* a large d should improve the quality of the manifolds returned by FEAST.

NOTE: i am not expecting software like this to be perfect by _any_ means, it is very new. i am just explaining what is going on and why i am trying to abuse your fantastic solver for more dimensions. 

o, and...

much love for the premium membership upgrade thing (i love u intel, i want custom hw soon so be ready!!), i hope this data/testcase is fruitful for future releases :) 

Attachments:

File      Size
im.txt    232.26 KB
jm.txt    20.33 MB
m.txt     38.29 MB
test.c    2.98 KB

Hi Gagan,

Thanks. I can run the code, but it seems to hang when I enter 9000. We will investigate and keep you updated.

Best Regards,
Ying

Hi Gagan,

We found that the issue may be the same as the one in http://software.intel.com/en-us/forums/topic/472477, but for Mac OS. Could you please let us know whether you urgently need a fix, or whether it is OK to wait for the next release (in 2-3 months)?

Best Regards,

Ying

 

Gagan

hi,

the higher dimensions would be very helpful. it seems this sort of issue is on dr polizzi's end then?

while it's not *urgent* by my definition, could you please see if he has planned a fix any time soon? 2-3 months is a very long time and i was hoping it wouldn't take more than a month or so.

i understand that you guys usually roll out big updates, so if you could provide a hotfix/workaround whenever you solve this issue, that would be good. i will just implement those fixes into the mkl code manually to tide me over until an official release is given.

Gagan

btw i'm running the dense version now and it's at 47.5gb and hasn't given the OMP issue.

when you referred to hanging running the test-case above, what did you mean? did you mean you got the OMP issue and it crashed? The reason I ask is because this solver should take a while for d=9000. maybe if you weren't getting the OMP_issue in sparse mode I was doing something wrong.

i will let you know if the dense version completes or errors out; right now it seems like it is actually working (it should take a while to find 9000 dimensions heh). will keep you posted.

Best Reply

Hi Gagan,

Thanks for your explanation. Right, several hours later I got the same OMP error you reported, so your code doesn't have a problem. There is a bug in the function. I sent you a private message with the fix.

Best Regards,
Ying

Gagan

hey ying,

GagansMacPro-2:tester Gagan$ ./testCase 9000

Intel MKL Extended Eigensolvers: double precision driver

Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default

Intel MKL Extended Eigensolvers: fpm(1)=1

Intel MKL Extended Eigensolvers: fpm(2)=12

Intel MKL Extended Eigensolvers: fpm(4)=100

Intel MKL Extended Eigensolvers: fpm(5)=1

Intel MKL Extended Eigensolvers: fpm(6)=1

Search interval [0.000000000000000e+00;1.000000000000000e+35]

Intel MKL Extended Eigensolvers: Size subspace 9001

#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual

0,9000,4.500986238458544e+05,1.000000000000000e+00,2.159098422692203e-34

WOOP WOOP

thanks homie!!


It is great to know it works!

Thanks

Ying

Please check the official fix for this problem in the latest update (MKL v.11.1 Update 1), released last Friday, and let us know the results.

Gagan

what's good.

Writing sparse matrix to files...III. Computing the d-dimensional manifolds (Eigensolver)...

Intel MKL Extended Eigensolvers: double precision driver

Intel MKL Extended Eigensolvers: List of input parameters fpm(1:64)-- if different from default

Intel MKL Extended Eigensolvers: fpm(1)=1

Intel MKL Extended Eigensolvers: fpm(2)=12

Intel MKL Extended Eigensolvers: fpm(4)=100

Intel MKL Extended Eigensolvers: fpm(5)=1

Intel MKL Extended Eigensolvers: fpm(6)=1

Search interval [0.000000000000000e+00;1.000000000000000e+15]

Intel MKL Extended Eigensolvers: Size subspace 10001

#Loop | #Eig  |    Trace     | Error-Trace |  Max-Residual

0,10001,2.460420846718340e+06,1.000000000000000e+00,1.026947599855742e-13

rest assured, clang++ knows how icpc's ass tastes: https://twitter.com/i3roly/status/395951193559547904/photo/1 ;)

Gagan

yo guys,

running into another issue involving large callocs.

i've attached the updated matrices that replace the ones i attached above.

if you run this program as ./testCase 34830 the program segfaults at the allocation of the variable "output".

my question is, why? both calloc and malloc don't work. do i need a larger stacksize? this is for sure an issue with allocating a very *large* chunk of memory at once, but it is *imperative* that this occurs.

for calloc i get a segmentation fault: 11, for malloc i get:

reducefMRI(39024,0x7fff79900310) malloc: *** mach_vm_map(size=18446744056558313472) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

any suggestions? i tried changing the compile from lp64 to ilp64 but it doesn't seem to make any difference. 

to be precise, the array i'm trying to allocate has 76123*34830 elements so i don't think it's the ilp64/lp64.

thx

Attachments:

File      Size
im.txt    585.59 KB
jm.txt    52.39 MB
m.txt     95.08 MB

Hi Gagan,

Our developer helped investigate your question. Please see his reply:

At first sight, test.c produces an overflow on line 45 with the lp64 interface:

“output = (double*) calloc(nrRows*(d+1),sizeof(double));”

In your example nrRows*(d+1) == 2651364090 > 2147483647 (2^31-1) == the maximum of int for lp64. The output array takes approximately 21 GB of RAM.

To avoid the overflow I used a size variable:

“long int size=0;

size = d+1;

size= size * nrRows;

output = (double*) calloc(size,sizeof(double));”

But I got a -1 error in the EE solver with the lp64 interface.

So I recommend using the ilp64 interface and replacing every int with MKL_INT in test.c (or maybe using the compiler option -i8).

But I don't have enough memory for the ilp64 interface on a machine with 33+ GB of RAM.

Could you tell me how much RAM is available on your machine?

Best Regards,

Ying

Portrait de Gagan

hey man, 

in the midst of trying these suggestions out and i realized that dlange is bugging out when I use MKL_INT.

furthermore, LAPACKE_dlange doesn't exist in the MKL library for some reason. i would really like to use this function as it's crucial, and i realized now that what is stopping me from using ilp64 is this function not working. can we get this fixed??

thx

Gagan

update, i tried both dlansy and dlange and they do not work (calling the fortran interface via ilp64), they will cause segfaults.

additionally, as stated above, LAPACKE_dlange is missing from the cblas interface (as is dlansy, and i suspect a few others as well). if you guys could patch and update these functions and toss over the patched dist, that'd be great.

thx

Gagan

hi, another update.

there is an issue with csrcoo in ilp64 mode. i have attached the test.c that you can use with the matrix files above to reproduce this error. note this only happens in ilp64 mode. i haven't encountered this error otherwise and have spent the day trying to fix it. here is the output just to give you an idea of what the problem is:

Intel MKL ERROR: Parameter 1 was incorrect on entry to MKL_DCSRCOO.

Intel MKL ERROR: Parameter 1 was incorrect on entry to MKL_DCSRCOO.

note that in this testcase i'm trying to convert the CSR to coordinate, whereas in my actual program i'm converting coordinate to csr. thus the issue is independent of whether i'm going from CSR->COO or COO->CSR. i am hoping this is just a minor bug high up in the function.

Attachments:

File      Size
test.c    2.32 KB

These symptoms indicate that you compiled this example against the ILP64 libraries without the /DMKL_ILP64 option (-DMKL_ILP64 with clang++/icpc).

Gagan

hi,

per your recommendation, i added the -DMKL_ILP64 flag but this causes the mallocs to fail, instead of csrcoo. 

it seems that using the -DMKL_ILP64 causes an issue with memory allocation but allows the csrcoo function to look fine. i say this because malloc doesn't report an error when -DMKL_ILP64 is removed (but i am still linking with ilp64). here is the malloc error using -DMKL_ILP64

testCase(59103,0x7fff79900310) malloc: *** mach_vm_map(size=1125865622122496) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

testCase(59103,0x7fff79900310) malloc: *** mach_vm_map(size=1125865622122496) failed (error code=3)

*** error: can't allocate region

*** set a breakpoint in malloc_error_break to debug

where the compilation line was:

icpc -g -stdlib=libc++ -std=c++11 -O3 -DMKL_ILP64 -fimf-arch-consistency=true -vec-guard-write -no-ftz -ansi-alias -fPIC -funroll-all-loops -ipo -mtune=native -o testCase test.c -L/usr/lib -L/usr/local/lib -L/opt/intel/composerxe/mkl/lib -lmkl_intel_ilp64 -lmkl_core -lpthread -lznz -lm -openmp -lmkl_intel_thread -lz


Hi Gagan,

I found several bugs in your test.c example. You can find all the changes in the attached test_SPBLAS.cpp file.

The compilation line for the Intel C++ 14.0 compiler:

icpc -std=c++0x -DMKL_ILP64 -openmp -I${MKL_ROOT}/include test_SPBLAS.cpp -L${MKL_ROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_intel_thread -lpthread -lm -o test.exe

where MKL_ROOT is the path to __release_lnx/mkl directory.

About your example with the EE solver: I am investigating it on a machine with 65 GB of available RAM and am seeing the EE solver fail. I am now looking for the root cause. Could you tell me how much RAM you have available?

W.B.R.

Vitaly

 

Attachments:

File               Size
test-spblas.cpp    2.78 KB
Gagan

hey man,

thanks for this code, i will look at it asap.

regarding the failing of the eigensolver-- i am now observing this instead of the failed mallocs that i pasted above. however i think they are related, because the eigensolver is failing due to the silent, but failed, allocation of the output array.

  • i pinpointed it to the output array by printing out its indices, and noticed that i'd hit a segfault well-before the index variable was near the end. if i didn't have this test code in there, it'd hit the feast subroutine and then give an exit code (error 201). is this the error you were experiencing as well?

hope my experience helps. and thanks for this improved code!

Gagan

oh and 128gb of main memory. i think i have used up to 108 without issue. so let's say that is the limit.

sorry about not answering the question in the first post :P