Lapack to Spike

Lapack to Spike

sirenum's picture

Hi


I'm trying to parallelize some lapack based code by replacing the dgbsv call with spike. The matrix is 10000x107 with kl=ku=53 and one rhs. I set mat%format = 'D' ! dense banded matrix mat%astru = 'G' ! general non symmetric mat%diagdo = 'N' ! no diagonal dominance and mat%A => ab(kl+1:3*kl+1,:) to skip the pivoting indices. For some reason Spike fails to solve the system (summary below) while lapack does it in no time. >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>> SPIKE SUMMARY >>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> Spike Strategy RP3 TIME FOR PARTITIONING 1.879501342773438e-02 TIME FOR SPIKE BANDED FACT 4.340887069702148e-02 TIME FOR SPIKE BANDED SOLV 5.887868404388428e-01 TIME FOR SPIKE (FACT+SOLV) 6.321957111358643e-01 RESIDUAL 2.132312888050483e+18 # Outside iterations: 51 SPIKE has failed (to reach the accuracy pspike%eps_out) >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> I've also tried autoadapt=true btw. which chooses EA3 but gives me "zero pivots" in addition to the above. I must say that I'm fairly new to fortran and that I don't understand the math behind this code. (But I do have some experience withparallel programming, so the MPI part should be alright.) I'm really running out of ideas here. If anybody has a hint of what might be wrong or where I should start looking, I'd appreciate it. lukas
11 posts / 0 new
Last post
For more complete information about compiler optimizations, see our Optimization Notice.
Henry Gabb (Intel)'s picture

Hello Lukas,
It looks like you might besetting the mat%A dimensions incorrectly. According to the Spike user guide, the first dimension of mat%A is the matrix bandwidth and the second dimension is the matrix size. Please try the following for your matrix:

mat%A => ab(kl+ku+1,mat%n)

or

mat%A => ab(107,10000)

Best regards,
Henry

Henry Gabb (Intel)'s picture

Hello again Lukas,
On second thought, your mat%A could be correct. I didn't notice that you're just skipping the first kl rows. Please try the following when you set the pointer:

mat%A => ab(kl+1:3*kl+1,1)

Maybe you need to explicitly set the starting point of the second dimension.

If that doesn't work, I'll need more information to diagnose the problem. Can you post some more code? Specifically, I'd like to see how you're declaring the mat structure. How many MPI processes are you using to solve this problem?

Henry

hydrotexx's picture

hello Henry
i'm also working on the MPI of the code that Lukas was using. Now i got to the point that Spike runs but only if i set one thread

pspike%autoadapt = .false.

pspike%threads = 1

but so the solver uses just LAPACK ... and is not parallel. If I leave the .true. option then the code runs parallel ( at least looks like .. .solving ist faster and swiches solving method from TA3 (with one thread) to RP3( no threads limit) ) ... but is not converging anymore. how is possible that with one solver the matrix is solved and in the other not .. isn't spike choosing the right solver ?

The code will more then 256 core awailable .. depends if needed.

Teodoro

Henry Gabb (Intel)'s picture

Hello Teodoro,
The pspike%threads variable controls the number of OpenMP threads. For now, leave it at one thread per MPI process. How many MPI processes are you using? This really determines the degree of parallelism. Also,is pspike%tp set to 0 or 1? Table 2.4 in the user guide shows which algorithms are allowed depending on pspike%nbprocs and pspike%tp.

When pspike%autoadapt=.true., the SPIKE adaptive layer tries to pick the optimum solver but it is not trained for small systems. How large is your matrix?

Best regards,
Henry

hydrotexx's picture

i use for the moment to make my tests 16 mpi cores. I never tried to change

pspike%tp values ( i just use .false. to vary the threads for spike) . if i leave one thread per mpi process ... it just uses LAPACK libraries .. without any improvement of speed i think, right?

I'll try to try the 2 values forpspike%tp and see if it will converge then.

my matrix at the moment has kl=ku= 53 n=10000 -> mat%A has shape 107 x 10000

best regards

Teodoro

hydrotexx's picture

i tryied to use the 2 different values forpspike%tp .... if i set it to 0 , the code seems works and uses the strategy RP3 (i'll start a longer test and see that he doesn't get any memory problems) , if i set it to 1 ... code crushes with an sigsev error .....

Henry Gabb (Intel)'s picture

Setting pspike%tp=0means that the master process (i.e., MPI rank 0) has the entire linear system. SPIKEwillpartition the system among the other MPI processes. When pspike%tp=1, SPIKE assumes that the linear system is already partitioned among the MPI processes. If this is not the case, SPIKE will fail.

hydrotexx's picture

looks like the code converges with pspike%tp= 0 but don't think is really increasing speed , compared to single thread LAPACK solver ... could it be that he still switches back to LAPACK ?

Henry Gabb (Intel)'s picture

It's possible that you're not getting a speedup because the matrix is so small. Please try solving the system with 2, 4, and 8 MPI processes and let me know if the performance changes.

SPIKE is a distributed-memory, parallel solver designed to solve large linear systems. In that sense, it is analogous to ScaLAPACK rather than LAPACK.

Thanks,
Henry

hydrotexx's picture

hello henry could u tell me why if i put spike.adapt=.true. it gives me this error UNSUCCESSFUL RUN FOR SPIKE - INFO EXIT -1 SPIKE_CORE ERROR CODE 0 ##### ERROR IN SPIKE SOLN, PLASTI info = -1

UNSUCCESSFUL RUN FOR SPIKE - INFO EXIT -1SPIKE_CORE ERROR CODE 0##### ERROR IN SPIKE SOLN, PLASTIinfo = -1 but if i put .false. it goes ? and if i put false it goes faster then the non-SPIKE version only if i use 128 processors. the matrix has ku and kl =204 and n=200'000 dimensions time to fact & solve = 8.9e-01 s best regards teodoro

Login to leave a comment.