How to use Parallel MKL from Linux?

Folks,

I'm very new to parallel programming, so I need your help. I'm trying to solve a very large symmetric sparse generalized eigenvalue problem with the Extended Eigensolver. I have no problem solving a smaller-scale problem on one thread with the "dfeast_scsrgv" subroutine, but I have no clue how to increase the speed by using the parallel capability.

My system: Linux, Intel 64
Software: Intel Composer XE 2013, MPICH compiled with XE 2013

Here is what I did:

1. Compile: mpif90 -mkl=parallel -o test_mpi.x test_sparse_solver.f90
2. Run: mpiexec -np 8 ./test_mpi.x

The run itself was OK, but my concern is whether I really used the parallel capability. For a smaller problem with 2000 equations, "-np 8" took longer than "-np 1". I realize I might need to change the source code, but I have no clue where to start. Could you give me some quick references on getting it to run in parallel? Very much appreciated, and thanks in advance.

Letian

Here is my source code (Fortran 90):

!this routine tests the MKL sparse (FEAST) eigensolver
implicit real*8 (a-h,o-z)
real*8,allocatable::a(:),b(:)
integer,allocatable::cola(:),rowa(:),colb(:),rowb(:)
real*8,allocatable::e(:), x(:,:), res(:)
integer fpm(128)
real time_begin, time_end

m0=50        ! initial guess for the number of eigenvalues in [emin,emax]
emin=0.0
emax=2e7
fpm=0

! read A and B in 3-array CSR format: values, column indices, row pointers
open(98,file='ifort98.dat',form='unformatted')
read(98) n, na
allocate (a(na),cola(na),rowa(n+1))
read(98) (a(i),i=1,na)
read(98) (cola(i),i=1,na)
read(98) (rowa(i),i=1,n+1)
read(98) n, nb
allocate (b(nb),colb(nb),rowb(n+1))
read(98) (b(i),i=1,nb)
read(98) (colb(i),i=1,nb)
read(98) (rowb(i),i=1,n+1)
close(98)

call cpu_time(time_begin)

! res must be an array of at least m0 entries (relative residuals returned by FEAST)
allocate (e(m0), x(n,m0), res(m0))

call feastinit(fpm)   ! fill fpm with the FEAST default parameters
print*,fpm
! solve A*x = lambda*B*x; 'U' means the upper triangles of A and B are stored
call dfeast_scsrgv('U',n,a,rowa,cola,b,rowb,colb,fpm,epsout,loop,emin,emax,m0,e,x,m,res,info)

print*,'info=',info
print*,'m=',m
print*,'loop=',loop
print*,'epsout=',epsout

open(10,file='test.out')
do i=1,m
write(10,*) 'mode',i,' Freq=', sqrt(e(i))*0.5/3.1415926535897932
enddo
close(10)

deallocate (a,b,cola,rowa,colb,rowb,e,x,res)

call cpu_time(time_end)

print*,'Total CPU time=', time_end-time_begin
stop
end

Hi Letian,

This looks like a big question. I'd suggest you start with MKL's internal parallelism.

In most cases, MKL already extracts good parallel performance on multi-core systems based on your system configuration and problem size. If you link the threaded MKL library, your application gets that parallelism automatically.
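
If you would rather control this from inside the program than from the environment, here is a minimal sketch; it assumes MKL's service routines (mkl_set_num_threads, mkl_get_max_threads), declared in the mkl_service.fi include file shipped with MKL, are available in your installation and that the MKL include directory is on the compile line:

program mkl_threads_demo
! sketch: request an MKL thread count from inside the code
implicit none
include 'mkl_service.fi'
integer nt
call mkl_set_num_threads(4)     ! ask MKL to use up to 4 threads in subsequent MKL calls
nt = mkl_get_max_threads()      ! query how many threads MKL will actually try to use
print *, 'MKL will use up to', nt, 'threads'
! ... call dfeast_scsrgv here; its internal factorization/BLAS work is what gets threaded ...
end program mkl_threads_demo

Setting MKL_NUM_THREADS or OMP_NUM_THREADS in the environment achieves the same thing without recompiling; as far as I know, a value set by mkl_set_num_threads takes precedence over the environment variables.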

For example, you could first try PARDISO and see how the performance changes with export MKL_NUM_THREADS=1/2/4/8, using commands like:

> ifort -mkl your.f90

> export MKL_NUM_THREADS=1

> ./a.out

(I'm not sure how the MPI processes influence the MKL threads, which are based on OpenMP.)

Then, if you really need to parallelize your application yourself, you will need to learn the usual parallel programming methods, typically OpenMP, as in

http://software.intel.com/en-us/forums/topic/487697

and pthreads on Linux,

combined with the threaded MKL library (-lmkl_intel_thread -lmkl_core -liomp5).
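
For reference, a full explicit link line on Linux Intel 64 might look like the sketch below (LP64 interface with dynamic linking; please check the MKL Link Line Advisor for the exact libraries matching your MKL version):

> ifort test_sparse_solver.f90 -I$MKLROOT/include -L$MKLROOT/lib/intel64 -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lpthread -lm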

You can search the forum or the MKL User's Guide. Here is one article about this for your reference:

http://software.intel.com/en-us/articles/intel-math-kernel-library-intel-mkl-using-intel-mkl-with-threaded-applications

Best Regards,

Ying

People who are interested in cpu_time for parallel benchmarks usually consider an increase as a favorable result, using it along with the elapsed time (e.g. from system_clock) to calculate "concurrency" (the ratio of cpu time to elapsed time).
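
A minimal sketch of that measurement in Fortran (standard cpu_time and system_clock only, nothing MKL-specific):

program concurrency_demo
! sketch: measure CPU time and elapsed (wall) time around a threaded region,
! then report concurrency = CPU time / elapsed time
! (assumes the timed region is long enough for system_clock to resolve)
implicit none
integer count0, count1, count_rate
real t0, t1, elapsed
call cpu_time(t0)
call system_clock(count0, count_rate)
! ... the threaded work (e.g. the dfeast_scsrgv call) goes here ...
call cpu_time(t1)
call system_clock(count1)
elapsed = real(count1 - count0) / real(count_rate)
print *, 'CPU time     =', t1 - t0
print *, 'Elapsed time =', elapsed
print *, 'Concurrency  =', (t1 - t0) / elapsed
end program concurrency_demo

On a well-threaded run the CPU time typically rises a little while the elapsed time drops, so the concurrency climbs toward the number of threads.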

The new compiler feature !$omp parallel do simd is particularly hoggish in terms of making big increases in CPU time, on the assumption that enough threads will be used to make a reduction in elapsed time.

Hyperthreading enthusiasts don't always care even about a reduction in elapsed time; they simply like to see a large concurrency figure.
