Intel® Math Kernel Library

ZSYTRI sequential?

Hi,
I am testing the following code with composer_xe_2015:

lwork = 256*n
call ZSYTRF( 'U', n, afull, n, ipiv, work, lwork, error )   ! Bunch-Kaufman factorization
call ZSYTRI( 'U', n, afull, n, ipiv, work, error )          ! inversion; note ZSYTRI takes no lwork argument

using

gfortran -O3 -fopenmp  read_blas.f90 -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_gf_lp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_gnu_thread.a -Wl,--end-group -ldl -lpthread -lm

After setting OMP_NUM_THREADS=16 I can see in "top" that ZSYTRF runs in parallel but ZSYTRI does not. The matrix is large enough (n = 15120).
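
One way to narrow this down is to time the two calls separately and confirm how many threads MKL thinks it may use. A minimal sketch, reusing the variables from the snippet above (dsecnd and mkl_get_max_threads are MKL service functions):

integer, external :: mkl_get_max_threads
double precision, external :: dsecnd
double precision :: t0, t1, t2

print *, 'MKL max threads: ', mkl_get_max_threads()

t0 = dsecnd()
call ZSYTRF( 'U', n, afull, n, ipiv, work, lwork, error )   ! factorization
t1 = dsecnd()
call ZSYTRI( 'U', n, afull, n, ipiv, work, error )          ! inversion
t2 = dsecnd()
print *, 'ZSYTRF: ', t1 - t0, ' s   ZSYTRI: ', t2 - t1, ' s'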

Exploit symmetry in Cholesky

I am performing a Cholesky decomposition, as in this thread. Post #4 contains a minimal example that looks like my code. Ying's minimal example is based on this example. As one can see, the matrix is read by the master node and then distributed element by element; afterwards, every element of the matrix is gathered back to the master node.
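
For orientation, the element-by-element scatter described above usually looks like the following sketch. Assumptions: a BLACS grid with coordinates myrow/mycol on an nprow x npcol grid, block sizes mb/nb, afull read on process (0,0), and aloc the local block-cyclic array (the names here are just for illustration); indxg2p and indxg2l are the standard ScaLAPACK TOOLS index helpers.

integer, external :: indxg2p, indxg2l
integer :: i, j, pr, pc
double precision :: v

do j = 1, n
  do i = 1, n
    pr = indxg2p( i, mb, myrow, 0, nprow )    ! process row owning element (i,j)
    pc = indxg2p( j, nb, mycol, 0, npcol )    ! process column owning element (i,j)
    if ( myrow == 0 .and. mycol == 0 ) then
      v = afull(i,j)
      if ( pr == 0 .and. pc == 0 ) then       ! master owns it: copy locally
        aloc( indxg2l(i,mb,0,0,nprow), indxg2l(j,nb,0,0,npcol) ) = v
      else                                    ! send one element to its owner
        call dgesd2d( ictxt, 1, 1, v, 1, pr, pc )
      end if
    else if ( myrow == pr .and. mycol == pc ) then
      call dgerv2d( ictxt, 1, 1, v, 1, 0, 0 ) ! receive from master (0,0)
      aloc( indxg2l(i,mb,myrow,0,nprow), indxg2l(j,nb,mycol,0,npcol) ) = v
    end if
  end do
end do

The gather back to the master is the same loop with the send and receive roles swapped. Element-by-element transfers like this are simple but slow; PDGEMR2D can move the whole matrix between layouts in one call.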

mkl_?csrcoo duplicate COO values

Hi,

I am using an old sparse linear solver (from HSL) in my code that receives the matrix in COO form and accepts duplicate entries. For example, (1,2,50) and (1,2,30) are two separate entries that are summed internally.

I want to switch to MKL PARDISO to see what speedup I can get from this. From what I understand, PARDISO needs the sparse matrix in CSR format, so I thought of using mkl_dcsrcoo with job(1)=1 to convert my matrix. The problem is, I don't know whether it handles the duplicates internally or not.
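
In case the conversion routine does not combine duplicates (I have not verified whether it does), one workaround is to merge them before converting. A minimal sketch, assuming the COO triplets have already been sorted by (row, column):

! Sum duplicate COO entries in place; the triplets must already be
! sorted by (row, column). nnz is updated to the compressed count.
subroutine sum_coo_duplicates( nnz, rowind, colind, acoo )
  implicit none
  integer, intent(inout) :: nnz
  integer, intent(inout) :: rowind(*), colind(*)
  double precision, intent(inout) :: acoo(*)
  integer :: k, m
  if ( nnz == 0 ) return
  m = 1
  do k = 2, nnz
    if ( rowind(k) == rowind(m) .and. colind(k) == colind(m) ) then
      acoo(m) = acoo(m) + acoo(k)     ! merge duplicate entry
    else
      m = m + 1                       ! start a new unique entry
      rowind(m) = rowind(k)
      colind(m) = colind(k)
      acoo(m)   = acoo(k)
    end if
  end do
  nnz = m
end subroutine sum_coo_duplicates

After this, mkl_dcsrcoo with job(1)=2 (COO to CSR with column indices sorted within each row) produces the ordering PARDISO expects.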

Thanks in advance!

[solved] no libmkl_tbb_thread.a after installation


Hello,

I have just finished installing the 2016 beta of Intel Parallel Studio XE on CentOS 7 (parallel_studio_xe_2016.0.035). I used the link advisor website to obtain the argument list for compiling and linking (I want to link my executable statically, and I am using MKL and TBB). To link my executable, I used (as given by the link advisor):

 -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_ilp64.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_tbb_thread.a -Wl,--end-group -ltbb -lstdc++ -lpthread -lm

Indices zero based or one based?

Are indices in the Intel MKL BLAS and LAPACK functions zero-based or one-based? For example, I think the function idamax returns a zero-based index with MKL CBLAS; however, my experience with Netlib was that it returned a one-based index. There are also other functions, such as dsyevx, that take index arguments, and I don't know whether these should be one-based or zero-based.
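
For idamax specifically, the two interfaces differ by exactly this convention: the Fortran BLAS function is one-based, while cblas_idamax returns a zero-based index. A minimal Fortran check, which should print 2:

program check_idamax
implicit none
integer, external :: idamax
double precision :: x(4)
x = (/ 1.0d0, -7.0d0, 3.0d0, 2.0d0 /)
! Fortran BLAS indexing is one-based: the largest |x(i)| is x(2).
print *, 'idamax = ', idamax( 4, x, 1 )
end program check_idamax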

BLACS broadcast 64-bit integer

Hi,

I'm trying to broadcast a 64-bit integer with the BLACS routines IGEBS2D() and IGEBR2D() --- CentOS 6.5 Linux, ifort, composer_xe_2013_sp1.2.144, intel64, Intel MPI, ilp64 libs.  Despite declaring all integers as integer*8, compiling with -i8, and linking exclusively with ilp64 libs, only 32 bits of the 64-bit integer seem to be broadcast.  My compile line is:

  mpiifort -i8 -o demo1 demo1.f -warn all -L${MKLROOT}/lib/intel64 -lmkl_intel_ilp64 -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm

Sample program:
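
A minimal sketch of such a broadcast (assuming a 1 x nprocs grid; with -i8 and the ilp64 libraries, every default integer below is 64-bit):

program demo1
implicit none
integer :: ictxt, nprow, npcol, myrow, mycol, iam, nprocs
integer :: big(1)

call blacs_pinfo( iam, nprocs )
call blacs_get( -1, 0, ictxt )                  ! default system context
call blacs_gridinit( ictxt, 'Row-major', 1, nprocs )
call blacs_gridinfo( ictxt, nprow, npcol, myrow, mycol )

if ( myrow == 0 .and. mycol == 0 ) then
  big(1) = 2_8**40 + 1_8                        ! needs more than 32 bits
  call igebs2d( ictxt, 'All', ' ', 1, 1, big, 1 )
else
  call igebr2d( ictxt, 'All', ' ', 1, 1, big, 1, 0, 0 )
end if
print *, 'process', iam, 'got', big(1)

call blacs_gridexit( ictxt )
call blacs_exit( 0 )
end program demo1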

Should I place a barrier before calling pdpotri()?

I am using pdpotrf() to perform the Cholesky decomposition. Then I want to call pdpotri() to invert the matrix. The function is called from every process, just after pdpotrf(). Should I put a barrier there, so that I am sure all the processes are done with the Cholesky factorization before moving on to the inversion, or is it not needed?
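
For reference, the sequence in question is just two consecutive calls; a minimal sketch, assuming a holds the local part of the distributed matrix and desca is its descriptor from DESCINIT:

call pdpotrf( 'L', n, a, 1, 1, desca, info )   ! Cholesky factor of the distributed matrix
if ( info /= 0 ) stop 'pdpotrf failed'
call pdpotri( 'L', n, a, 1, 1, desca, info )   ! inverse computed from the factor left in a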

Calling LAPACK/MKL from parallel OpenMP region

Dear All,

I often call BLAS and LAPACK (MKL) routines from my Fortran programs. Typically, I try to place these calls outside of any parallel OpenMP region and rely on the parallelism inside the routines.

In a new part of the code, however, I need to call DGESV from an "omp parallel" region (see the dummy code below). The code crashes because all threads call DGESV at the same time. Putting the DGESV call into an "omp single" section works, but limits performance, since it then runs in one thread only.
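
A thread-safe version of that pattern gives each thread its own private system to solve; a minimal sketch (build_system and store_solution are hypothetical placeholders for the per-system setup and output):

! Each thread owns private copies of a, b, and ipiv, so the
! concurrent DGESV calls do not interfere with one another.
!$omp parallel do private(a, b, ipiv, info)
do k = 1, nsys
  call build_system( k, a, b )             ! hypothetical: fill a(n,n) and b(n)
  call dgesv( n, 1, a, n, ipiv, b, n, info )
  call store_solution( k, b )              ! hypothetical: keep the solution
end do
!$omp end parallel do

With the threaded MKL, calls issued from inside an active parallel region run sequentially by default, so here the OpenMP threads themselves provide the parallelism.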
