Intel® Math Kernel Library

Numerical integration package in MKL


I have "old" code that calls the Fortran numerical_libraries routines GQRUL and DGQRUL to compute a Gauss-Legendre quadrature rule for numerical integration.  I used to be able to just put a line in the main routine, "USE numerical_libraries", and subsequently call the GQRUL and DGQRUL functions.

Is it safe to call ?feast_syev/?feast_heev from multiple threads?


I'm developing an application that needs to compute various eigenvalue decompositions. Is it possible to call zfeast_heev from multiple threads in parallel? Of course, each thread has its own memory. I could not find this kind of information in the documentation. Currently I'm using zhpevd, which works fine when called from multiple threads; zfeast_heev, however, does not.

Looking forward to your answers


How do I know whether the MKL FFT being called used AVX-512?


        I am trying to look at MKL FFT performance by calling five library functions, as below. I configured optimization and enabled AVX-512 etc. in the project properties (VS2013 integrated with Intel Parallel Studio).

        status = DftiCreateDescriptor(&DFT_desc, DFTI_SINGLE, DFTI_COMPLEX, 1, IDFTSize);

        status = DftiSetValue(DFT_desc, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
        status = DftiCommitDescriptor(DFT_desc);

        status = DftiComputeForward(DFT_desc, IDFT_in_singlePrecision, IDFT_out_singlePrecision);
        status = DftiFreeDescriptor(&DFT_desc);
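One way to check which instruction set MKL actually dispatched to at run time is MKL's verbose mode. A sketch of how to enable it is below; the exact log text varies by MKL version, and the executable name is a placeholder:

```shell
set MKL_VERBOSE=1        # Windows cmd; on Linux: export MKL_VERBOSE=1
my_fft_test.exe          # placeholder for your FFT benchmark binary
# MKL then prints a version/ISA header at the first MKL call, and one line
# per DftiCompute* call.  Look for "AVX-512" in the header -- if MKL
# dispatched to an older path, it will name that ISA instead.
```

This reports what MKL's runtime dispatcher chose on the actual CPU, which is independent of the compiler options set in the project properties: MKL picks its code path from CPUID at run time, not from the /arch flags.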

What is the calling convention for i_malloc and friends on 32-bit Windows?


I am trying to redefine the memory allocation functions used by MKL via the i_malloc function pointer and friends, as described in the documentation. On 32-bit Windows, however, I am facing a problem: I could not figure out the calling convention for those functions. Is it __cdecl, __stdcall, or __fastcall? The i_malloc.h header just says

MKL's (distributed) FFT library fails with a floating-point error

When repeatedly calling MKL's distributed (cluster) DFT library via the FFTW3 interface library, it fails with a floating-point error for certain combinations of grid sizes and MPI processes (e.g., a 1024 x 256 x 1 grid running with 17 MPI processes). This is repeatable, and I have uploaded an example code that demonstrates the problem. I am compiling using the "Composer XE 2015" tools (MKL), e.g.

Modified Cholesky factorisation


I'm using MKL to calculate the Cholesky factorisation of a covariance matrix. MKL's ?POTRF function is of course much faster than my own naive implementation (input: a 6500x6500 matrix); however, there is a problem. Our client requires a *modified* version of the algorithm (below: custom minimum conditions), and therefore MKL gives different results. After I remove the custom minimum conditions (<0.001) from my implementation, both algorithms give *perfectly* equal results.

Is it possible to force MKL to respect these custom conditions somehow? Thanks for any help.

OpenMP not using all processors

I am trying to use the MKL libraries and OpenMP in an MSVS C++ application on Windows 7. The application shows affinity for all 24 logical processors (2 nodes, 6 cores each, Hyper-Threaded). omp_get_num_procs() also reports 24 processors. When I run the program, only 1 node and 6 processors are used. This is confirmed when I use "KMP_AFFINITY=verbose,none": it outputs "OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 1 threads/core (6 total cores)". I get no compiler or linker complaints.
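A symptom like this often means the process itself was launched with an affinity mask restricted to one node, which the OpenMP runtime then inherits. A hedged sketch of settings to try (Windows cmd; the binary name is a placeholder):

```shell
rem Spread threads across both packages and make placement explicit.
set KMP_AFFINITY=verbose,granularity=fine,scatter
set OMP_NUM_THREADS=24
my_app.exe
rem If KMP_AFFINITY verbose output still reports "1 packages", the process
rem was started with a one-node affinity mask (e.g. via "start /affinity",
rem "start /NODE", or a job scheduler); check the mask in Task Manager.
```

If the verbose output changes to "2 packages x 6 cores/pkg", the runtime can see both nodes and the earlier limit came from the inherited mask rather than from MKL or OpenMP itself.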
