Intel® Math Kernel Library

MKL's (distributed) FFT library fails with a floating-point error

When I repeatedly call MKL's distributed (cluster) DFT library via the FFTW3 interface library, it fails with a floating-point error for certain combinations of grid size and number of MPI processes (e.g., a 1024 x 256 x 1 grid running with 17 MPI processes). This is repeatable, and I have uploaded an example code that demonstrates the problem. I am compiling with the "Composer XE 2015" tools (MKL), e.g.

Modified Cholesky factorisation

Hello,

I'm using MKL to calculate the Cholesky factorisation of a covariance matrix. MKL's ?POTRF function is, of course, much faster than my own naive implementation (input: a 6500x6500 matrix), but there is a problem. Our client requires a *modified* version of the algorithm (below, with custom minimum conditions), so MKL gives different results. When I remove the custom minimum conditions (<0.001) from my implementation, both algorithms give *perfectly* identical results.

Is it possible to force MKL to respect these custom conditions somehow? Thanks for any help.

OpenMP not using all processors

I am trying to use MKL and OpenMP in an MSVS C++ application on Windows 7. The application shows affinity for all 24 processors (2 nodes, 6 processors, Hyper-Threaded), and omp_get_num_procs() also reports 24 processors. When I run the program, however, only 1 node and 6 processors are used. This is confirmed when I set "KMP_AFFINITY=verbose,none", which outputs "OMP: Info #179: KMP_AFFINITY: 1 packages x 6 cores/pkg x 1 threads/core (6 total cores)". I get no compiler or linker complaints.
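A quick way to compare what the OpenMP runtime reports with what a parallel region actually gets is a small diagnostic like the one below (`report_openmp_threads` is a hypothetical name). It uses only standard OpenMP calls and falls back to one thread when the code is built without OpenMP. Running it with KMP_AFFINITY=verbose set, and checking MKL's own thread controls (MKL_NUM_THREADS or mkl_set_num_threads), may help show whether the limit comes from the runtime's topology detection or from a thread-count setting. It is a diagnostic sketch, not a fix.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints what the OpenMP runtime believes it can use and returns the number
   of threads that actually ran in one parallel region. Without OpenMP
   enabled at compile time it reports a single thread. */
int report_openmp_threads(void)
{
    int team = 1;  /* shared by the parallel region below */
#ifdef _OPENMP
    printf("omp_get_num_procs()   = %d\n", omp_get_num_procs());
    printf("omp_get_max_threads() = %d\n", omp_get_max_threads());
#pragma omp parallel
    {
        /* One thread records the team size; the implicit barrier at the
           end of 'single' makes the write visible afterwards. */
#pragma omp single
        team = omp_get_num_threads();
    }
#else
    printf("compiled without OpenMP support\n");
#endif
    printf("threads in parallel region = %d\n", team);
    return team;
}
```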

Does MKL FFT use a pre-calculated twiddle factor table, or does it calculate the twiddle factors when doing the FFT?

Hi,

I'm evaluating MKL FFT performance and have a quick question: does MKL FFT use a pre-calculated twiddle factor table, or does it calculate the twiddle factors in parallel when doing the FFT?

Thank you

John

What is ARR_HDR_SIZE in the IPP samples?

Hello all,

I am implementing JPEG encoding and everything looks fine, but I want to know what ARR_HDR_SIZE is for in the UIC samples:

#include <stdlib.h>
#include <etxt.h>
#include "uic_new.h"

using namespace UIC;

static const unsigned int ARR_HDR_SIZE = 32;

void* UIC::ArrAlloc   (Ipp32u itemSize, Ipp32u nOfItems)
{
    void *buff = malloc(itemSize * nOfItems + ARR_HDR_SIZE);
    unsigned int *countOf = (unsigned int*)buff;

    *countOf = nOfItems;
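The excerpt shows ArrAlloc reserving ARR_HDR_SIZE extra bytes and storing nOfItems in the first bytes of the block, i.e. a hidden header in front of the array. The rest of the function is cut off, so the sketch below assumes the common pattern of returning the pointer just past the header; `arr_alloc`, `arr_count`, `arr_free` and `HDR_SIZE` are hypothetical names, not UIC API. Sizing the header at 32 bytes rather than sizeof(unsigned int) keeps the payload at whatever alignment malloc provides, since 32 is a multiple of the usual 8- or 16-byte malloc alignment.

```c
#include <stdlib.h>

/* Same header size as ARR_HDR_SIZE in the UIC sample (hypothetical name). */
#define HDR_SIZE 32u

/* Allocates n_items elements plus a hidden header that stores the count,
   and returns a pointer to the payload (assumed pattern). */
void *arr_alloc(unsigned int item_size, unsigned int n_items)
{
    unsigned char *buff = malloc((size_t)item_size * n_items + HDR_SIZE);
    if (buff == NULL)
        return NULL;
    *(unsigned int *)buff = n_items;   /* element count lives in the header */
    return buff + HDR_SIZE;            /* caller sees only the payload */
}

/* Reads the element count back from the header in front of the payload. */
unsigned int arr_count(const void *arr)
{
    return *(const unsigned int *)((const unsigned char *)arr - HDR_SIZE);
}

/* Frees the whole block: the pointer passed to free must be the one that
   malloc returned, i.e. the start of the header. */
void arr_free(void *arr)
{
    if (arr != NULL)
        free((unsigned char *)arr - HDR_SIZE);
}
```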

Reduced Row Echelon Form of a matrix (rref)

Hi,

I'm a newbie to the Intel MKL library, and I'm trying to convert code from Matlab to C using the C interface of the Intel MKL routines.

I haven't found a function in Intel MKL that computes the RREF of a matrix. The RREF in Matlab performs Gaussian elimination with partial pivoting, and I want to apply it to a 5x9 matrix.

Here is an example of RREF applied to a 5x9 matrix A:
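LAPACK's ?GETRF computes an LU factorization with partial pivoting, not the reduced row echelon form itself, so one option is a short hand-written Gauss-Jordan routine. The sketch below (`rref` is a hypothetical name, not an MKL function) works in place on a row-major matrix and mirrors the steps Matlab describes: pick the largest pivot in the column, swap it up, scale the pivot row to 1, and eliminate the column from every other row. The tolerance `tol` is left to the caller; Matlab's documented default is max(size(A))*eps*norm(A,inf).

```c
#include <math.h>

/* Reduces the rows x cols row-major matrix a to reduced row echelon form
   in place, using Gauss-Jordan elimination with partial pivoting.
   Entries with magnitude <= tol are treated as zero. Returns the rank. */
int rref(double *a, int rows, int cols, double tol)
{
    int r = 0;  /* index of the next pivot row */
    for (int c = 0; c < cols && r < rows; ++c) {
        /* Partial pivoting: largest |a[i][c]| at or below row r. */
        int p = r;
        for (int i = r + 1; i < rows; ++i)
            if (fabs(a[i * cols + c]) > fabs(a[p * cols + c]))
                p = i;

        if (fabs(a[p * cols + c]) <= tol) {
            for (int i = r; i < rows; ++i)
                a[i * cols + c] = 0.0;  /* column is numerically zero */
            continue;
        }

        /* Swap the pivot row into position r. */
        for (int j = 0; j < cols; ++j) {
            double t = a[r * cols + j];
            a[r * cols + j] = a[p * cols + j];
            a[p * cols + j] = t;
        }

        /* Scale the pivot row so the pivot becomes 1. */
        double piv = a[r * cols + c];
        for (int j = 0; j < cols; ++j)
            a[r * cols + j] /= piv;

        /* Eliminate column c from every other row, above and below. */
        for (int i = 0; i < rows; ++i) {
            if (i == r)
                continue;
            double f = a[i * cols + c];
            for (int j = 0; j < cols; ++j)
                a[i * cols + j] -= f * a[r * cols + j];
        }
        ++r;
    }
    return r;
}
```

For example, the 2x3 matrix {1, 2, 3, 4, 5, 6} reduces to {1, 0, -1, 0, 1, 2} with rank 2, and a 5x9 matrix is called as rref(A, 5, 9, tol).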

Compilation error

Hi

I am trying to compile a C programme that uses MKL subroutines, for example mkl_scsrdia and mkl_scsrcoo, but when I compile it I get errors like the one below. It seems that I am not linking some path or flags during compilation. I have tried:

icc one.c -L$MKLPATH -I$MKLINCLUDE -lmkl -liomp5 -lpthread

one.c -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lsplas -lm -lmkl -lguide

Nothing works; I get errors like this:

CGEMM performance strangeness on Haswell CPUs vs. Sandy Bridge

Hi All

I have investigated the performance of CGEMM using both my own Sandy Bridge CPU and my colleague's newer computer with a Haswell CPU. Performance is measured as the number of complex multiply-accumulate operations per second, here denoted CMacs. I don't use any scaling and I don't add the previous matrix, so I only calculate C = A * B.

The setup:

Number of rows in A (A_r) = 2^16

Number of columns in A (A_c) = 16

Number of columns in B (B_c) = 256

GCMacs = A_r * A_c * B_c / time * 1e-9, with time in seconds
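The metric can be written as a tiny helper (`gcmacs` is a hypothetical name). Each of the A_r * B_c entries of C needs A_c complex multiply-accumulates, so with the setup above one call performs 2^16 * 16 * 256 = 2^28 = 268,435,456 complex MACs; a call that takes 0.1 s therefore corresponds to about 2.68 GCMacs.

```c
/* Giga complex multiply-accumulates per second for C = A * B, where A is
   a_rows x a_cols and B is a_cols x b_cols: each of the a_rows * b_cols
   outputs needs a_cols complex MACs. */
double gcmacs(double a_rows, double a_cols, double b_cols, double seconds)
{
    return a_rows * a_cols * b_cols / seconds * 1e-9;
}
```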
