Intel® Math Kernel Library

what is ARR_HDR_SIZE in IPP samples

Hello All,

I am implementing JPEG encoding and everything looks fine, but I want to know what ARR_HDR_SIZE is for in the UIC samples:


#include <stdlib.h>
#include <etxt.h>
#include "uic_new.h"

using namespace UIC;

static const unsigned int ARR_HDR_SIZE = 32;

void* UIC::ArrAlloc   (Ipp32u itemSize, Ipp32u nOfItems)
    void *buff = malloc(itemSize * nOfItems + ARR_HDR_SIZE);
    unsigned int *countOf = (unsigned int*)buff;

    *countOf = nOfItems;
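The excerpt stops before the function returns, so the following is only one plausible reading of the pattern, not the UIC implementation: the allocation reserves ARR_HDR_SIZE extra bytes in front of the items, stores the item count there, and hands the caller a pointer just past that header. The helper names `arr_alloc`, `arr_count` and `arr_free` are hypothetical, and the sketch is written in plain C rather than the C++ of the sample.

```c
#include <stdlib.h>
#include <stddef.h>

/* Same value as the constant in the excerpt above. */
#define ARR_HDR_SIZE 32

/* Hypothetical helper: allocate nOfItems items with a hidden header in front. */
void *arr_alloc(size_t itemSize, size_t nOfItems)
{
    unsigned char *buff = malloc(itemSize * nOfItems + ARR_HDR_SIZE);
    if (buff == NULL)
        return NULL;
    *(unsigned int *)buff = (unsigned int)nOfItems; /* count sits at the start */
    return buff + ARR_HDR_SIZE;                     /* caller sees only the items */
}

/* Hypothetical helper: read the count back from the hidden header. */
unsigned int arr_count(const void *items)
{
    const unsigned char *buff = (const unsigned char *)items - ARR_HDR_SIZE;
    return *(const unsigned int *)buff;
}

/* Hypothetical helper: free the original allocation, header included. */
void arr_free(void *items)
{
    if (items != NULL)
        free((unsigned char *)items - ARR_HDR_SIZE);
}
```

If this reading is right, a 32-byte header (rather than `sizeof(unsigned int)`) may simply keep the item data aligned for vector loads, but that is a guess the excerpt does not confirm.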

Reduced Row Echelon Form of a matrix (rref)


I'm a newbie to the Intel MKL library and I'm trying to convert some code from Matlab to C using the C interface of the Intel MKL routines.

I haven't found a function that computes the RREF of a matrix in Intel MKL. The RREF in Matlab performs Gaussian elimination with partial pivoting, and I want to apply it to a 5x9 matrix.

Here is an example of RREF applied to a 5x9 matrix A
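The poster's example matrix is not reproduced in this listing. Independently of it, the Gauss-Jordan elimination with partial pivoting described above can be sketched in plain C without any MKL calls. The function name `rref`, the row-major layout, and the `tol` threshold for treating a pivot as zero are assumptions made for this sketch.

```c
#include <math.h>

/* Reduce the rows x cols row-major matrix a in place to reduced row echelon
   form using Gauss-Jordan elimination with partial pivoting.
   tol is an assumed threshold below which a pivot candidate counts as zero.
   Returns the number of pivot columns found (the numerical rank). */
int rref(double *a, int rows, int cols, double tol)
{
    int r = 0; /* row that receives the next pivot */
    for (int c = 0; c < cols && r < rows; ++c) {
        /* Partial pivoting: largest |entry| in column c among rows r..rows-1. */
        int p = r;
        for (int i = r + 1; i < rows; ++i)
            if (fabs(a[i * cols + c]) > fabs(a[p * cols + c]))
                p = i;
        if (fabs(a[p * cols + c]) <= tol) {
            /* No usable pivot: clear the negligible entries and move on. */
            for (int i = r; i < rows; ++i)
                a[i * cols + c] = 0.0;
            continue;
        }
        /* Swap the pivot row into position r. */
        for (int j = 0; j < cols; ++j) {
            double t = a[r * cols + j];
            a[r * cols + j] = a[p * cols + j];
            a[p * cols + j] = t;
        }
        /* Scale the pivot row so the pivot becomes 1. */
        double piv = a[r * cols + c];
        for (int j = c; j < cols; ++j)
            a[r * cols + j] /= piv;
        /* Eliminate column c from every other row, above and below. */
        for (int i = 0; i < rows; ++i) {
            double f = a[i * cols + c];
            if (i == r || f == 0.0)
                continue;
            for (int j = c; j < cols; ++j)
                a[i * cols + j] -= f * a[r * cols + j];
        }
        ++r;
    }
    return r;
}
```

For a 5x9 matrix stored row by row in `double A[45]`, the call would be `rref(A, 5, 9, tol)` with a tolerance chosen for the data at hand.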

Compilation error


I am trying to compile a C program that uses MKL subroutines,

for example mkl_scsrdia and mkl_scsrcoo,

but when I try to compile it, I get an error.

It seems that I am missing a library path or some flags during compilation.

icc one.c -L$MKLPATH -I$MKLINCLUDE -lmkl -liomp5 -lpthread

one.c -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lpthread -lsplas -lm -lmkl -lguide

Nothing works; I get an error like this

CGEMM performance strangeness on Haswell CPUs vs. Sandy Bridge

Hi All

I have investigated the performance of the CGEMM algorithm using both my own Sandy Bridge CPU and my colleague's newer computer with a Haswell CPU. Performance is measured as the number of complex multiply-accumulate operations per second, here denoted as CMacs. I don't use any scaling and I don't add the previous matrix, so I only calculate C = A * B.

The setup:

The number of Rows in A = 2^16

Number of columns in A = 16

Number of columns in B = 256

GCMacs = A_r * A_c * B_c / time * 1e-9
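As a quick check of the formula above, here is a minimal sketch in plain C. The function names are made up for illustration, and the elapsed time is whatever the caller measured; no timing figures from the post are assumed.

```c
#include <stddef.h>

/* Complex multiply-accumulates in C = A * B: one per
   (row of A, column of A, column of B) triple. */
double cgemm_cmacs(size_t a_rows, size_t a_cols, size_t b_cols)
{
    return (double)a_rows * (double)a_cols * (double)b_cols;
}

/* GCMacs = A_r * A_c * B_c / time * 1e-9, with time in seconds. */
double gcmacs(size_t a_rows, size_t a_cols, size_t b_cols, double seconds)
{
    return cgemm_cmacs(a_rows, a_cols, b_cols) / seconds * 1e-9;
}
```

With the sizes listed in the setup, each product costs 2^16 * 16 * 256 = 2^28 = 268,435,456 CMacs, so a purely hypothetical run time of 10 ms would correspond to about 26.8 GCMacs.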

sparse pblas?

Hello everyone,

I am trying to use MKL to perform sparse matrix multiplication (i.e. C=A*B) in parallel. I found that there is a sparse BLAS for sequential code, but I am wondering whether there is a parallel version of it.




mkl_peak_mem_usage does not reset ?


I am using mkl_peak_mem_usage to measure MKL memory usage, but even though I call mkl_peak_mem_usage(MKL_PEAK_MEM_RESET), the counters are not zero. For example:

mkl_peak_mem_usage(MKL_PEAK_MEM_RESET) ; // reset peak

printf( "memory peak : %lld\n", mkl_peak_mem_usage(MKL_PEAK_MEM_RESET)  ); // print peak and reset again

// do some calculations

printf( "memory peak : %lld\n", mkl_peak_mem_usage(MKL_PEAK_MEM_RESET)  ); // print new peak and reset

MKL_Complex16 arrays for mathematical operations problems

Hello everyone

I am using the icc compiler to work with my MKL_Complex16 arrays. I was trying out a mathematical formula that I want to implement on the MIC, but I got these errors:

essai.c(136): error: expression must have struct or union type
                sum.real= sum.real+w.real*y_in.real[z+1][s+1];

essai.c(137): error: expression must have struct or union type
                sum.imag= sum.imag+w.imag*y_in.imag[z+1][s+1];

Here is my code:
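The poster's code is not included in this listing. Judging only from the two error lines, `y_in` is an array of complex values, so the element has to be indexed first and the field selected afterwards: `y_in[z+1][s+1].real` rather than `y_in.real[z+1][s+1]`. The sketch below uses a local stand-in struct with the same `real` and `imag` double fields as `MKL_Complex16` so that it compiles without MKL headers. The function name `accumulate` and the column count `NCOLS` are hypothetical.

```c
/* Stand-in with the same two double fields as MKL_Complex16,
   defined locally so the sketch does not need mkl.h. */
typedef struct { double real; double imag; } Complex16;

#define NCOLS 4 /* hypothetical column count of y_in */

/* Mirrors the two failing statements with the index and field order fixed. */
Complex16 accumulate(Complex16 sum, Complex16 w,
                     Complex16 y_in[][NCOLS], int z, int s)
{
    /* y_in is an array, not a struct: index first, then pick the field. */
    sum.real = sum.real + w.real * y_in[z + 1][s + 1].real;
    sum.imag = sum.imag + w.imag * y_in[z + 1][s + 1].imag;
    return sum;
}
```

The arithmetic is kept exactly as in the error messages, which multiplies real parts and imaginary parts separately. If a full complex product was intended, the cross terms would also be needed.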

2D convolution using VSL Conv

Hi all,

I'm wondering if anyone can help. I'm trying to calculate a 2d convolution of two square arrays f(x,y) and g(x,y) using the code below, which compiles and runs fine (no error messages), but does not give the correct output for h(x,y). Instead, all values of h[n] are unchanged (e.g. if initialized before attempting the convolution) except one "column" of the output, at x=0.

I've scoured the examples included with MKL libraries, but can't spot anything different between my code and the minimal working examples of this type.
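The VSL code itself is not shown in this listing. One way to narrow down a problem like the one described is to compare the library output element by element against a direct, unoptimized 2D convolution. The sketch below assumes a full linear convolution of row-major arrays, which may or may not match the mode the poster intended. The function name `conv2d_direct` is made up for illustration.

```c
#include <stddef.h>

/* Direct full linear 2D convolution of row-major arrays.
   f is fr x fc, g is gr x gc, and h must hold (fr+gr-1) x (fc+gc-1) values. */
void conv2d_direct(const double *f, size_t fr, size_t fc,
                   const double *g, size_t gr, size_t gc,
                   double *h)
{
    size_t hr = fr + gr - 1;
    size_t hc = fc + gc - 1;

    /* Clear the output so stale values cannot hide a missing write. */
    for (size_t n = 0; n < hr * hc; ++n)
        h[n] = 0.0;

    /* Every f(i,j) * g(k,l) product contributes to h(i+k, j+l). */
    for (size_t i = 0; i < fr; ++i)
        for (size_t j = 0; j < fc; ++j)
            for (size_t k = 0; k < gr; ++k)
                for (size_t l = 0; l < gc; ++l)
                    h[(i + k) * hc + (j + l)] += f[i * fc + j] * g[k * gc + l];
}
```

If only one column of the library result matches this reference, the shape and stride arrays passed to the 2D task are a natural first place to look, though the excerpt does not show enough to say more.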






I'm trying to solve a system with an SPD matrix using CPARDISO, but it fails with the following error:

*** Error in PARDISO  (     insufficient_memory) error_num= 10
*** Error in PARDISO memory allocation: SOLVING_ITERREF_WORK_DATA, allocation of 1 bytes failed

The matrix is of size 90 x 90, and I have up to 4 GB of free RAM before running the job on 2 MPI processes.

I'm attaching the program. Can you reproduce this behavior on your side?

Thank you for your help.
