I have been running an open-source chess program tournament for a long time, and I have noticed that some contestants compiled with the Intel compiler crash. A crash makes the engine lose the game and triggers an error message in Windows.

I have Visual Studio, so I created a new project, loaded one of the problem engines and, after cleaning up some defects related to the ISO C++ standard, I was able to make a stable x64 build.

Now the program works fine, which tells me the Intel compiler needs some work to eliminate the crashing.



I'm getting symbol _cilk_spawn could not be resolved when compiling with icpc


I'm using Ubuntu 14.04, Eclipse, and Intel compiler v15.

I have two identical Cilk programs (one as a C program and the other as a C++ program).

I can compile the C program with icc without any problem.

But when I use icpc for the C++ program, I get the error: symbol _cilk_spawn could not be resolved

I'm not using any flags for either program.

What is different about the C++ program that prevents it from compiling?

VTune Number of Cores Ambiguity

Hi, while running VTune Amplifier XE 2015, I am encountering a situation that I do not understand. I have a program that I am running on my i5-4300U with a 64-bit operating system. The program is completely unparallelized, so according to my understanding it should run on a single core. When I run the Basic Hotspots analysis for the program, I can see that only one thread is being spawned. But when I run the Advanced Hotspots analysis, go to Bottom-up, and choose "Core / H/W / Function / Call Stack" as my grouping, it shows that 2 of my cores are being used.

Debug symbols for libraries


I'm trying to add the debugging symbols for glibc to the hotspot analysis. I have installed the debug-info rpms under /usr/lib/debug/lib64 and am using:

amplxe-cl -collect hotspots -search-dir sym:rp=/usr/lib/debug/lib64 <application> <params> 

to perform the analysis. However while I have under /usr/lib/debug/lib, VTune still gives me a warning: 

Possible dgetrf IPIV issue

Hello, I am attempting to use dgetrf to get an LU factorization of a square matrix as part of a large mex program. When I check the output of dgetrf, I find that IPIV contains both a 0 and a number equal to the size of the matrix. I checked the documentation, and it says the zero should not be there.
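For context on the pivot convention: LAPACK-style routines such as dgetrf report IPIV with 1-based indices, so for a square matrix every entry lies between 1 and n, and the last entry is always n. The sketch below is not the MKL implementation. It is a small hand-written LU with partial pivoting (the helper name lu_partial_pivot is hypothetical) that stores pivots the same way, to show why a value equal to the matrix size is expected while a 0 is not.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical helper (not an MKL or LAPACK routine): in-place LU factorization
// with partial pivoting of an n-by-n column-major matrix. Pivot indices are
// stored 1-based, mirroring the LAPACK convention for IPIV.
// Returns 0 on success, or k (1-based) if the k-th pivot is exactly zero.
int lu_partial_pivot(std::vector<double>& a, std::size_t n, std::vector<int>& ipiv) {
    assert(a.size() == n * n);
    ipiv.assign(n, 0);
    int info = 0;
    for (std::size_t k = 0; k < n; ++k) {
        // Pick the row with the largest magnitude in column k, on or below the diagonal.
        std::size_t p = k;
        for (std::size_t i = k + 1; i < n; ++i) {
            if (std::fabs(a[i + k * n]) > std::fabs(a[p + k * n])) p = i;
        }
        ipiv[k] = static_cast<int>(p) + 1;  // 1-based: k+1 <= ipiv[k] <= n
        if (a[p + k * n] == 0.0) {
            if (info == 0) info = static_cast<int>(k) + 1;
            continue;  // singular column: nothing to eliminate
        }
        if (p != k) {
            for (std::size_t j = 0; j < n; ++j) std::swap(a[k + j * n], a[p + j * n]);
        }
        // Store the multipliers in the lower triangle and update the trailing block.
        for (std::size_t i = k + 1; i < n; ++i) {
            a[i + k * n] /= a[k + k * n];
            for (std::size_t j = k + 1; j < n; ++j) {
                a[i + j * n] -= a[i + k * n] * a[k + j * n];
            }
        }
    }
    return info;
}
```

Calling it on the 2-by-2 column-major matrix {1, 3, 2, 4} yields pivots {2, 2}: both entries are valid 1-based row indices, and no entry can be 0 under this convention.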

I have been able to reproduce this error in a smaller test case:

The C script (test_case.c)

Question: cycle count of 65536 MKL FFT DftiComputeForward(C++)

My code is as follows:


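void fft_mkl(int M, float *InputData, float *OutputData)
{
    MKL_LONG status;

    /* Single-precision, complex, one-dimensional transform of length M. */
    DFTI_DESCRIPTOR_HANDLE my_desc1_handle;
    status = DftiCreateDescriptor(&my_desc1_handle, DFTI_SINGLE, DFTI_COMPLEX, 1, (MKL_LONG)M);
    /* Out-of-place: the result goes to OutputData, InputData is left untouched. */
    status = DftiSetValue(my_desc1_handle, DFTI_PLACEMENT, DFTI_NOT_INPLACE);
    status = DftiCommitDescriptor(my_desc1_handle);
    status = DftiComputeForward(my_desc1_handle, InputData, OutputData);
    status = DftiFreeDescriptor(&my_desc1_handle);
}
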

float *test = new float [65536*2];

Incomplete factorization...


I am testing the MKL ILU factorization. As it only works for ILU0, I provide a sparse matrix pattern that corresponds to ILU(k). One thing I observe is that performance degrades as I increase the level of fill (I have used MATLAB for comparison). Is this expected?

For instance, I have the following results from MKL.

level0, nnz = 9,027,150, time = 0.598093, matlab = 0.766352 

level1, nnz = 12,816,050, time = 1.3787, matlab = 1.31876

level2, nnz = 15,450,825, time = 2.57172, matlab = 1.9329
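To make the trend in these numbers easier to see, the timings can be normalized by the number of nonzeros. The sketch below only restates the figures quoted above; the struct and function names are hypothetical and the unit (nanoseconds per nonzero) is chosen purely for readability.

```cpp
#include <cassert>
#include <cstddef>

// One row of the results quoted in the post (times as posted, unit not stated there;
// treated here as seconds, which is an assumption).
struct FillLevelResult {
    int level;
    double nnz;
    double mkl_seconds;
    double matlab_seconds;
};

const FillLevelResult kResults[] = {
    {0,  9027150.0, 0.598093, 0.766352},
    {1, 12816050.0, 1.3787,   1.31876},
    {2, 15450825.0, 2.57172,  1.9329},
};

// Time spent per stored nonzero, in nanoseconds.
double ns_per_nonzero(double seconds, double nnz) {
    return seconds * 1e9 / nnz;
}
```

Under that assumption, the MKL cost per nonzero rises from roughly 66 ns at level 0 to roughly 166 ns at level 2, while the MATLAB figure rises from roughly 85 ns to roughly 125 ns, so the MKL time grows faster than the nonzero count alone would explain.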

Vector plus/minus one floating point number


v?Add(n,a,b) performs element-by-element addition of vector a and vector b. Sometimes only a single shift is required, i.e. b can be interpreted as a floating point number and the result is a[k] + b. One example is the calculation of centered values (with respect to the empirical distribution). Does MKL provide such a function as well? Of course I can create a vector b of the same size as the input vector a and set b[k] = b for each k, but I would like to avoid the memory allocation.

Best wishes

Markus Wendt
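The operation asked about can be sketched in plain C++ as follows. This is not an MKL routine, and the helper names add_scalar_in_place and center_in_place are hypothetical; the point is only to show the scalar shift a[k] + b being applied in place, without allocating a second vector.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Hypothetical helper (plain C++, not an MKL function): adds the scalar b to
// every element of a in place, so no constant-filled vector is needed.
void add_scalar_in_place(std::vector<double>& a, double b) {
    std::transform(a.begin(), a.end(), a.begin(),
                   [b](double x) { return x + b; });
}

// Centers the values around their empirical mean by shifting with -mean.
void center_in_place(std::vector<double>& a) {
    if (a.empty()) return;
    const double mean =
        std::accumulate(a.begin(), a.end(), 0.0) / static_cast<double>(a.size());
    add_scalar_in_place(a, -mean);
}
```

For example, centering {1, 2, 3, 6} (mean 3) leaves {-2, -1, 0, 3} in the same storage.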

Xeon Phi crashes on too-large SCIF memory registration

Is there a mechanism with SCIF to register a memory region with all endpoints? At the moment, I have a for-loop with scif_register() on this memory region with each endpoint. Memory registration is rather expensive and I would like to avoid unnecessarily incurring this cost repeatedly if there is possibly a faster way to register with all endpoints.

With my current method, if the memory region is sufficiently large (e.g., 6 GB+), the coprocessor crashes during scif_register():
