Mac install location conflicts with code signing

The TBB dylibs on the Mac are built with an install name (otool -D) of "libtbb.dylib" (and similar names for all the other TBB libraries). This means that if you link against them as-is and place them inside an app package in the Apple-recommended location, they won't be found and the app will die on launch with

    dyld: Library not loaded: libtbb_debug.dylib
      Referenced from: /Users/williams/photoshop/main/photoshop/Targets/Debug_x86_64/Adobe Photoshop CC Photoshop CC 2015
      Reason: image not found

simple vector addition



I have a question about the following scenario on Intel Sandy Bridge.

For simple vector addition code in C, if I allocate the arrays dynamically, the compiler vectorizes the main addition loop

    C[i] = A[i] + B[i]

even if I do not use the restrict keyword (icc 13).

But if I allocate the arrays statically, the compiler does not vectorize the loop, and the vectorization report says nothing about it.

Even if I declare the arrays with __declspec(align), the loop is not vectorized.

What could be the cause?
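
For anyone trying to reproduce this, here is a minimal sketch of the two variants being compared. The names vec_add, vec_add_static and the size N are assumptions, not the poster's code. The static variant returns one element of the result so that the loop has an observable effect; a loop whose results are never used can be removed as dead code before the vectorizer sees it, which is one possible (unconfirmed) reason for a loop not showing up in the report.

```c
#include <assert.h>
#include <stddef.h>

#define N 1024  /* hypothetical array length */

/* Statically allocated, file-scope arrays (the case reported as not vectorized). */
static float A_static[N], B_static[N], C_static[N];

/* Pointer variant, as used with dynamically allocated arrays.
   restrict tells the compiler that the three arrays do not overlap. */
void vec_add(float *restrict c, const float *restrict a,
             const float *restrict b, size_t n)
{
    assert(c && a && b);
    for (size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

/* Static-array variant: fill the inputs, add them, and return the last
   element so the addition loop cannot be discarded as unused work. */
float vec_add_static(void)
{
    for (size_t i = 0; i < N; ++i) {
        A_static[i] = (float)i;
        B_static[i] = 2.0f * (float)i;
    }
    for (size_t i = 0; i < N; ++i)
        C_static[i] = A_static[i] + B_static[i];
    return C_static[N - 1];
}
```

Compiling both functions in the same file and comparing what the vectorization report says for each loop should show whether the allocation style alone makes the difference.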

Thanks in advance,



Sparse Matrix mkl_?csrmultd problem


I want to use mkl_?csrmultd to compute the product of two sparse matrices, with a dense matrix as the output.

But I am confused by the manual: ldc (the leading dimension of the dense matrix C) is listed as an output parameter (not an input parameter, as usual?), and the length of ib is m+1. The definition of ia is also very different from that in other sparse matrix routines, because its length is not m+1.

Is there a problem in this part of the manual?

Any help or comments will be appreciated.
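
As a point of comparison, here is a plain C sketch of a CSR times CSR product written into a dense column-major matrix. This is not MKL's routine and does not claim to match its argument list: the function name csr_mult_dense, the zero-based indexing, and the argument order are all assumptions made for illustration. It shows the conventional layout, where the row-pointer array of a matrix with r rows has r+1 entries, and where ldc is supplied by the caller as an input.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an MKL function.
   Computes C = A * B, where A is m x n and B is n x k, both in
   zero-based CSR format, and C is dense, column-major, with leading
   dimension ldc >= m.
   ia has m+1 entries (one per row of A, plus an end marker);
   ib has n+1 entries (one per row of B, plus an end marker). */
void csr_mult_dense(int m, int n, int k,
                    const double *a, const int *ja, const int *ia,
                    const double *b, const int *jb, const int *ib,
                    double *c, int ldc)
{
    assert(ldc >= m);

    /* Clear the m x k result, honoring the leading dimension. */
    for (int col = 0; col < k; ++col)
        for (int row = 0; row < m; ++row)
            c[row + (size_t)col * ldc] = 0.0;

    /* For each nonzero A(i, r), add A(i, r) * B(r, :) into row i of C. */
    for (int i = 0; i < m; ++i) {
        for (int p = ia[i]; p < ia[i + 1]; ++p) {
            int r = ja[p];
            double av = a[p];
            assert(r >= 0 && r < n);
            for (int q = ib[r]; q < ib[r + 1]; ++q) {
                assert(jb[q] >= 0 && jb[q] < k);
                c[i + (size_t)jb[q] * ldc] += av * b[q];
            }
        }
    }
}
```

If the manual really describes ldc as an output and gives ia a length other than the number of rows plus one for the untransposed case, comparing against a reference like this on a small matrix is one way to check what the routine actually expects.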




Vectorization of a function call


I have the following problem: I have a big loop inside my program that I want to parallelize and vectorize. Inside the loop I do a lot of math computations, but there are no dependencies between the iterations. However, inside the loop I call a simple function that returns the minimum of two values, or zero if the minimum is negative. Generally, it looks like this:
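
The poster's snippet is not included in this excerpt. As a stand-in, here is a minimal sketch of a function matching the description, assuming double operands; the names min_or_zero and apply_min_or_zero are hypothetical. Defining the helper as static inline in the same translation unit as the loop gives the compiler the chance to inline it, which is typically what lets a loop containing a call be vectorized.

```c
#include <assert.h>

/* Hypothetical helper matching the description: the minimum of two
   values, or zero if that minimum is negative. */
static inline double min_or_zero(double a, double b)
{
    double m = (a < b) ? a : b;
    return (m < 0.0) ? 0.0 : m;
}

/* Hypothetical loop with independent iterations that calls the helper.
   restrict states that the output does not alias the inputs. */
void apply_min_or_zero(double *restrict out, const double *restrict x,
                       const double *restrict y, int n)
{
    assert(out && x && y);
    for (int i = 0; i < n; ++i)
        out[i] = min_or_zero(x[i], y[i]);
}
```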

Memory leak?


I have a C++ application which is coded this way:

  • The main program does not need much memory (just a few variables). But this main program runs a loop in which we call a function.
  • This function needs about 140 MB of memory to run. The memory is allocated in the function and then released (using RAII).

When I run this program overnight on OSX, here is the data I get from "Activity Monitor", or "top" in terms of memory consumption

C examples in MKL's ScaLAPACK

I am aware of these examples, but I do not know any Fortran, so I cannot understand much of them. Finding at least one example with MKL and ScaLAPACK in C would therefore be critical for me. I know there is a C interface. For example, p?potrf is the function I am going to use to perform a Cholesky factorization.

A negative reply would also be OK, since it would stop me from searching.

Windows: CUDA 6.5 and Intel Compiler (2015)?

I've been trying to find an authoritative answer for this, but everything is a few years old.  Can the Intel compiler be used with CUDA 6.5 or 7.0 on Windows / Visual Studio 2013?

I did manage to get it to work for a few hours, but then Visual Studio crashed hard and had to be repaired, and that broke it. 

Is there support for using CUDA 6.5 or 7.0 with the Intel Compiler (2015)?

mkl_intel_lp64 vs mkl_gf_lp64 and MKL advisor

I wonder whether the Advisor gets it wrong when it suggests:

 -Wl,--no-as-needed -L${MKLROOT}/lib/intel64 -lmkl_intel_lp64 -lmkl_core -lmkl_gnu_thread -ldl -lpthread -lm

Specifically this bit: -lmkl_intel_lp64

when the GNU variants are chosen instead of Intel's wherever applicable.

The reason I'm asking is that I get undefined references to GOMP.

Many thanks.
