Intel® C++ Compiler

Free webinar April 7 2015 9am PST "Further Vectorization Features of the Intel Compiler"

There is a free webinar, “Further Vectorization Features of the Intel Compiler”, next Tuesday that focuses on getting more vectorization out of the Intel compilers. You will benefit more from it if you have already watched or listened to the previous webinar, “Performance essentials using OpenMP* 4.0 vectorization with C/C++”.

OS X 10.9 Xcode 5 error can't open "stdlib.h" OR ld: library not found for -lcrt1.10.6.o

Environment:  OS X 10.9 (Mavericks) and Xcode 5.0

Command line compilations/links fail with either:

"catastrophic error: cannot open source file "stdlib.h"" OR "ld: library not found for -lcrt1.10.6.o"

Affected compiler:  Intel Composer XE 2013 Update 1

Root cause:  The Xcode 5.0 installation does not install all of the command line tools (include files, libraries, SDKs) needed for the compiler to link user applications.

IDB use under Mac OS* X 10.7 Lion

In order to debug applications built by the Intel Composer XE 2011 (Updates 6, 7, 8, and possible future updates) under Mac OS* X 10.7 Lion, the following options are required:

-g -save-temps -fpic -Wl,-no_pie

Further information can be found here:

This does not affect users running Mac OS* X 10.6 Snow Leopard or Mac OS* X 10.5 Leopard.

Possible compiler bug

There's a possible bug in the icc installed with Composer XE 2013 SP1 Update 5 (2013.1.5.239). The compiler compiles the code, but the resulting binary crashes at run time.

Here is a C++ program that reproduces the crash:

If executed it leads to this error:

"Run-Time Check Failure #2 - Stack around the variable 'os_.1016' was corrupted."

Memory leak caused or worsened by /Qipo?

I've built a DLL that I compile with /Qipo (Intel C++ Composer XE 2015). If I call the constructor and destructor of its main class, the memory doesn't get released, and after a few calls (32-bit mode) I'm out of memory. However, if I disable /Qipo, there doesn't seem to be a problem at all (I will run it for a longer period tonight, but I let it construct and destroy the object 1024 times earlier tonight and I didn't notice an increase in memory usage).

If I use /Qip mode, the leak is 8 MB per call. With /Qipo it's about 300 MB.

Parallelization of dyadic product


I have two vectors (they may refer to the same vector) and I need to compute the product x[i]*y[j] for i,j=1..n.

What is the best way to perform this operation in parallel? I've tried


but I guess it is only a naive attempt. Indeed, vec-report says it is inefficient.
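For illustration, here is a minimal sketch of one common way to lay out this computation. It is not the poster's code, which is missing from the post. The helper name `dyadic_product`, the row-major result layout, and the use of `double` are all assumptions. The outer loop over i is split across threads and the inner loop over j is a unit-stride candidate for vectorization. The OpenMP pragmas only take effect when OpenMP is enabled at compile time; otherwise the loops run serially.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the original post): returns a row-major
// n x n matrix with out[i*n + j] = x[i] * y[j]. x and y may be the same vector.
std::vector<double> dyadic_product(const std::vector<double>& x,
                                   const std::vector<double>& y) {
    assert(x.size() == y.size());  // assumption: both vectors have length n
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    std::vector<double> out(x.size() * x.size());
    const double* xp = x.data();
    const double* yp = y.data();
    double* op = out.data();

    // Each thread writes its own rows, so no synchronization is needed.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const double xi = xp[i];   // hoisted: constant across the inner loop
        double* row = op + i * n;  // start of row i in the result
        // Unit-stride reads of y and writes of row: a good fit for SIMD.
        #pragma omp simd
        for (std::ptrdiff_t j = 0; j < n; ++j) {
            row[j] = xi * yp[j];
        }
    }
    return out;
}
```

Writing the result into a separate buffer means the inputs are only read, so passing the same vector for both arguments is safe. For large n the loop is likely to be limited by memory bandwidth rather than arithmetic, so the gain from adding threads may be modest.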




_mm_unpackhi_epi8 and _mm_unpacklo_epi8 to convert 16 signed chars into 2 signed short vectors

I am using _mm_unpacklo_epi16 and _mm_unpackhi_epi16 with a vector of 0s as the second argument to convert a signed/unsigned short vector into 2 signed/unsigned integer vectors, i.e.:

__m128i lowVec  = _mm_unpacklo_epi16(vecA, vec0);
__m128i highVec = _mm_unpackhi_epi16(vecA,vec0);

The same approach works fine for converting a vector of 16 unsigned chars into 2 unsigned short vectors using _mm_unpacklo_epi8 and _mm_unpackhi_epi8, yet when the input vector holds 16 signed chars, the short values in the 2 result vectors are all 127 + the original values.
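Interleaving with zeros zero-extends each element, which is only correct for unsigned or non-negative values. A minimal sketch of one SSE2 way to sign-extend instead is to interleave with a per-byte sign mask rather than with zeros. The helper name `widen_i8_to_i16` and the pointer-based interface are assumptions made for this example.

```cpp
#include <emmintrin.h>  // SSE2 intrinsics
#include <cstdint>

// Hypothetical helper (not from the original post): widens 16 signed 8-bit
// values at `in` to 16 signed 16-bit values at `out`, preserving the sign.
void widen_i8_to_i16(const std::int8_t* in, std::int16_t* out) {
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));

    // Per-byte sign mask: 0xFF where the byte is negative (0 > v), else 0x00.
    const __m128i sign = _mm_cmpgt_epi8(_mm_setzero_si128(), v);

    // Interleave so each 16-bit lane gets the value as its low byte and the
    // sign mask as its high byte, which is exactly sign extension.
    const __m128i lo = _mm_unpacklo_epi8(v, sign);  // elements 0..7
    const __m128i hi = _mm_unpackhi_epi8(v, sign);  // elements 8..15

    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), lo);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 8), hi);
}
```

The same idea carries over to the 16-bit case by building the mask with _mm_cmpgt_epi16 and interleaving with _mm_unpacklo_epi16 and _mm_unpackhi_epi16. Where SSE4.1 is available, _mm_cvtepi8_epi16 sign-extends the low 8 bytes directly.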
