Intel® C++ Compiler

Compiler bug in XE 2015: error : no instance of function template "..." matches the argument list


The following code:

#include <tuple>

struct Foo {
	std::tuple<int> inner;
	template <unsigned Idx>
	auto get() const -> decltype(std::get<Idx>(inner)) { return std::get<Idx>(inner); }
};

int main()
{
	Foo f;
}

produces the following error:

CPU2006 compile issues with MSVC 2013, ICC XE 2015 rev 3, windows server 2012


ICL Version


I have compile errors with two CPU2006 benchmarks.


483.xalancbmk crashes at run time if I compile with -O3 -ipo; it works fine with -O2.

453.povray says:

file defaultrenderfrontend.cpp

error "<mathimf.h> is incompatible with system <math.h>!"


proc_bind(spread) does not seem to be honored

Hello Folks,

I have a program that is split into two parts:
One loop that allocates data; it runs 4 iterations, one for each socket.
One loop that does computation on the data; it runs 48 iterations (each thread should work on a slice of data, ideally a slice that resides on the local socket).

My machine is a 4-socket Xeon machine with 12 cores per processor. I'm using ICC 15.0.1 20141023.
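The two-loop layout described above could look roughly like the sketch below. It is not the poster's code, which is not shown; the function name first_touch_then_compute, the constants, and the choice of new double[] for allocation are assumptions made for illustration. It also assumes the program is run with OMP_PLACES=sockets, so that proc_bind(spread) has socket-sized places to distribute threads over, and that the operating system places pages on first touch.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

constexpr int kSockets = 4;   // from the post: 4 sockets
constexpr int kThreads = 48;  // from the post: 4 sockets x 12 cores

// Hypothetical helper: first-touches the data socket by socket, then lets
// 48 iterations each process one slice. Returns the sum of all elements.
double first_touch_then_compute(std::size_t slice_len) {
    assert(slice_len > 0);
    const std::size_t total = slice_len * kThreads;
    const std::size_t per_socket = total / kSockets;

    // new double[] does not write to the memory, so (for fresh pages) the
    // physical placement is decided by whichever thread writes first.
    std::unique_ptr<double[]> data(new double[total]);
    double* base = data.get();

    // Loop 1: 4 iterations, one per socket. With OMP_PLACES=sockets (an
    // assumption), spread should put one thread on each socket.
    #pragma omp parallel for num_threads(kSockets) proc_bind(spread) schedule(static)
    for (int s = 0; s < kSockets; ++s) {
        double* p = base + s * per_socket;
        for (std::size_t k = 0; k < per_socket; ++k) p[k] = 1.0;
    }

    // Loop 2: 48 iterations, one slice each. With a static schedule,
    // iteration t is expected to run on thread t, and slices 12*s .. 12*s+11
    // lie inside the block touched by iteration s of loop 1.
    double sum = 0.0;
    #pragma omp parallel for num_threads(kThreads) proc_bind(spread) schedule(static) reduction(+:sum)
    for (int t = 0; t < kThreads; ++t) {
        const double* p = base + t * slice_len;
        for (std::size_t k = 0; k < slice_len; ++k) sum += p[k];
    }
    return sum;
}
```

Calling first_touch_then_compute(1000) touches 48000 doubles and returns their sum. Whether the second loop actually runs each slice on the socket that touched it depends on the runtime honoring the binding, which is exactly what the post questions; setting OMP_DISPLAY_ENV=true at run time is one way to see which affinity settings the runtime picked up.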

Possible compiler bug

There's a possible bug in the icc installed with Composer XE 2013 SP1 Update 5 (2013.1.5.239). The compiler compiles the code, but the resulting binary crashes at run time.

Here is a C++ program that can reproduce the crash:

If executed it leads to this error:

"Run-Time Check Failure #2 - Stack around the variable 'os_.1016' was corrupted."

Memory leak caused or worsened by /Qipo?

I've made a DLL which I compile with /Qipo (Intel C++ Composer XE 2015). If I call the constructor and destructor of its main class, the memory doesn't get released, and after a few calls (32-bit mode) I'm out of memory. However, if I disable /Qipo, there doesn't seem to be a problem at all (I will run it for a longer period tonight, but I let it construct and destroy the object 1024 times earlier tonight and I didn't notice an increase in memory usage).

If I use /Qip mode, the leak is 8 MB per call. With /Qipo it's about 300 MB.

Parallelization of dyadic product


I have two vectors (they may refer to the same vector) and I need to compute the product x[i]*y[j] for i,j=1..n.

What is the best way to perform this operation in parallel? I've tried


but I guess that is only a naive attempt. Indeed, vec-report says it is inefficient.
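One common way to organize this computation is to parallelize over rows and vectorize along each row, so that every thread writes contiguous memory. The sketch below is not the poster's code, which is not shown; the function name outer_product, the row-major result layout, the zero-based indexing, and the use of double are assumptions. It relies on OpenMP pragmas, which are simply ignored if OpenMP is not enabled at compile time.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: returns the n x n dyadic product in row-major order,
// a[i*n + j] = x[i] * y[j]. x and y may be the same vector, since both are
// only read.
std::vector<double> outer_product(const std::vector<double>& x,
                                  const std::vector<double>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    std::vector<double> a(x.size() * x.size());

    const double* xp = x.data();
    const double* yp = y.data();
    double* ap = a.data();

    // Each thread handles whole rows, so no two threads write the same element.
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const double xi = xp[i];   // loop-invariant factor for this row
        double* row = ap + i * n;  // contiguous, unit-stride destination
        #pragma omp simd
        for (std::ptrdiff_t j = 0; j < n; ++j) {
            row[j] = xi * yp[j];
        }
    }
    return a;
}
```

For example, outer_product(x, x) gives the product of a vector with itself. Because the kernel does one multiply per element stored, it is likely to be limited by memory bandwidth rather than arithmetic for large n, so the gain from threading may be modest.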




_mm_unpackhi_epi8 and _mm_unpacklo_epi8 to convert 16 signed chars into 2 signed short vectors

I am using _mm_unpacklo_epi16 and _mm_unpackhi_epi16 with a second argument vector of 0s to convert signed/unsigned short vectors into 2 signed/unsigned integer vectors, i.e.:

__m128i lowVec  = _mm_unpacklo_epi16(vecA, vec0);
__m128i highVec = _mm_unpackhi_epi16(vecA,vec0);

This also works fine for converting a vector of 16 unsigned chars into 2 unsigned short vectors using _mm_unpacklo_epi8 and _mm_unpackhi_epi8, yet when the input vector holds 16 signed chars, the short values in the result vectors are all 127+original values.
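Interleaving with a vector of zeros is a zero extension: the upper byte of every 16-bit lane becomes 0, so a negative byte such as -1 (0xFF) turns into 0x00FF, i.e. 255. A signed widening needs the upper byte to repeat the sign bit instead. The sketch below shows one SSE2-only way to do that, by interleaving with a sign mask rather than with zeros; the helper names widen_epi8_to_epi16 and widen_array are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Widen 16 signed 8-bit lanes into two vectors of 8 signed 16-bit lanes.
// lo receives input lanes 0..7, hi receives input lanes 8..15.
inline void widen_epi8_to_epi16(__m128i v, __m128i& lo, __m128i& hi) {
    // 0xFF in every byte where the input is negative (0 > v), 0x00 elsewhere.
    const __m128i sign = _mm_cmpgt_epi8(_mm_setzero_si128(), v);
    // Each 16-bit lane gets the data byte as its low half and the matching
    // sign byte as its high half, which is exactly a sign extension.
    lo = _mm_unpacklo_epi8(v, sign);
    hi = _mm_unpackhi_epi8(v, sign);
}

// Hypothetical convenience wrapper: widens 16 bytes from `in` into 16
// shorts at `out`. Unaligned loads/stores are used, so no alignment is
// required of either pointer.
inline void widen_array(const std::int8_t* in, std::int16_t* out) {
    assert(in != nullptr && out != nullptr);
    const __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(in));
    __m128i lo, hi;
    widen_epi8_to_epi16(v, lo, hi);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out), lo);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(out + 8), hi);
}
```

The same pattern applies to the 16-bit to 32-bit case with _mm_cmpgt_epi16 and _mm_unpacklo_epi16 / _mm_unpackhi_epi16. If SSE4.1 is available, _mm_cvtepi8_epi16 performs the sign extension of the low 8 bytes in a single instruction.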

simple vector addition



I have a question about the following scenario on Intel Sandy Bridge.

For simple vector addition code in C,

if I allocate the arrays dynamically, the compiler vectorizes the main addition loop

    C[i] = A[i] + B[i]

even if I do not use the restrict keyword (icc 13).

But if I allocate the arrays statically, it does not vectorize the loop, nor does it say anything about the loop in the vectorization report.

Even if I declare the arrays with __declspec(align), it does not vectorize.

What could be the cause?

Thanks in advance,


