Using icpc, what is the best way to optimize the following code with the AVX option?

I have a for loop to identify min and max of a float array. Here is my implementation:

// Track the running minimum and maximum of intervals[0..counter).
for (int i = 0; i < counter; i++) {
     imin = intervals[i] < imin ? intervals[i] : imin;
     imax = intervals[i] > imax ? intervals[i] : imax;
}


What is the best way to optimize the code here? I am using the -xAVX option and running the program in single-threaded mode. How does this compare to using std::minmax_element?

Should I write some manual AVX code?
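
Before hand-writing intrinsics, one thing worth trying is to restructure the loop so that it keeps eight independent running minima and maxima, one per float lane of a 256-bit AVX register, and combines them at the end. The sketch below is only an illustration under assumptions: the helper name minmax_lanes and the lane count of 8 are not from the original post, NaN inputs are not handled, and whether icpc actually vectorizes it under -xAVX should be confirmed with the compiler's vectorization report rather than assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <utility>

// Hypothetical helper (name is an assumption, not from the post).
// Returns {min, max} of data[0..n). For n == 0 it returns
// {FLT_MAX, lowest float}, i.e. the untouched initial values.
inline std::pair<float, float> minmax_lanes(const float* data, std::size_t n) {
    assert(data != nullptr || n == 0);

    // 8 floats fit in one 256-bit AVX register (assumed target width).
    constexpr std::size_t kLanes = 8;
    float lo[kLanes];
    float hi[kLanes];
    for (std::size_t k = 0; k < kLanes; ++k) {
        lo[k] = std::numeric_limits<float>::max();
        hi[k] = std::numeric_limits<float>::lowest();
    }

    // Main loop: each lane only depends on its own previous value,
    // so there is no loop-carried dependency across lanes.
    std::size_t i = 0;
    for (; i + kLanes <= n; i += kLanes) {
        for (std::size_t k = 0; k < kLanes; ++k) {
            const float v = data[i + k];
            lo[k] = v < lo[k] ? v : lo[k];
            hi[k] = v > hi[k] ? v : hi[k];
        }
    }

    // Horizontal reduction of the per-lane results.
    float mn = lo[0];
    float mx = hi[0];
    for (std::size_t k = 1; k < kLanes; ++k) {
        mn = lo[k] < mn ? lo[k] : mn;
        mx = hi[k] > mx ? hi[k] : mx;
    }

    // Scalar tail for the remaining n % kLanes elements.
    for (; i < n; ++i) {
        mn = data[i] < mn ? data[i] : mn;
        mx = data[i] > mx ? data[i] : mx;
    }
    return {mn, mx};
}
```

With the names from the question, a call would look like `auto r = minmax_lanes(intervals, counter);`, with `r.first` as imin and `r.second` as imax. Timing this against the original loop and against std::minmax_element on the real data is the only reliable way to tell which one the compiler handles best.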


Phi does not seem to fully support AVX512? Any way to do a matrix transpose?

I found in past topics that mm512_unpacklo_* is not supported on Phi. In my own implementation, it seems mm512_permute* and mm512_shuffle* are also not supported. So far, all matrix transpose operations in past posts seem to be implemented with mm512_swizzle* and mm512_blend* instructions. However, using these two operations requires twice as much element movement, which seems inefficient. Are there any other choices for doing a matrix transpose?


Using class exported from third party VC++ DLL


Is it safe to use a class exported from a third-party VC++ DLL?

I use Intel C++ with Visual Studio integration, so every C++ project is compiled with the Intel C++ compiler.

I will use a third-party library that comes without source code, only headers and builds for a specific Microsoft VC++ compiler. This library contains classes which are exported with _declspec(dllexport). Is it safe to use those classes with the Intel C++ compiler under Visual Studio? Of course, the binary DLL which I use was built with the same version of VC++ that I use with Intel C++.

Can't see Media SDK filters in Graphedit


I'm a beginner with DirectShow, but I have already made my first application. I'm able to capture and preview the various cameras I use for the project I'm working on. However, all the captured content is RAW video, so it is huge in size. Therefore, I would like to use the Intel H.264 encoder. I built my software in LabVIEW, but I first test my DirectShow setup in Graphedit. However, I don't see the Intel filters in Graphedit. I installed the Media SDK, but that is as far as I got.

I hope somebody has the patience to walk me through the process or give me some pointers where to look.

Vectorization analysis not supported for Core i7-4770?


I am running Intel® VTune™ Amplifier XE 2015 Update 4 (build 410668) on Windows 8.1 with a Core i7-4770 processor.

When I check "Analyze vectorization" in a custom analysis, I get the message "Vectorization analysis is not supported for this microprocessor. See release note for more details".

I want to run this on a program compiled in VS2012 with Intel Fortran Version 15.0.0122.11. My user account on this PC has administrator rights.

I can't find any explanation for this in the VTune Help.


Running "Shark" machine learning library on Xeon Phi


I am using Shark (http://image.diku.dk/shark/sphinx_pages/build/html/index.html) for NN simulation. After installing it, I managed to compile and run a simple NN using icpc and OpenMP on 12 cores on my system. Now I am trying to compile and run it on Xeon Phi. This is the command line that I used:

OS X 32bit + inline assembler = broken shared library

I've found a bug in the OS X version of the Intel C++ compiler when compiling a 32-bit shared library with inline msasm code in it. The problem is that the compiler fails to produce either PIC code or a text relocation for inline assembler referencing global or static variables, which results in the wrong memory address being referenced. Here's a minimal test case (also attached for your convenience):


void f(void);


#include "header.h"
int main() {
	return 0;
}

