Poor MKL Dfti complex to complex performance


I'm new to MIC programming and trying to get a grip on how to do things with the beast. I stumbled across very bad FFT performance (using a matrix size often used at our institution) for DFTI complex-to-complex transforms. In the following, no OMP, KMP, or MKL variables are set, except where stated. Setting the number of threads or specifying the placement does not change much for this comparison: the MIC is much slower than the host!

Any hints how to improve the situation?
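
When comparing host and coprocessor timings for the same transform, it can help to convert both to a nominal throughput. The following is a minimal sketch, assuming the widely used 5 N log2(N) flop-count convention for a complex FFT of N points; the 1024 x 1024 size and both timings are hypothetical placeholders, not measurements from this post.

```python
import math


def fft_gflops(rows: int, cols: int, seconds: float) -> float:
    """Nominal GFLOP/s of one 2D complex-to-complex FFT.

    Uses the conventional estimate of 5 * N * log2(N) floating-point
    operations for N = rows * cols complex points.
    """
    n = rows * cols
    flops = 5.0 * n * math.log2(n)
    return flops / seconds / 1e9


# Hypothetical matrix size and wall-clock times (assumptions, not data).
host_gflops = fft_gflops(1024, 1024, 0.010)
mic_gflops = fft_gflops(1024, 1024, 0.040)

print(f"host: {host_gflops:.2f} GFLOP/s")
print(f"MIC:  {mic_gflops:.2f} GFLOP/s")
print(f"host/MIC ratio: {host_gflops / mic_gflops:.1f}x")
```

Reporting both numbers this way makes it easier to see whether either side is anywhere near its expected FFT throughput.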



Naive Hardware Configuration Question


Yet another naive question. If I set up 2 compute nodes in my sandbox, am I generally better off with a MIC and 2 GPGPUs per node? I'm guessing the answer is "it depends", but assuming that the MICs leverage the vector processing in the GPUs, PCIe seems like less of a bottleneck than QDR. My googling isn't showing big boxes with Frankenstein nodes, but in my empty head it seems like a good idea.
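
The bandwidth comparison can be sketched with back-of-the-envelope arithmetic. This assumes a PCIe 2.0 x16 slot for the coprocessor and a 4x QDR InfiniBand link between nodes (neither is stated in the post), both using 8b/10b encoding; the figures are theoretical peak rates per direction, not achievable throughput.

```python
def link_bandwidth_gbytes(lanes: int, gtransfers_per_lane: float,
                          encoding_efficiency: float) -> float:
    """Theoretical peak bandwidth of a serial link in GB/s, per direction."""
    gbits = lanes * gtransfers_per_lane * encoding_efficiency
    return gbits / 8.0  # bits to bytes


# Assumed link types (not given in the post).
pcie2_x16 = link_bandwidth_gbytes(lanes=16, gtransfers_per_lane=5.0,
                                  encoding_efficiency=8 / 10)
qdr_4x = link_bandwidth_gbytes(lanes=4, gtransfers_per_lane=10.0,
                               encoding_efficiency=8 / 10)

print(f"PCIe 2.0 x16: {pcie2_x16:.1f} GB/s")
print(f"QDR 4x:       {qdr_4x:.1f} GB/s")
```

Under these assumptions the in-node link has roughly twice the peak bandwidth of the inter-node link, which is consistent with the intuition in the question.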


Thanks again Robert


Better than linear scaling


I'm currently wondering about the scaling of my application with the number of cores. Basically, I'm getting a 70x speedup with 56 cores compared to a single core. However, the whole plot (see attached image) still looks more or less like a line, but with a slope of about 1.2. I'm running this application with thread counts that are multiples of 4 and KMP_AFFINITY set to compact. What could possibly explain my curve?
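
As a quick check of the figures quoted above, parallel efficiency is speedup divided by core count; anything above 1.0 is superlinear. A minimal sketch using only the two numbers from the post:

```python
def parallel_efficiency(speedup: float, cores: int) -> float:
    """Speedup per core: 1.0 is ideal linear scaling, above 1.0 is superlinear."""
    return speedup / cores


speedup = 70.0  # reported speedup over a single core
cores = 56      # number of cores used

efficiency = parallel_efficiency(speedup, cores)
print(f"parallel efficiency: {efficiency:.2f}")
```

The result, 1.25, agrees with the slope of about 1.2 read off the plot.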

Compiling BOOST in OSX, undefined symbols for architecture error

Dear all,

I am trying to compile Boost on OS X 10.10, using the Intel compilers 15.0.2.

./ --with-toolset=intel-darwin cxxflags="-std=c++11" linkflags="-std=c++11"

    ./b2 -q \
             toolset=intel-darwin \
             cxxflags="-std=c++11 -stdlib=libc++" \
             linkflags="-std=c++11 -stdlib=libc++" \
             address-model={{address_model}} \
             -j 4 \
             --user-config=user-config.jam \
             link=shared \

Lack of Ivy Bridge support in current VTune

Slide 9 of

shows a Sandy Bridge/Ivy Bridge Analysis category of pre-configured profiles. However, with the XE2015 Update 2, there are categories for Sandy Bridge and Haswell, but none (besides General) that work for Ivy Bridge (see attached). Which version of VTune, if any, is recommended for Ivy Bridge Memory Access analysis on

CPU compatibility with Xeon Phi?

I am having a hard time finding information on CPU compatibility with the Xeon Phi, if there is any.

Will the Xeon Phi 31S1P and MPSS 3.4 work with an i7-4820K? Is there any restriction on the host CPU?

Also, the readme file for the Windows driver says that the currently supported operating systems are:

Microsoft Windows* 7 Enterprise SP1 (64-bit)

Microsoft Windows* 8/8.1 Enterprise (64-bit)

Microsoft Windows* Server 2008 R2 SP1 (64-bit)

Microsoft Windows* Server 2012 (64-bit)

Microsoft Windows* Server 2012 R2 (64-bit)

