Parallel Computing

Hardware EBS support on Ivy Bridge-E i7-4930K

Hi,

I'm trying to install "Intel C++ Studio XE 2013 SP1 Update 2 for Windows" on my workstation in order to run VTune and the other tools. However, the installer's prerequisite-issues screen reports that "The processor in this system does not support hardware event-based sampling". Is this true?

I read the release notes before buying my Studio XE license today and did not see any restriction on support for the most recent generation of Intel CPUs.

Question about performance

I'm hoping someone can help me understand an issue that came up recently in our solver while using VTune Amplifier. I'll try to describe it here:

Using VTune Amplifier, we see that the time spent in the function "mucal" goes up as the number of threads increases. On 8 threads, mucal is at the top of the list.

mucal is a function that calculates viscosity. It is called as follows:

! compute the viscosity mu for every index ijk
do ijk=1,iend
  mu(ijk)=mucal(ijk,iopt)
end do

CFD mesh First cell index: 1

Use MacPorts gcc

I compile the following program with g++ from MacPorts:

    #include "array"
    int main() { return 0; }

like so:

    /opt/local/bin/g++-mp-4.8 -std=c++0x a.cpp

How can I compile the same program with icpc [version 14.0.2 (gcc version 4.2.1 compatibility)]? I tried several things I found via Google, but nothing seems to work. For example,

    icpc a.cpp -I /opt/local/include/gcc47/c++ -std=c++11

gives many compilation errors, and

    icpc a.cpp -I /opt/local/include/gcc48/c++ -std=c++11

triggers

    #if __cplusplus < 201103L
    #error ...
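The last error comes from the C++11 guard in the gcc 4.8 headers, so one diagnostic step is to check what value icpc gives `__cplusplus` under the same flags. This is a minimal sketch in standard C++; the function names are hypothetical, and it deliberately does not include `<array>`, because a failing guard would stop compilation before anything is printed.

```cpp
#include <cstdio>

// Value the compiler assigns to __cplusplus under the current flags
// (e.g. -std=c++11). libstdc++'s C++11 headers such as <array> stop with
// #error when this value is below 201103L.
long reported_cplusplus() { return static_cast<long>(__cplusplus); }

// True if the compiler claims at least C++11 conformance.
bool claims_cxx11() { return reported_cplusplus() >= 201103L; }

// Print the value so it can be compared across compilers and flags.
void print_cplusplus() {
    std::printf("__cplusplus = %ld (C++11 or later: %s)\n",
                reported_cplusplus(), claims_cxx11() ? "yes" : "no");
}
```

Calling `print_cplusplus()` from a trivial `main()` and building once with `/opt/local/bin/g++-mp-4.8 -std=c++0x` and once with `icpc -std=c++11` would show whether icpc's value passes the `201103L` check that the gcc 4.8 headers perform.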

Does ICC 14 generate BMI instructions?

Does anyone know whether ICC 14 can transform (x >> 12) & 0x3 into _bextr_u32(x, 12, 2)?

I tried compiling it with icc -mcore-avx2, but the compiler didn't perform the transformation. How profitable would it be? That is 2 instructions with 2 cycles of latency vs. 1 instruction with 2 cycles of latency.
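Both forms compute the same field. The sketch below is a hypothetical, portable model of BEXTR's semantics (it does not call the intrinsic, so it runs without BMI1 hardware), checked against the shift-and-mask expression from the question. On BMI1 hardware the real operation is `_bextr_u32` from `<immintrin.h>`; the helper names here are illustrative only.

```cpp
#include <cstdint>

// Portable reference model of the BMI1 BEXTR operation behind _bextr_u32:
// extract `len` bits of `x` starting at bit `start`. Only start/len values
// below 256 are modelled, since the instruction reads just 8 bits of each.
std::uint32_t bextr32_model(std::uint32_t x, unsigned start, unsigned len) {
    if (start >= 32) return 0;                 // field begins past the top bit
    std::uint32_t shifted = x >> start;        // the SHR of shift-and-mask
    if (len >= 32) return shifted;             // every remaining bit is kept
    return shifted & ((std::uint32_t{1} << len) - 1);  // the AND of shift-and-mask
}

// BEXTR takes start and length from one register operand: start in bits 0-7,
// length in bits 8-15. BMI1 BEXTR has no immediate form.
std::uint32_t bextr_control(unsigned start, unsigned len) {
    return (start & 0xFFu) | ((len & 0xFFu) << 8);
}

// The expression from the question: 2 bits starting at bit 12.
std::uint32_t field_12_2(std::uint32_t x) { return (x >> 12) & 0x3u; }
```

Because the control word has to live in a register (here `bextr_control(12, 2)` is `0x20C`), a one-off constant extraction with BEXTR may still need a second instruction to load it, which could narrow the gap with shift-and-mask; the saving is more plausible when that register is reused, for example inside a loop.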

Also, is there an analogue of bextr_u32 for inserting contiguous bits into another word (e.g. a | ((b & 0xff) << 8))?

It seems such an instruction would need four operands, which isn't implemented. What about one that just fills all the upper bits (e.g. a | (b << 8))?
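BMI1 and BMI2 do not appear to include an insert counterpart to BEXTR, but BMI2's PDEP (`_pdep_u32` in `<immintrin.h>`) can express both examples: depositing `b` under the mask `0x0000FF00` gives `(b & 0xff) << 8`, and the mask `0xFFFFFF00` gives `b << 8`. This sketch uses hypothetical helper names and a portable model of PDEP rather than the intrinsic. Note that `a | ((b & 0xff) << 8)` only inserts correctly if bits 8-15 of `a` are already zero; `insert_field` clears the field first.

```cpp
#include <cstdint>

// Insert the low `len` bits of `b` into `a` at bit `start`, overwriting that
// field of `a` (clear, shift, mask, OR). Assumes start < 32 and start + len <= 32.
std::uint32_t insert_field(std::uint32_t a, std::uint32_t b,
                           unsigned start, unsigned len) {
    std::uint32_t low_mask =
        (len >= 32) ? 0xFFFFFFFFu : ((std::uint32_t{1} << len) - 1);
    std::uint32_t field_mask = low_mask << start;   // bits of `a` to replace
    return (a & ~field_mask) | ((b << start) & field_mask);
}

// Portable reference model of BMI2 PDEP (_pdep_u32): deposit the low-order
// bits of `src`, in order, into the positions of the set bits of `mask`.
std::uint32_t pdep32_model(std::uint32_t src, std::uint32_t mask) {
    std::uint32_t result = 0;
    for (std::uint32_t src_bit = 1; mask != 0; src_bit <<= 1) {
        std::uint32_t lowest = mask & (~mask + 1u);  // lowest set bit of mask
        if (src & src_bit) result |= lowest;
        mask &= mask - 1;                            // clear that bit
    }
    return result;
}
```

With PDEP, the question's first example becomes `a | _pdep_u32(b, 0x0000FF00)` and the second `a | _pdep_u32(b, 0xFFFFFF00)`; whether that beats a plain shift and OR on a given CPU would need measuring.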
