Intel ISA Extensions

SDE failure trying to look at "chrome"


    I tried the latest version of SDE, running in Ubuntu 12.04 with kernel.  I get the following response:

>sde -- google-chrome

Failed to move to new PID namespace: Operation not permitted

Any ideas as to how I can run Chrome with SDE?  I was successful in running Firefox, btw.  Thanks for any help.


Prefetch instructions

I would be interested in information about the behavior of the prefetch hint instructions, such as prefetcht0, prefetchnta, prefetchw, etc., on modern processors such as Sandy Bridge and Ivy Bridge. I ask because there is apparently nothing about it in the optimization guide [1]. It would arguably be good for developers to know which cache level the data is prefetched into with each variant. I'd be glad if someone could provide a pointer to a detailed explanation.

[1] Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-026, April 2012
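
For reference, these hints are reachable from C through `_mm_prefetch` in `<xmmintrin.h>`. The sketch below only shows how a hint is issued; it makes no claim about which cache level a given hint targets on Sandy Bridge or Ivy Bridge. The function name `sum_with_prefetch` and the distance `PREFETCH_AHEAD` are illustrative assumptions, not values from any Intel guide.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0, _MM_HINT_NTA */

/* Hypothetical tuning knob: how many elements ahead to prefetch. */
#define PREFETCH_AHEAD 64

/* Sums n doubles while hinting that data further ahead will be needed soon. */
double sum_with_prefetch(const double *data, size_t n)
{
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (i + PREFETCH_AHEAD < n) {
            /* A prefetch is only a hint and does not change program results.
               Swap in _MM_HINT_NTA to request a non-temporal prefetch. */
            _mm_prefetch((const char *)&data[i + PREFETCH_AHEAD], _MM_HINT_T0);
        }
        total += data[i];
    }
    return total;
}
```

Whether any hint helps at all depends on the access pattern and on the hardware prefetchers, so timing both variants on the target machine is the only reliable check.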


Intel Phi bandwidth

Hi All,

      I am writing an application for the MIC architecture, and I want to know the bandwidth between each memory device,

such as the bandwidth between the core and L1, L1 and L2, and L2 and memory.  I want this information to evaluate my application.

So I want to know how many loads can be issued each clock cycle.

How many cycles are needed to transfer a 64-byte cache line from L2 to L1?

Performing a two element broadcast/load in AVX

I have the following problem:

Say, at location 'A', I have: c1 d1 c3 d3, which are all doubles (64-bit). I want to fill two registers, a00 and a01 with:

a00-> |d1|c1|d1|c1| ; a01-> |d3|c3|d3|c3|

i.e., I want to broadcast the first two elements to register a00 and the next two elements to register a01.

Currently, I'm doing it as follows:
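
(The poster's own code is not included in this excerpt.) As one possible approach, and not necessarily what the poster used, AVX has `_mm256_broadcast_pd`, which copies a 128-bit pair of doubles into both halves of a 256-bit register. The sketch below assumes GCC or Clang; the function name `broadcast_pairs` and the output arrays are hypothetical and exist only so the result can be inspected.

```c
#include <immintrin.h>  /* AVX intrinsics */

/* The target attribute lets GCC/Clang build this without -mavx;
   running it still requires a CPU with AVX. */
__attribute__((target("avx")))
void broadcast_pairs(const double *A, double *a00_out, double *a01_out)
{
    /* Each call duplicates one 128-bit pair into both 128-bit halves. */
    __m256d a00 = _mm256_broadcast_pd((const __m128d *)&A[0]);  /* c1 d1 c1 d1 */
    __m256d a01 = _mm256_broadcast_pd((const __m128d *)&A[2]);  /* c3 d3 c3 d3 */

    /* Stored to memory only so the contents can be checked. */
    _mm256_storeu_pd(a00_out, a00);
    _mm256_storeu_pd(a01_out, a01);
}
```

The comments list elements from lowest to highest, so the a00 register reads |d1|c1|d1|c1| when written highest element first, as in the question.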

Purpose of CPUID Deterministic Cache Parameters Leaf


Can someone explain the purpose of having two separate cache leaves (leaves 2 and 4) for the cpuid instruction? I ask because on my Intel Xeon 5650 system, the data from leaf 2 does not include any info for the L1 data cache. Is it standard to put this in the info from leaf 4? Please advise. 
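
For context, leaf 2 returns one-byte descriptors that have to be looked up in a table, while leaf 4 reports the parameters of each cache directly, one subleaf per cache. A minimal sketch of reading leaf 4 follows, assuming GCC or Clang with `<cpuid.h>`; the names `cache_info`, `decode_leaf4`, and `print_caches` are made up for illustration, and nothing here says what the Xeon 5650 in the question actually reports.

```c
#include <cpuid.h>   /* GCC/Clang helpers: __get_cpuid_max, __cpuid_count */
#include <stdio.h>

/* Decoded fields of one CPUID leaf 4 subleaf. */
struct cache_info {
    unsigned type;        /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;       /* 1 for L1, 2 for L2, ... */
    unsigned long size;   /* total size in bytes */
};

/* Pure decoder, so it can be checked without executing CPUID. */
struct cache_info decode_leaf4(unsigned eax, unsigned ebx, unsigned ecx)
{
    struct cache_info c;
    unsigned ways       = ((ebx >> 22) & 0x3ff) + 1;  /* EBX[31:22] + 1 */
    unsigned partitions = ((ebx >> 12) & 0x3ff) + 1;  /* EBX[21:12] + 1 */
    unsigned line_size  = (ebx & 0xfff) + 1;          /* EBX[11:0]  + 1 */
    unsigned sets       = ecx + 1;                    /* ECX + 1 */

    c.type  = eax & 0x1f;
    c.level = (eax >> 5) & 0x7;
    c.size  = (unsigned long)ways * partitions * line_size * sets;
    return c;
}

/* Walks the subleaves of leaf 4 until the cache type field reads 0. */
void print_caches(void)
{
    if (__get_cpuid_max(0, NULL) < 4)
        return;  /* leaf 4 is not available on this processor */
    for (unsigned sub = 0; ; ++sub) {
        unsigned eax, ebx, ecx, edx;
        __cpuid_count(4, sub, eax, ebx, ecx, edx);
        struct cache_info c = decode_leaf4(eax, ebx, ecx);
        if (c.type == 0)
            break;
        printf("L%u type %u: %lu bytes\n", c.level, c.type, c.size);
    }
}
```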

Array of __m128d values as function argument


I have a large loop which executes the following piece of code many times: 


//input: __m128d U_c00, U_c01, U_c02, U_c10, U_c11, U_c12, U_c20, U_c21, U_c22, psi_c0, psi_c1, psi_c2;

//output: __m128d chi_c0, chi_c1, chi_c2;
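
The excerpt stops before the computation itself, so the following is only a sketch of how such values could be grouped into arrays and passed by pointer. The lane-wise 3x3 matrix-vector product is a placeholder assumption, not the poster's actual arithmetic, and `mat_vec_3x3` is a hypothetical name.

```c
#include <emmintrin.h>  /* SSE2: __m128d, _mm_mul_pd, _mm_add_pd */

/* Placeholder computation: lane-wise chi = U * psi for a row-major 3x3 U.
   Array parameters decay to pointers, so no __m128d is copied at the call. */
void mat_vec_3x3(const __m128d U[9], const __m128d psi[3], __m128d chi[3])
{
    for (int r = 0; r < 3; ++r) {
        __m128d acc = _mm_mul_pd(U[3 * r], psi[0]);
        acc = _mm_add_pd(acc, _mm_mul_pd(U[3 * r + 1], psi[1]));
        acc = _mm_add_pd(acc, _mm_mul_pd(U[3 * r + 2], psi[2]));
        chi[r] = acc;
    }
}
```

Arrays of `__m128d` declared as locals or globals get 16-byte alignment from the type itself; memory obtained from a general-purpose allocator may need an aligned allocation instead.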

Problem compiling SSSE3 on Ubuntu 12.04 LTS


            I have code in which I use SSSE3 instructions. I want to compile it on Ubuntu 12.04; the same code compiles on Windows. My system is an Intel Xeon that supports SSSE3 (I got this from the command "more /proc/cpuinfo").

When compiling on Ubuntu I added the compiler option "-mssse3", but it still gives the error "_mm_shuffle_epi8 was not declared in scope"; other SSSE3 instructions are also not recognized.

Please let me know what the reason might be.

Thanks in Advance.
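
One thing worth checking, though the post does not show enough to confirm it is the cause: with GCC, `_mm_shuffle_epi8` is declared in `<tmmintrin.h>`, so the flag alone is not enough if that header is never included. A minimal sketch, assuming GCC or Clang and built with something like `gcc -mssse3 file.c`; the function `reverse16` is a made-up example.

```c
#include <stdint.h>
#include <tmmintrin.h>  /* SSSE3 intrinsics, including _mm_shuffle_epi8 */

/* Reverses the order of 16 bytes with a single byte shuffle.
   The target attribute lets recent GCC/Clang build this function even
   when -mssse3 is not given on the command line. */
__attribute__((target("ssse3")))
void reverse16(const uint8_t *src, uint8_t *dst)
{
    /* Control byte i selects which source byte lands in position i. */
    const __m128i mask = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                       7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)src);
    _mm_storeu_si128((__m128i *)dst, _mm_shuffle_epi8(v, mask));
}
```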



SSE sum of vectors - how to improve cache performance

Hello, the performance of my application depends heavily on summing two vectors (stored as aligned double arrays); namely, I need a fast vecA += vecB. Since SSE has no instruction for +=, the only option is vecA = vecA + vecB. I have two versions of this function:

inline void addToDoubleVectorSSE(const double * what, const double * toWhat, volatile double * dest, const unsigned int len)
   __m128d * _what = (__m128d*)what;
   __m128d * _toWhat = (__m128d*)toWhat;
   __m128d * _toWhatBase = (__m128d*)toWhat;
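
The poster's function is cut off above. Independently of it, here is a minimal sketch of an in-place `dest += src` using SSE2, assuming both arrays are 16-byte aligned and do not overlap. The name `add_in_place` is hypothetical, and the `volatile` qualifier from the excerpt is left out here as a deliberate assumption, since it forces every access to go through memory.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_add_pd, _mm_store_pd */

/* In-place dest[i] += src[i]. Both pointers are assumed 16-byte aligned. */
void add_in_place(double *restrict dest, const double *restrict src, size_t len)
{
    size_t i = 0;
    for (; i + 2 <= len; i += 2) {
        __m128d a = _mm_load_pd(dest + i);   /* aligned load of two doubles */
        __m128d b = _mm_load_pd(src + i);
        _mm_store_pd(dest + i, _mm_add_pd(a, b));
    }
    if (i < len)             /* odd length: one scalar element is left */
        dest[i] += src[i];
}
```

A loop like this reads two streams and writes one, so for large arrays it tends to be limited by memory traffic more than by the add itself; measuring on the target machine is needed before drawing conclusions.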

IA32_PERF_CTL on X64 error

I have a problem using wrmsr on IA32_PERF_CTL: in kernel space I get a STATUS_PRIVILEGED_INSTRUCTION exception, and WinDbg, which has a wrmsr function, reports "no such msr".  This is on an i5-2410M CPU.

The same code and Windbg do not generate errors on another test platform.  What could be the cause of this? 

By the way rdmsr IA32_PERF_STS  works OK on both platforms.
