Intel® AVX and CPU Instructions

64-bit bug in Visual C++? mov R8d, imm not completely defined

The Intel documentation does not specify whether mov R8d, -1 will also zero the high dword of R8 or leave it intact.

Microsoft Visual C++ (2010) translates the C line a = myfunc(par1, par2, 3); into

         mov RCX, par1 ; mov RDX, par2 ;   mov R8d, 3 ;    call myfunc ;    mov qword ptr [a], RAX

If the behaviour is implementation-dependent, some processors may crash.

If the high dword is set to zero when moving to the low dword, why not say so clearly?
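
For what it is worth, the Intel SDM does appear to cover this case: Volume 1, section 3.4.1.1 (general-purpose registers in 64-bit mode) states that 32-bit operands produce a 32-bit result that is zero-extended to 64 bits in the destination register. The following is a minimal sketch for checking that on a given machine, assuming GCC or Clang inline assembly on x86-64. The function name write_low_dword is hypothetical, and the non-x86 branch is only a C model of the same behaviour so that the file still compiles elsewhere.

```c
#include <stdint.h>

/* Start with all 64 bits set, write only the low dword, return the full register. */
uint64_t write_low_dword(uint32_t value) {
    uint64_t reg = UINT64_MAX;
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* %k0 names the 32-bit view (such as r8d) of the 64-bit register holding reg. */
    __asm__ ("movl %1, %k0" : "+r"(reg) : "r"(value));
#else
    /* Assumed fallback for other targets: model the zero extension in plain C. */
    reg = (uint64_t)value;
#endif
    return reg;
}
```

If the upper half were preserved, write_low_dword(3) would return 0xFFFFFFFF00000003 rather than 3.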

Adding consecutive large numbers

I am trying to write a simple inline assembly routine using the AVX instructions. I have run into a problem when adding large numbers. Here is the code:

__asm__ __volatile__(
"vzeroall\n\t"
"movl $0, %%r9d\n\t"
"movl $4, %%r10d\n\t"
"leal (%%eax, %%r9d, 1), %%edx\n\t"
"vbroadcastss (%%edx), %%ymm0\n\t"
"leal (%%eax, %%r10d, 1), %%edx\n\t"
"vmovups (%%edx), %%ymm1\n\t"
"vaddps %%ymm0, %%ymm1, %%ymm2\n\t"
"vmovups %%ymm2, (%%edx)"
: "=a"(x) : "a"(x));
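
A plain C reference can help when checking what the block above should produce. This is a minimal sketch, assuming x points to at least nine floats (the declaration of x is not shown in the post); the name add_first_to_next8 is hypothetical.

```c
#include <stddef.h>

/* Scalar model of the asm above: add x[0] to each of x[1] .. x[8]. */
void add_first_to_next8(float *x) {
    const float b = x[0];              /* vbroadcastss from byte offset 0 */
    for (size_t i = 1; i <= 8; ++i) {
        x[i] = x[i] + b;               /* vaddps, then vmovups to byte offset 4 */
    }
}
```

Note that if the "large numbers" are single-precision values, some difference from the exact sum may simply be float rounding: a float carries 24 bits of significand, so 16777216.0f + 1.0f yields 16777216.0f.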

SDE failure trying to look at "chrome"

Hi,

    I tried the latest version of SDE, running in Ubuntu 12.04 with 3.2.0.40 kernel.  I get the following response:

>sde -- google-chrome

Failed to move to new PID namespace: Operation not permitted

Any ideas as to how I can run chrome with sde? I was successful in running firefox, btw. Thanks for any help.

perfwise

Prefetch instructions

I would be interested in information about the behavior of the prefetch hint instructions (prefetcht0, prefetchnta, prefetchw, ...) on modern processors such as Sandy Bridge and Ivy Bridge. I ask because the optimization guide [1] apparently says nothing about it. It would arguably be useful for developers to know which cache level each variant prefetches data into. I would be glad if someone could provide a pointer to a detailed explanation.

[1] Intel® 64 and IA-32 Architectures Optimization Reference Manual, Order Number: 248966-026, April 2012
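
For readers who want to experiment with the hints while waiting for documentation, here is a minimal sketch using the GCC/Clang builtin __builtin_prefetch(addr, rw, locality). The function name sum_with_prefetch and the prefetch distance of 16 elements are hypothetical and untuned. On x86 the compiler typically emits prefetcht0 for locality 3 and prefetchnta for locality 0, but that mapping is an assumption to verify in the generated assembly.

```c
#include <stddef.h>

/* Sum an array while hinting that elements a fixed distance ahead will be read. */
double sum_with_prefetch(const double *data, size_t n) {
    const size_t ahead = 16;           /* hypothetical distance, in elements */
    double total = 0.0;
    for (size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        if (i + ahead < n) {
            /* rw = 0 (read), locality = 3 (keep in all cache levels). */
            __builtin_prefetch(&data[i + ahead], 0, 3);
        }
#endif
        total += data[i];
    }
    return total;
}
```

Changing the last argument to 0 requests a non-temporal hint instead, which makes it possible to time the variants against each other on a specific processor.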

 

Intel Phi bandwidth

Hi All,

      I am writing an application for the MIC architecture, and I want to know the bandwidth between each level of the memory hierarchy:

between the core and L1, between L1 and L2, and between L2 and memory. I need this information to evaluate my application.

So I want to know how many loads can be issued per clock cycle.

How many cycles are needed to transfer a 64-byte cache line from L2 to L1?

Performing a two element broadcast/load in AVX

I have the following problem:

Say, at location 'A', I have: c1 d1 c3 d3, which are all doubles (64-bit). I want to fill two registers, a00 and a01 with:

a00-> |d1|c1|d1|c1| ; a01-> |d3|c3|d3|c3|

i.e., I want to broadcast the first two elements to register a00 and the next two elements to register a01.

Currently, I'm doing it as follows:
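
The poster's own code is not included in this listing. Separately from it, the requested layout can be sketched with the AVX intrinsic _mm256_broadcast_pd (the vbroadcastf128 instruction), which copies one 128-bit pair into both halves of a 256-bit register. This is a minimal sketch, not the poster's method; the function name broadcast_pairs is hypothetical, and the scalar branch is only there so the file compiles when AVX is not enabled.

```c
#if defined(__AVX__)
#include <immintrin.h>
#endif

/* src holds c1 d1 c3 d3. On return a00 = {c1, d1, c1, d1} and a01 = {c3, d3, c3, d3},
   lowest element first, so the |d1|c1|d1|c1| picture reads from right to left. */
void broadcast_pairs(const double src[4], double a00[4], double a01[4]) {
#if defined(__AVX__)
    /* Each call loads 16 bytes and duplicates them into both 128-bit lanes. */
    __m256d lo = _mm256_broadcast_pd((const __m128d *)&src[0]);
    __m256d hi = _mm256_broadcast_pd((const __m128d *)&src[2]);
    _mm256_storeu_pd(a00, lo);
    _mm256_storeu_pd(a01, hi);
#else
    /* Scalar fallback with the same result when not built with AVX. */
    for (int i = 0; i < 4; ++i) {
        a00[i] = src[i % 2];
        a01[i] = src[2 + i % 2];
    }
#endif
}
```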

Purpose of CPUID Deterministic Cache Parameters Leaf

Hello, 

Can someone explain the purpose of having two separate cache leaves (leaves 2 and 4) for the cpuid instruction? I ask because on my Intel Xeon 5650 system, the data from leaf 2 does not include any info for the L1 data cache. Is it standard to put this in the info from leaf 4? Please advise. 
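
As background for the question: leaf 2 reports caches as one-byte descriptors that must be looked up in a table, while leaf 4 reports explicit parameters, one cache per subleaf index in ECX, until the cache type field reads 0. The field layout of leaf 4 can be sketched as a small decoder. The struct and function names below are hypothetical, and the register values used to try it are made up for illustration, not read from a Xeon 5650.

```c
#include <stdint.h>

typedef struct {
    unsigned type;        /* 0 = no more caches, 1 = data, 2 = instruction, 3 = unified */
    unsigned level;       /* 1 for L1, 2 for L2, ... */
    uint64_t size_bytes;  /* ways * partitions * line size * sets */
} cache_info;

/* Decode the EAX, EBX, ECX values returned by CPUID leaf 4 for one subleaf. */
cache_info decode_leaf4(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    cache_info c;
    c.type  = eax & 0x1F;                              /* EAX[4:0] */
    c.level = (eax >> 5) & 0x7;                        /* EAX[7:5] */
    uint64_t line       = (ebx & 0xFFF) + 1;           /* EBX[11:0] + 1 */
    uint64_t partitions = ((ebx >> 12) & 0x3FF) + 1;   /* EBX[21:12] + 1 */
    uint64_t ways       = ((ebx >> 22) & 0x3FF) + 1;   /* EBX[31:22] + 1 */
    uint64_t sets       = (uint64_t)ecx + 1;           /* ECX + 1 */
    c.size_bytes = ways * partitions * line * sets;
    return c;
}
```

For example, an 8-way L1 data cache with 64-byte lines and 64 sets decodes to 8 * 1 * 64 * 64 = 32768 bytes.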

Problem compiling SSSE3 code on Ubuntu 12.04 LTS

Hi

            I have code that uses SSSE3 instructions, and I want to compile it on Ubuntu 12.04. The same code compiles on Windows. My system is an Intel Xeon that supports SSSE3 (I got this information from the command "more /proc/cpuinfo").

When compiling on Ubuntu I added the compiler option "-mssse3", but it still gives the error "_mm_shuffle_epi8 was not declared in scope". Other SSSE3 instructions are not recognized either.

Please let me know what the reason might be.

Thanks in Advance.

Regards,

Harikrishna.K
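
One possible cause, offered as a guess since the source file is not shown: with GCC the SSSE3 intrinsics are declared in <tmmintrin.h>, so the error appears if that header is not included, even with -mssse3. The following minimal sketch shows the include together with a small use of _mm_shuffle_epi8. The function name reverse16 is hypothetical, and the scalar branch only exists so the file also compiles without -mssse3.

```c
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h>   /* declares _mm_shuffle_epi8 and the other SSSE3 intrinsics */
#endif

/* Reverse the order of 16 bytes: out[i] = in[15 - i]. */
void reverse16(const uint8_t in[16], uint8_t out[16]) {
#if defined(__SSSE3__)
    /* Shuffle control: result byte i takes source byte 15 - i. */
    const __m128i idx = _mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8,
                                      7, 6, 5, 4, 3, 2, 1, 0);
    __m128i v = _mm_loadu_si128((const __m128i *)in);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(v, idx));
#else
    /* Scalar fallback when SSSE3 is not enabled at compile time. */
    for (int i = 0; i < 16; ++i) {
        out[i] = in[15 - i];
    }
#endif
}
```

Built with -mssse3, the compiler defines __SSSE3__ and the intrinsic path is used.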
