Intel® AVX and CPU Instructions

Quad precision?

Hi,

I am a bit lost. I am trying to find information on how quad precision (REAL*16) is implemented on recent Intel CPUs (such as the Xeon 54xx) with the Intel 10.x compiler...

Is it hardware-supported, software-only, or a mix of the two?

What accuracy (in digits) can we expect?
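
As a rough guide to the accuracy question, and assuming REAL*16 maps to an IEEE 754 binary128-style format with a 113-bit significand (an assumption; the post does not say which format the compiler uses), the number of decimal digits guaranteed to survive can be estimated from the significand width. The helper names below are made up for this sketch.

```c
#include <float.h>

/* Guaranteed decimal digits for a binary format with p significand bits:
 * floor((p - 1) * log10(2)). log10(2) is approximated as 30103/100000,
 * which is accurate enough for the small widths used here. */
int decimal_digits(int p)
{
    return (int)(((long)(p - 1) * 30103L) / 100000L);
}

/* Sanity check: the estimate for double should agree with the
 * compiler's own DBL_DIG. Returns 1 if it does. */
int estimate_matches_double(void)
{
    return decimal_digits(DBL_MANT_DIG) == DBL_DIG;
}
```

Under that assumption, decimal_digits(113) is the figure to compare against decimal_digits(53) for double precision.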

What kind of performance can we expect compared with typical double precision (Linpack, for example)?

How do Intel CPUs compare with the IBM POWER6 architecture in quad precision?

For example, a quote from the POWER6 description:

FMA now an extension of AVX?

The madd/sub instructions were one of the new killer features I was waiting for, but now they are gone from the Java intrinsics guide (they were still present in April's release) and are referred to as an extension to AVX in the PDF. Was this feature cut from the first iteration of AVX CPUs, or will it just have a separate CPUID flag?
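
For reference on the separate-flag question: in Intel's published CPUID documentation, AVX and FMA are reported by different feature bits in leaf 1 (ECX bit 28 for AVX, ECX bit 12 for FMA). The sketch below reads them with the `__get_cpuid` helper from the GCC/Clang header `<cpuid.h>`. The function names are illustrative, and a complete check would also confirm operating-system support for the YMM state (OSXSAVE/XGETBV), which is left out here.

```c
#include <cpuid.h>  /* GCC/Clang helper; other compilers provide different wrappers */

/* Return 1 if the given bit of ECX from CPUID leaf 1 is set, else 0. */
int leaf1_ecx_bit(unsigned int bit)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;  /* leaf 1 not available */
    return (int)((ecx >> bit) & 1u);
}

/* AVX and FMA have separate feature bits, so one can be set without the other. */
int cpu_has_avx(void) { return leaf1_ecx_bit(28); }
int cpu_has_fma(void) { return leaf1_ecx_bit(12); }
```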

How many MMX/SSE units in Core-2 Quad

I have a powerful HP computer with a Q9550 (Core 2 Quad CPU). It seems that there is only one MMX/SSE unit shared between all 4 cores.

The reason I think so is the following. I am running a simple program that uses SSE2.

  • Running 1 thread achieves 300MB/s.

  • Running 2 threads achieves 150MB/s per thread.

  • Running 4 threads achieves 75MB/s per thread.

My laptop with a T7250 (Core 2 Duo CPU) exhibits similar behavior.

Is it true that Core-2 CPUs contain only one MMX/SSE unit?

Consecutive load operations results problem

I am doing consecutive load operations using _mm_loadu_si128() in my application. The two loads use addresses of the form m1+len+h. The first load is xm1=_mm_loadu_si128(m1+16-1), and the second is xm2=_mm_loadu_si128(m1+16+0). I expect xm1, shifted left by 1, to match xm2 except for m128i_i8[15]. But the result is something else: none of the 8-bit elements are the same between xm1 and xm2. Is this something to do with memory address alignment? _mm_loadu_si128() is supposed to work for unaligned addresses as well.
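
One possible cause, assuming m1 is declared as `__m128i *` (the post does not show the declaration): pointer arithmetic then moves in 16-byte steps, so m1+16-1 and m1+16+0 are 16 bytes apart rather than 1. A minimal sketch of byte-granular addressing follows; `load_at_byte` and `consecutive_loads_overlap` are hypothetical helpers, not part of the intrinsics API.

```c
#include <emmintrin.h>  /* SSE2: _mm_loadu_si128, _mm_storeu_si128 */
#include <stddef.h>
#include <stdint.h>

/* Load 16 bytes starting byte_offset bytes past base.
 * The offset is applied to a byte pointer first and cast afterwards,
 * so "+1" means one byte and not one whole 16-byte vector. */
__m128i load_at_byte(const uint8_t *base, size_t byte_offset)
{
    return _mm_loadu_si128((const __m128i *)(base + byte_offset));
}

/* Returns 1 if loads at byte offsets 15 and 16 overlap as expected:
 * bytes 1..15 of the first load equal bytes 0..14 of the second.
 * buf must hold at least 32 readable bytes. */
int consecutive_loads_overlap(const uint8_t *buf)
{
    uint8_t a[16], b[16];
    _mm_storeu_si128((__m128i *)a, load_at_byte(buf, 15));
    _mm_storeu_si128((__m128i *)b, load_at_byte(buf, 16));
    for (int i = 0; i < 15; i++)
        if (a[i + 1] != b[i])
            return 0;
    return 1;
}
```

If m1 is already a byte pointer in the real code, this is not the cause and the problem lies elsewhere.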

adding or comparing 8 bit unsigned data in XMM register

Can anyone suggest a way to set, compare, and add the 16 unsigned 8-bit data elements of two XMM __m128i registers? All I am finding are the signed operations, where everything above 127 wraps around to the negative end. But I want to set, add, and compare two __m128i registers holding unsigned values greater than 127.

Any suggestions on this would be greatly appreciated.

Thanks.
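
A minimal sketch of one way to do this with SSE2 intrinsics, offered as an illustration rather than the only approach. The wrapper names (`set1_u8`, `add_u8_wrap`, `add_u8_sat`, `cmpgt_u8`) are made up here; the intrinsics they call are standard SSE2.

```c
#include <emmintrin.h>  /* SSE2 */
#include <stdint.h>

/* Set all 16 lanes to one unsigned value; the cast changes only the type,
 * the stored bit pattern is the same. */
__m128i set1_u8(uint8_t v) { return _mm_set1_epi8((char)v); }

/* Wrapping add: the resulting bits are identical for signed and unsigned lanes. */
__m128i add_u8_wrap(__m128i a, __m128i b) { return _mm_add_epi8(a, b); }

/* Saturating unsigned add: results clamp at 255 instead of wrapping. */
__m128i add_u8_sat(__m128i a, __m128i b) { return _mm_adds_epu8(a, b); }

/* Unsigned a > b per lane (0xFF where true, 0x00 where false).
 * SSE2 only provides a signed byte compare, so the top bit of both
 * inputs is flipped first, mapping 0..255 onto -128..127 in order. */
__m128i cmpgt_u8(__m128i a, __m128i b)
{
    const __m128i bias = _mm_set1_epi8((char)0x80);
    return _mm_cmpgt_epi8(_mm_xor_si128(a, bias), _mm_xor_si128(b, bias));
}
```

To inspect the lanes as unsigned values, store the register into a uint8_t array with _mm_storeu_si128 and read the array.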

Extracting XMM register elements gives compilation error

I have recently started coding with SIMD and SSE instructions. To access individual elements of the 128-bit register, I write xm1.m128i_i8[i]. But it gives a compilation error:

1>.outline_correct_C.cpp(65): error: union "__m128i" has no member "m128i_i8"
1> if(xm1.m128i_i8[i])
1> ^
1>
1>.outline_correct_C.cpp(65): error: expression must have class type
1> if(xm1.m128i_i8[i])

But I have seen similar usage in the manuals. Can you please clarify?
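
The m128i_i8 member comes from the Microsoft headers' definition of __m128i, and the error suggests the headers in use here do not define it. A sketch of a workaround that does not rely on that member is to store the register to an ordinary array and index the array; the helper names below are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: _mm_storeu_si128 */
#include <stdint.h>

/* Copy the 16 byte lanes of v into out, where they can be indexed normally. */
void m128i_to_bytes(__m128i v, int8_t out[16])
{
    _mm_storeu_si128((__m128i *)out, v);
}

/* Return 1 if any byte lane of v is nonzero, mirroring the
 * if(xm1.m128i_i8[i]) test from the failing code. */
int any_lane_nonzero(__m128i v)
{
    int8_t lanes[16];
    m128i_to_bytes(v, lanes);
    for (int i = 0; i < 16; i++)
        if (lanes[i])
            return 1;
    return 0;
}
```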

How is the brandstring formed by BIOSes?

Hi,

I've been looking for this for quite some time now, but it seems nobody knows:
How does the BIOS form the brand string that can be read from the CPU via CPUID leaves 0x80000002 through 0x80000004?
It must be programmable by the BIOS, as I've seen BIOSes omit the CPU model, and at least for AMD CPUs there are publicly available tech docs describing how to form the brand string.
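
For context, the read side mentioned above can be sketched as follows, assuming a GCC/Clang toolchain with `<cpuid.h>`. It shows only how software reads the 48-byte string from the three leaves and says nothing about how firmware sets it; `read_brand_string` is a made-up name.

```c
#include <cpuid.h>   /* GCC/Clang: __get_cpuid */
#include <string.h>

/* Fill out (at least 49 bytes) with the processor brand string.
 * Each of the three leaves returns 16 bytes in EAX, EBX, ECX, EDX order.
 * Returns 1 on success, 0 if the extended leaves are not available. */
int read_brand_string(char out[49])
{
    unsigned int regs[4];
    memset(out, 0, 49);  /* guarantees NUL termination */
    for (unsigned int i = 0; i < 3; i++) {
        if (!__get_cpuid(0x80000002u + i, &regs[0], &regs[1], &regs[2], &regs[3]))
            return 0;
        memcpy(out + 16 * i, regs, 16);
    }
    return 1;
}
```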
