Quad precision?

Hi,

I am a bit lost. I am trying to find information on how quad precision (REAL*16) is implemented on recent Intel CPUs (such as the Xeon 54xx) with the Intel 10.x compiler.

Is it supported in hardware, only in software, or by a mix of the two?

What accuracy (in decimal digits) can we expect?

What kind of performance can we expect compared with typical double precision (Linpack, for example)?

How do Intel CPUs compare in quad precision with the IBM POWER6 architecture?

For example, a quote from the POWER6 description:
"[On Power6 ... ]The unit is effectively quad precision, offering up to 36 digit
accuracy in 144 bits, although results are compressed to 128 bits to
fit in two floating point registers and then decompressed before
consumption. Basic operations are somewhat slower than ALU operations,
with single cycle throughput, but 2 cycle latency."

thanks,

tux456

Quad precision would be implemented on Xeon by combinations of x87 "REAL*10" operations, so at least 2 non-vectorizable instructions would be required to implement each floating point operation. For most operations, you should get 48 additional bits beyond the x87 precision, thus about 33 decimal digits.
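As a rough check on the digit count, the bit widths in the paragraph above can be converted to decimal digits by multiplying by log10(2). The short Python sketch below does only that arithmetic; the function name `decimal_digits` and the labels are made up for illustration, and the 64-bit figure is the significand width of the x87 extended format.

```python
import math

def decimal_digits(significand_bits):
    """Approximate decimal digits carried by a binary significand of the given width."""
    return significand_bits * math.log10(2)

X87_BITS = 64    # significand width of the x87 extended ("REAL*10") format
EXTRA_BITS = 48  # additional bits quoted in the paragraph above

for label, bits in [
    ("REAL*8", 53),
    ("x87 REAL*10", X87_BITS),
    ("x87 + 48 bits", X87_BITS + EXTRA_BITS),
]:
    print(f"{label:14s} {bits:4d} bits  ~{decimal_digits(bits):.1f} decimal digits")
```

The last row works out to a little under 34 digits, which is consistent with the "about 33 decimal digits" estimate.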

Comparing Linpack performance doesn't make much sense, except to emphasize that you require roughly 5 operations per floating point add and multiply, plus packing and unpacking time, and that you lose a factor of, say, 2 from the lack of vectorization.

I haven't seen any documentation indicating that Power6 has changed the floating point format from the one that previous IBM and MIPS architectures used, which supports approximately 107 bits, or 31 decimal digits, with a reduced exponent range compared with REAL*8. In effect, 11 bits are wasted by carrying 2 copies of the exponent, differing by a constant. Of course, those implementations should penalize performance by only a factor of 3 or so, compared with non-vector REAL*8.
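To illustrate the idea of a value carried as a pair of REAL*8 numbers, here is a minimal Python sketch of "double-double" addition built on Knuth's error-free two-sum. It is only an illustration of the general technique, not the code any particular compiler or runtime uses; the names `two_sum` and `dd_add` are hypothetical, and `dd_add` is the simple variant rather than a fully accurate one.

```python
from fractions import Fraction

def two_sum(a, b):
    """Error-free addition: returns (s, e) with s = fl(a + b) and a + b == s + e exactly."""
    s = a + b
    bb = s - a
    e = (a - (s - bb)) + (b - bb)
    return s, e

def dd_add(x, y):
    """Add two (hi, lo) pairs of doubles; simple double-double sum (illustrative only)."""
    s, e = two_sum(x[0], y[0])  # exact sum of the high parts
    e += x[1] + y[1]            # fold in the low parts
    return two_sum(s, e)        # renormalize so that hi holds the leading bits

tiny = 2.0 ** -80
hi, lo = dd_add((1.0, 0.0), (tiny, 0.0))

print(1.0 + tiny == 1.0)  # a single double cannot hold 1 + 2**-80
print(Fraction(hi) + Fraction(lo) == Fraction(1) + Fraction(1, 2 ** 80))  # the pair can
```

Note that `two_sum` alone costs six floating point additions and subtractions, which gives a feel for why arithmetic in this kind of format runs at a small multiple of the cost of plain REAL*8.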

One of the design parameters for Itanium is full instruction level support for quad precision, possibly making it a superior platform for that purpose. Needless to say, that advantage hasn't proven decisive in the marketplace.

Thank you very much for your answer.

It's unfortunate that we realise too late that the Itanium2 can be useful!
