FP Single Precision Packed SIMD operations in DP code

Hi,

This is an excerpt from a VTune (for Linux) sampling activity with the events
'Packed Single-precision Floating-point Streaming SIMD Extension Instructions Retired' and 'Clockticks':


address  line   SIMD   Clockticks  source
0x1eef6  219    1842    5006       for ( k=0; k < n; k++ )
         220       0       0       {
0x1efd9  221   64438   12381       sre+=(m1r[k]*m2r[k]+m1i[k]*m2i[k]);
0x1f002  222   13996    4692       sim+=(m1r[k]*m2i[k]-m1i[k]*m2r[k]);
         223       0       0       }
0x1f3cc  224       0      41       rr[j]=sre; ri[j]=sim;

(view in monospace font to make sense of it).

k, n and j are integers; everything else is a double scalar or a
pointer, so no float is involved. Nevertheless, VTune counts the
above-mentioned packed single-precision events, and they make up a
significant fraction of the overall FLOP count (measured by another
tool by summing up x87, packed and scalar SP and DP SIMD).
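
In case someone wants to try this outside the full application, a minimal stand-alone sketch of the loop could look like the following. Only the two statements in the loop body are taken from the excerpt above; the function name conj_dot and the two output parameters are made up for illustration, and I am assuming all four arrays hold at least n doubles. It accumulates the sum of conj(m1[k])*m2[k], with real and imaginary parts kept in separate arrays.

```c
/* Hypothetical stand-alone version of the inner loop (names are
   illustrative, not from the original application).
   Accumulates sum over k of conj(m1[k]) * m2[k], where the real and
   imaginary parts live in separate double arrays. */
void conj_dot(const double *m1r, const double *m1i,
              const double *m2r, const double *m2i,
              int n, double *sre_out, double *sim_out)
{
    double sre = 0.0;   /* real part of the running sum */
    double sim = 0.0;   /* imaginary part of the running sum */
    int k;

    for (k = 0; k < n; k++) {
        sre += (m1r[k] * m2r[k] + m1i[k] * m2i[k]);
        sim += (m1r[k] * m2i[k] - m1i[k] * m2r[k]);
    }

    *sre_out = sre;
    *sim_out = sim;
}
```

Everything in it is declared double, so any single-precision events sampled on such a loop cannot come from float variables in the source.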

So what is going on here?

TIA,
Georg.
