Hi,

I've looked in the optimization guide, and it states that the latencies of the SP and DP floating-point ADD and MUL instructions are still 3 and 5 cycles. However, I now measure a 4-cycle latency on ADDSS and ADDPS, whereas on Intel SB it was 3. The DP variants (ADDSD and ADDPD) are still 3 cycles. Likewise, I now measure a latency of 6 cycles on MULSS and MULPS, whereas I measured only 5 before. DP is unchanged at 5 cycles, as before.

I measure throughput with repetitive loops containing many copies of a single instruction. Latency is determined the same way, but with chained dependencies. So a chained dependency would be:

addss xmm0,xmm1
addss xmm0,xmm2
addss xmm0,xmm1
addss xmm0,xmm2
...

where xmm1 and xmm2 are the negatives of one another.

Thanks for any advice.
Perfwise
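For anyone who wants to reproduce this kind of measurement, here is a rough C sketch of the chained-dependency idea. It is not the exact harness used for the numbers above: the function names `chained_adds` and `ns_per_add` are made up for illustration, the timer is the portable `clock()` rather than a cycle counter, and it assumes the compiler turns each scalar float add into an ADDSS without reordering (true for typical x86-64 builds without fast-math flags, but worth checking in the disassembly).

```c
#include <time.h>

/* Hypothetical helper: runs a dependent chain of single-precision adds.
   Every add consumes the previous result, so the loop is bound by
   instruction latency rather than throughput. */
static float chained_adds(float x, float a, long pairs)
{
    volatile float va = a;   /* volatile read stops constant folding */
    float pos = va;          /* plays the role of xmm1 */
    float neg = -pos;        /* plays the role of xmm2 (the negative) */

    for (long i = 0; i < pairs; ++i) {
        x += pos;            /* expected to become: addss xmm0, xmm1 */
        x += neg;            /* expected to become: addss xmm0, xmm2 */
    }
    return x;
}

/* Hypothetical helper: times the chain and returns the average
   nanoseconds per add. The final value is written to *result so the
   work cannot be discarded by the optimizer. */
static double ns_per_add(long pairs, float *result)
{
    clock_t start = clock();
    *result = chained_adds(1.0f, 0.5f, pairs);
    clock_t stop = clock();

    double seconds = (double)(stop - start) / CLOCKS_PER_SEC;
    return seconds * 1e9 / (2.0 * (double)pairs);
}
```

Multiplying the nanoseconds per add by the core clock in GHz gives an approximate latency in cycles, assuming the frequency stays fixed during the run (turbo and power management can skew this). Loop overhead is small next to a multi-cycle dependent chain, but unrolling the loop body further reduces it.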