Hi,

I've looked in the optimization guide, and it states that the latencies for SP and DP floating-point ADD and MUL instructions are still 3 and 5 cycles. However, I now measure a 4-cycle latency on ADDSS and ADDPS, whereas on Intel SB it was 3. The DP variants (ADDSD and ADDPD) are still 3 cycles. Likewise, I now measure a 6-cycle latency on MULSS and MULPS, whereas I measured only 5 before. DP is unchanged at 5 cycles.

To determine throughput, I run repetitive loops containing many copies of a single instruction. Latency is determined similarly, but with chained dependencies. A chained dependency would be:

    addss xmm0,xmm1
    addss xmm0,xmm2
    addss xmm0,xmm1
    addss xmm0,xmm2
    ...

where xmm1 and xmm2 are the negatives of one another.

Thanks for any advice.

Perfwise
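
For reference, the measurement method described above can be roughly sketched in C. This is only an illustration, not the actual test harness: the function names (`chained_adds`, `independent_adds`, `ns_per_chained_add`) are hypothetical, the choice of four independent accumulators is arbitrary, and it assumes a compiler that emits scalar SSE adds and is not run with `-ffast-math`. The timer reports nanoseconds, so converting to cycles requires knowing the core clock frequency.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Latency pattern: each add depends on the previous result, like
 * "addss xmm0,xmm1 / addss xmm0,xmm2" with xmm2 == -xmm1. */
float chained_adds(float acc, float v, uint64_t pairs)
{
    assert(pairs > 0);
    volatile float vin = v;      /* volatile read: value is opaque to the optimizer */
    float pos = vin, neg = -pos;
    for (uint64_t i = 0; i < pairs; ++i) {
        acc += pos;              /* depends on the previous acc */
        acc += neg;              /* depends on the line above   */
    }
    return acc;
}

/* Throughput pattern: four accumulators that do not depend on each other,
 * so their adds can overlap in the pipeline. */
float independent_adds(float v, uint64_t pairs)
{
    assert(pairs > 0);
    volatile float vin = v;
    float pos = vin, neg = -pos;
    float a0 = 1.0f, a1 = 1.0f, a2 = 1.0f, a3 = 1.0f;
    for (uint64_t i = 0; i < pairs; ++i) {
        a0 += pos; a1 += pos; a2 += pos; a3 += pos;
        a0 += neg; a1 += neg; a2 += neg; a3 += neg;
    }
    return a0 + a1 + a2 + a3;
}

static double now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec * 1e9 + (double)ts.tv_nsec;
}

/* Average wall-clock nanoseconds per add in the dependent chain. */
double ns_per_chained_add(uint64_t pairs)
{
    volatile float sink;         /* keeps the result, and so the loop, alive */
    double t0 = now_ns();
    sink = chained_adds(1.0f, 0.5f, pairs);
    double t1 = now_ns();
    (void)sink;
    return (t1 - t0) / (2.0 * (double)pairs);
}
```

Multiplying the value returned by `ns_per_chained_add` by the core frequency in GHz gives an approximate latency in cycles, provided the clock is fixed (no turbo or frequency scaling) and the iteration count is large enough to hide timer overhead.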