1) The new P4/X OptRefMan states that the L1 non-integer latency is now 9 cycles instead of 6, which makes it higher than L2's 7-cycle latency. Does that mean that L1 hits are slower than L1 misses?
2) The RCPPS instruction in the App. C latency/throughput table has its execution unit listed as MMX_MISC; however, it cannot be issued in the cycle before or after certain instructions, e.g. XORPS (MMX_ALU). Does that mean that:
a) MMX_MISC is actually using other units?
b) this is caused by a microcode transition or some other mechanism not connected to the execution units?
c) MMX_MISC blocks other units?
3) I think there is a misprint in the P4/X OptRefMan, App. C, SSE2 DP FP table, page C-8, 4th line of the table: it says MULSS, but I think it should be MULSD.
BTW, big thanks to your team for correcting the DIVPS/DIVSS and SQRTPS/SQRTSS latencies and for adding new entries in the lat/thr tables and the HTT.
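
P.S. Regarding question 1: in case it helps anyone reproduce the numbers, below is a rough sketch of the kind of pointer-chasing test that could be used to compare load-to-use latency for a working set that stays in L1 against one that only fits in L2. It is not taken from the manual; the function names (build_cycle, chase, ns_per_load) are my own, it times integer loads only, and it reports nanoseconds via clock() rather than cycles, so the result has to be multiplied by the clock frequency in GHz to compare with the manual's figures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

/* Fill next[0..n-1] with one random cycle (Sattolo's algorithm), so that
   following next[] visits every slot exactly once before returning to 0.
   The random order is meant to keep hardware prefetching out of the picture. */
void build_cycle(size_t *next, size_t n)
{
    assert(n >= 2);
    for (size_t i = 0; i < n; i++)
        next[i] = i;
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;   /* 0 <= j < i, never i itself */
        size_t tmp = next[i];
        next[i] = next[j];
        next[j] = tmp;
    }
}

/* Follow the chain for `steps` loads starting at slot 0. Each load depends
   on the previous one, so the loop time is dominated by load latency. */
size_t chase(const size_t *next, size_t steps)
{
    size_t idx = 0;
    while (steps--)
        idx = next[idx];
    return idx;
}

/* Average time per dependent load, in nanoseconds, for a working set of
   n slots (n * sizeof(size_t) bytes). Assumption: clock() resolution is
   good enough, so `steps` should be large (millions). */
double ns_per_load(size_t n, size_t steps)
{
    size_t *next = malloc(n * sizeof *next);
    assert(next != NULL);
    build_cycle(next, n);
    chase(next, n);                      /* warm-up pass to fill the cache */
    clock_t t0 = clock();
    volatile size_t sink = chase(next, steps);
    clock_t t1 = clock();
    (void)sink;                          /* keep the loop from being dropped */
    free(next);
    return 1e9 * (double)(t1 - t0) / CLOCKS_PER_SEC / (double)steps;
}
```

Calling ns_per_load() once with a working set well below the L1 data cache size and once with a set that only fits in L2 should show whether an L1 hit is really the slower case. Measuring the non-integer path specifically would need FP/SSE loads in the dependency chain, which this sketch does not attempt.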