I've been reading through the forum threads that discuss RDTSC accuracy. For the purposes of discussion, let's assume our assembly code looks like this:
rdtsc            ; first read: low 32 bits of the TSC in eax
mov esi, eax     ; save the first reading
rdtsc            ; second read
sub eax, esi
; eax now has time difference between the first and second rdtsc
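In case anyone wants to reproduce the numbers, here is a rough C sketch of the same measurement. It assumes GCC or Clang on x86 and uses the `__rdtsc()` intrinsic from `<x86intrin.h>`; the helper name `min_rdtsc_delta` is just something I made up for illustration. It takes the minimum over many trials to filter out interrupts and cache effects, and it subtracts the full 64-bit values rather than only eax.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc(); assumes GCC/Clang targeting x86 */

/* Hypothetical helper: smallest difference observed between two
 * back-to-back RDTSC reads over `trials` attempts.
 * Returns UINT64_MAX if trials <= 0. */
uint64_t min_rdtsc_delta(int trials)
{
    uint64_t best = UINT64_MAX;
    for (int i = 0; i < trials; i++) {
        uint64_t t0 = __rdtsc();   /* first read */
        uint64_t t1 = __rdtsc();   /* second read, immediately after */
        uint64_t delta = t1 - t0;  /* unsigned subtraction, in TSC ticks */
        if (delta < best)
            best = delta;          /* keep the least-disturbed sample */
    }
    return best;
}
```

Calling something like `min_rdtsc_delta(100000)` and printing the result should give the per-read overhead in TSC ticks. On CPUs where the TSC runs at a constant rate, those ticks are not necessarily core clock cycles.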
I'm curious why the instruction latency is as long as 100 cycles on P4 and 60 cycles on Xeon 51xx architectures. If it is true that RDTSC is not serializing (as mentioned in the Intel Software Developer's Manual), why should it take that long? Some potential explanations I gleaned from reading the other posts are that this might be due to:
(1) RDTSC decoding into a long sequence of micro-ops
(2) resolution being limited by bus speed
(3) "synchronization of the pipeline". But I thought the manual expressly said that this does not happen?
Any thoughts as to which of these is the dominant factor? Or perhaps none of them is the real reason? I'd appreciate any help on this.
Thanks in advance...