I have a "simple" question concerning Intel64 microarchitectures (Nehalem and newer):
I would like to know precisely how many clock cycles a particular sequence of machine instructions takes from start to finish in the core's pipeline.
I understand that these cores are superscalar and thus the particular instruction sequence may get mixed with other possibly unrelated instructions in the out-of-order execution engine.
How precisely can I collect timestamps, as close as possible to the clock cycle in which the first instruction enters the pipeline for decoding and the clock cycle in which the last instruction commits to architecturally visible state?
I am assuming the following: in the core under observation only a particular thread may run (say I have bound it there and all other threads, including kernel ones, are bound to other cores).
Can I keep this core from handling external interrupts, to minimize interference with, or contamination of, the pipeline by unrelated instructions? There are platforms on which hardware interrupts can be routed to specific cores only, avoiding the others. As for dispatching the single thread on that core, I could use the SCHED_FIFO real-time scheduling class with a high enough priority and let this thread run to completion.
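To make the setup concrete, here is a minimal sketch of the pinning and scheduling part, assuming Linux with glibc. The helper names pin_to_cpu and make_fifo are my own, not an existing API; the sketch does not cover interrupt routing, and the SCHED_FIFO call only succeeds with root or CAP_SYS_NICE.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper: restrict the calling thread to one logical CPU.
 * Returns 0 on success, -1 on failure (errno is set by the kernel). */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(set), &set);
}

/* Hypothetical helper: move the calling thread to SCHED_FIFO.
 * Returns 0 on success, -1 if the caller lacks the privilege
 * (or the priority is out of range). */
int make_fifo(int priority)
{
    struct sched_param sp;

    memset(&sp, 0, sizeof(sp));
    sp.sched_priority = priority;
    return sched_setscheduler(0, SCHED_FIFO, &sp);
}
```

I would call pin_to_cpu first and make_fifo second, and check both return values before starting any measurement, since an unprivileged run silently stays in the default scheduling class otherwise.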
Under the above conditions, I could supply a long enough sequence of NOPs to fill the pipeline completely, and then run:
getTimeStamp_1 ;; TS beginning of instr sequence
ins1; ins2; ins3; ...; insk
getTimeStamp_2a ;; TS end of instr sequence pre-flush
CPUID ;; serialize: flush micro-ops out
getTimeStamp_2b ;; TS end of instr sequence post-flush
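A rough C sketch of the sequence above, assuming GCC or Clang on x86-64 and assuming that the getTimeStamp placeholders map to RDTSC and RDTSCP (that mapping is my guess, not something I have confirmed is the best choice). The names serialize, time_region and sample_work are made up for illustration. Note that on Nehalem and newer the TSC is invariant, so it ticks at a constant reference rate rather than at the actual core clock, and the indirect call adds its own overhead to the measured region.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp */

/* Hypothetical helper: execute CPUID only for its serializing effect. */
static inline void serialize(void)
{
    unsigned int eax = 0, ebx, ecx = 0, edx;

    __asm__ __volatile__("cpuid"
                         : "+a"(eax), "=b"(ebx), "+c"(ecx), "=d"(edx)
                         :
                         : "memory");
    (void)ebx;
    (void)edx;
}

/* Returns the number of TSC ticks elapsed around fn(arg). */
uint64_t time_region(void (*fn)(void *), void *arg)
{
    unsigned int aux;
    uint64_t t1, t2;

    assert(fn != NULL);
    serialize();          /* let earlier instructions complete first    */
    t1 = __rdtsc();       /* getTimeStamp_1                              */
    fn(arg);              /* ins1; ins2; ...; insk                       */
    t2 = __rdtscp(&aux);  /* waits until prior instructions have executed */
    serialize();          /* keep later instructions from starting early  */
    return t2 - t1;
}

/* Hypothetical workload standing in for the instruction sequence. */
void sample_work(void *arg)
{
    volatile unsigned int *counter = arg;

    for (int i = 0; i < 1000; i++)
        (*counter)++;
}
```

Calling time_region(sample_work, &counter) many times and taking the minimum would presumably filter out the occasional disturbed run, though it still reports reference ticks rather than true pipeline cycles.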
Any suggestions or comments would be appreciated.