I am measuring load latencies on my Core i5 2450 using PEBS with the load latency feature of the PMU. The documentation says the latency value is measured in core cycles, from micro-operation (uop) dispatch to when the data is globally observable (GO). I have the following questions:
1. What is meant by "when the data is globally observable"?
2. The L1 latency is around 4-5 cycles, but when I measure it using PEBS, I get latency values of more than 100 cycles for some loads that are reported as L1 hits. (These loads also hit in the DTLB/STLB.)
Can someone clarify?
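
For context, here is a minimal sketch of how the load latency event is commonly encoded. It assumes the Sandy Bridge encoding of MEM_TRANS_RETIRED.LOAD_LATENCY (event select 0xCD, unit mask 0x01) and a perf-style event string with an `ldlat` threshold. The helper names and the threshold value are only illustrative, not necessarily the exact setup used for the measurements above.

```python
# Assumption: Sandy Bridge encoding of MEM_TRANS_RETIRED.LOAD_LATENCY.
EVENT_SELECT = 0xCD  # event select code (assumed)
UMASK = 0x01         # unit mask (assumed)


def raw_config(event_select: int, umask: int) -> int:
    """Pack the event select (bits 0-7) and unit mask (bits 8-15) into a raw config."""
    return (umask << 8) | event_select


def perf_event_string(ldlat: int) -> str:
    """Build a perf-style event string; ldlat is the latency threshold in core cycles."""
    return f"cpu/event={EVENT_SELECT:#x},umask={UMASK:#x},ldlat={ldlat}/pp"


if __name__ == "__main__":
    # Illustrative threshold only; only loads at or above it are sampled.
    print(hex(raw_config(EVENT_SELECT, UMASK)))
    print(perf_event_string(3))
```

With a low threshold such as this one, even L1 hits are eligible for sampling, which is how samples like the ones described in question 2 show up.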