I am using the PMU to sample load events with a latency above a given threshold. In this context, I am trying to understand the meaning of the different Data Source Encodings for the Load Latency Record described in Table 18-13 of the Intel® 64 and IA-32 Architectures Software Developer’s Manual.
My first question is about snooping for the L1 and L2 caches. In this table there are no separate events for L1 and L2 cache hits with respect to snooping, whereas such distinctions do exist for L3 hits. How is cache coherency between the private L2 caches of two cores (in the same processor package, or in different ones) ensured if no snooping is done?
The second question is about the difference between the following records:
0x4: "L3 HIT. Local or Remote home requests that hit L3 cache in the uncore with no coherency actions required (snooping)."
0x5: "L3 HIT. Local or Remote home requests that hit the L3 cache and was serviced by another processor core with a cross core snoop where no modified copies were found. (clean)."
0x6: "L3 HIT. Local or Remote home requests that hit the L3 cache and was serviced by another processor core with a cross core snoop where modified copies were found. (HITM)."
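A minimal sketch of decoding these three values from a sampled record, assuming the encoding occupies bits 3:0 of the record's data source field. The names `L3_HIT_SOURCES`, `DATA_SOURCE_MASK` and `decode_data_source` are hypothetical, and only the three encodings quoted above are filled in; every other value falls through to a placeholder.

```python
# Hypothetical sketch: decode the low 4 bits of a load latency record's
# data source field. Only the three L3-hit encodings quoted above are
# filled in; the remaining values of Table 18-13 are deliberately omitted.
L3_HIT_SOURCES = {
    0x4: ("L3 hit", "no coherency action (no snoop)"),
    0x5: ("L3 hit", "cross-core snoop, no modified copy found (clean)"),
    0x6: ("L3 hit", "cross-core snoop, modified copy found (HITM)"),
}

DATA_SOURCE_MASK = 0xF  # assumption: the encoding occupies bits 3:0


def decode_data_source(data_source_field: int) -> str:
    """Return a short description of the data source encoding, if known."""
    encoding = data_source_field & DATA_SOURCE_MASK
    level, coherency = L3_HIT_SOURCES.get(encoding, ("other", "see Table 18-13"))
    return f"0x{encoding:x}: {level}, {coherency}"


# Example: 0x16 has an unrelated higher bit set; its low 4 bits are 0x6.
print(decode_data_source(0x16))
```

Masking first keeps any higher bits of the field from being mistaken for a different encoding.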
My understanding is that for 0x4 no snoop request is sent at all, because the hardware knows (how it knows does not matter for my purposes) that no other core, in the same or in another processor package, has cached the data. For 0x5 a snoop request was sent, but none of the other copies was modified. Here the manual speaks of a cross-core snoop, and I am wondering whether such a snoop is INTRA-package only or can also be INTER-package. For 0x6, my understanding is that a modified copy was found, and consequently the local cache had to obtain the data from the cache holding that modified copy. Are these interpretations correct?
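To make the interpretation above concrete, here is a small sketch that encodes it as a decision function. It models the reading described in this question, not a documented Intel algorithm; `L3HitSource`, `classify_l3_hit` and its two boolean inputs are hypothetical names introduced only for illustration.

```python
from enum import IntEnum


class L3HitSource(IntEnum):
    NO_SNOOP = 0x4     # no other core could hold the line: no snoop sent
    SNOOP_CLEAN = 0x5  # snoop sent, no modified copy found (clean)
    SNOOP_HITM = 0x6   # snoop sent, a modified copy was found (HITM)


def classify_l3_hit(other_cores_may_hold_line: bool,
                    modified_copy_found: bool) -> L3HitSource:
    """Encode the questioner's interpretation of records 0x4, 0x5 and 0x6.

    Hypothetical model only: it does not describe how the hardware decides.
    """
    if not other_cores_may_hold_line:
        if modified_copy_found:
            # Contradictory input: a modified copy implies another holder.
            raise ValueError("modified copy found but no other holder possible")
        return L3HitSource.NO_SNOOP
    if modified_copy_found:
        return L3HitSource.SNOOP_HITM
    return L3HitSource.SNOOP_CLEAN
```

The `other_cores_may_hold_line` flag stands in for whatever tracking the hardware actually uses; the question deliberately leaves that mechanism open.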
I suspect that all these questions stem from my very limited knowledge of the cache coherency protocol, so please feel free to point me to the Intel documentation I should read on the subject.