In the May 2018 combined SDM, Chapter 19, Section 2 and Section 6 list the performance counters for Skylake and Haswell, respectively.
Under section 2 you will find the following 8 events:
CDH 01H MEM_TRANS_RETIRED.LOAD_LATENCY_GT_2
CDH 01H MEM_TRANS_RETIRED.LOAD_LATENCY_GT_4
CDH 01H MEM_TRANS_RETIRED.LOAD_LATENCY_GT_8
CDH 01H MEM_TRANS_RETIRED.LOAD_LATENCY_GT_256
CDH 01H MEM_TRANS_RETIRED.LOAD_LATENCY_GT_512
Their description reads: "Counts loads when the latency from first dispatch to completion is greater than <X> cycles." for the corresponding value of X (2, 4, 8, and so on). In particular, nothing in the description indicates that these counters measure randomly sampled memory loads. In fact, as stated, I would expect a precise count of these events, up to skid, in perf record.
Under section 6, among others, you will find:
Event Num. | Umask Value | Event Mask Mnemonic | Description
CDH | 01H | MEM_TRANS_RETIRED.LOAD_LATENCY | Randomly sampled loads whose latency is above a user defined threshold. [Specify threshold in MSR 3F6H]
My question is: can MEM_TRANS_RETIRED.LOAD_LATENCY be used to emulate the 8 Skylake performance counters above, or are the semantics stated in the descriptions correct, which would prohibit this emulation by proxy?
I am aware that the event number and umask are the same, but I am unsure whether the hardware implementation is consistent across Haswell and Skylake. I would like to get an official answer from Intel.
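To make the emulation I have in mind concrete, here is a minimal sketch in Python. It only builds the event encodings and does not program any hardware. It assumes that the local perf build accepts the `ldlat` term and the `pp` (precise) modifier on a raw `cpu/.../` event, which I have not verified on every kernel. The helper names `raw_config` and `perf_event_string` are my own, and the threshold list contains only the values quoted above.

```python
# Event select and unit mask shared by both SDM tables quoted in this post.
EVENT_SELECT = 0xCD
UMASK = 0x01

# Thresholds of the Skylake mnemonics quoted above (only the ones listed here).
SKYLAKE_THRESHOLDS = [2, 4, 8, 256, 512]


def raw_config(event_select: int, umask: int) -> int:
    """Pack the event select into bits 0-7 and the umask into bits 8-15."""
    return (umask << 8) | event_select


def perf_event_string(threshold: int) -> str:
    """Build a perf event spec for the generic load-latency event.

    Assumption: perf exposes the latency threshold as the 'ldlat' term
    and 'pp' requests precise (PEBS) sampling on this machine.
    """
    return f"cpu/event={EVENT_SELECT:#x},umask={UMASK:#x},ldlat={threshold}/pp"


if __name__ == "__main__":
    print(f"raw config: {raw_config(EVENT_SELECT, UMASK):#x}")
    for t in SKYLAKE_THRESHOLDS:
        print(f"MEM_TRANS_RETIRED.LOAD_LATENCY_GT_{t}: {perf_event_string(t)}")
```

Each printed string could then be passed to `perf record -e`. Whether the resulting samples are equivalent to the Skylake per-threshold events is exactly what I am asking.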