I have a question about performance counters. I am counting the stall cycles caused by memory accesses on Sandy Bridge, and I believe the event OFFCORE_REQUESTS_OUTSTANDING.ALL_DATA_RD is a good measure. However, it may ignore the overlap among consecutive loads. The event adds the number of outstanding data reads every cycle, so when several loads are in flight at once, the same cycle is counted more than once. As a result, the true number of stall cycles is smaller than the count.

Are there any events that take this overlap into account? Or how can I identify the overcounting? As far as I know, this can be done easily with the MBA on AMD processors, but Intel processors do not have an MBA.
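To make the overcounting concrete, here is a minimal Python sketch over a made-up per-cycle trace. The trace values and both function names are hypothetical and only model the two ways of counting; they are not part of any PMU tool or API. The first function sums the number of in-flight reads per cycle (which is how I understand ALL_DATA_RD behaves), and the second counts each cycle at most once if at least one read is outstanding.

```python
# Hypothetical trace: number of outstanding offcore data reads in each cycle.
# These values are invented for illustration, not measured on real hardware.
outstanding_per_cycle = [0, 1, 2, 3, 3, 2, 1, 0, 0, 1, 1, 0]


def occupancy_sum(trace):
    """Model of an occupancy-style count: every cycle adds the number of
    reads in flight, so overlapping reads are counted once per read."""
    return sum(trace)


def cycles_with_any_outstanding(trace, threshold=1):
    """Model of counting cycles in which at least `threshold` reads are
    in flight; overlapping reads contribute only once per cycle."""
    return sum(1 for n in trace if n >= threshold)


if __name__ == "__main__":
    total = occupancy_sum(outstanding_per_cycle)
    busy = cycles_with_any_outstanding(outstanding_per_cycle)
    print(f"occupancy sum: {total}")
    print(f"cycles with >=1 outstanding read: {busy}")
    print(f"overcount factor: {total / busy:.2f}")
```

For this trace the occupancy sum is 14 while only 8 cycles have any read outstanding, so the occupancy-style count overstates the busy cycles by a factor of 1.75. What I am looking for is an event (or a way of configuring one) that gives the second number on Sandy Bridge.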
Could anyone help me?