basic data access profiling using the load latency event

Hi,

I ran some experiments with the load latency event of the Intel Nehalem (MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD). The machine I'm using has two Xeon E5520 processors. As I'm mostly interested in high-latency DRAM accesses, I thought that by setting the threshold to a value larger than the latency of the on-core caches, I would mostly get samples of DRAM operations. To my surprise, the percentage of off-core samples doesn't increase substantially with large thresholds. The table below shows the results (SAV is the sample-after value):

Threshold | SAV | DRAM accesses [%]
=======================================
0 | 2'000'000 | 4.0%
32 | 5'000 | 5.6%
64 | 2'000 | 8.1%
128 | 1'000 | 15.0%
256 | 500 | 14.3%

It seems that even for a threshold value of 128 or 256, the proportion of on-core accesses is quite large (around 85%). Is it normal for on-core accesses to have such a large latency? What causes this?

Many thanks and best regards,

Zoltan Majo

Hi Zoltan,

(Sorry for the delayed answer, I was out of office.)

This is a tricky event, different from the other ones. It samples accesses and tracks the latency of each sampled access. If the latency is larger than the threshold, it keeps tracking the access and captures the data source; otherwise it discards the access and moves on to the next sample.
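The filtering described above can be sketched in a few lines of Python. This only illustrates the idea and is not how the PMU works internally: the sample list, the source labels and the latencies are made-up values, and `share_by_source` is a hypothetical helper.

```python
from collections import Counter

# Hypothetical sampled loads: (data source, latency in cycles).
# All values are invented for illustration only.
SAMPLED_LOADS = [
    ("L1", 4), ("L1", 4), ("L1", 5), ("L2", 12), ("L3", 45),
    ("LFB", 60), ("LFB", 150), ("LFB", 300), ("DRAM", 280), ("L1", 4),
]

def share_by_source(samples, threshold):
    """Keep only samples whose latency exceeds the threshold, then
    return the fraction of the surviving samples per data source."""
    kept = [source for source, latency in samples if latency > threshold]
    if not kept:
        return {}
    counts = Counter(kept)
    return {source: n / len(kept) for source, n in counts.items()}

for thr in (0, 32, 128, 256):
    print(thr, share_by_source(SAMPLED_LOADS, thr))
```

With data like this, raising the threshold removes the cache hits, but the slow LFB-served loads pass the filter just like the DRAM loads do.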

It is also not expected that many memory accesses are served by RAM at the moment of the access. Many of those that miss the caches are usually shown as served by the line fill buffer (LFB), meaning the data was already on its way when it was requested, because the locality principle applies and all the prefetchers are working.

Here is a table comparing data for MEM_INST_RETIRED.LATENCY_ABOVE_THRESHOLD_32 and _256 (for some workload on a single-socket Nehalem):

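Threshold | SAV | Events | LFB | DRAM | MEM_INST_RETIRED.LOADS | MEM_UNCORE.DRAM | % of real DRAM accesses
=====================================================================================================
32 | 5000 | 73,000,000 | 36% | 5% | 1.07E+12 | 1.74E+08 | 0.02%
256 | 500 | 7,132,500 | 67.6% | 17% | 1.07E+12 | 1.76E+08 | 0.02%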

I also collected the precise MEM_RET_LOADS and MEM_UNCORE_DRAM events.

So you can see that threshold = 32 is more reliable for estimating the real fraction of DRAM accesses (at the time of the data request).
With threshold 256, a large fraction of the samples comes from the LFB: the latency is large, but in reality the data is already on its way.
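As a rough check of the last column, the arithmetic can be sketched as below. The counter values are copied from the table; the function names are hypothetical, and the sample count assumes that Events = samples x SAV, which is an assumption and not something stated in this thread.

```python
# Counter values copied from the table (threshold 32 and 256 runs).
RUNS = {
    32:  {"sav": 5000, "events": 73_000_000, "loads": 1.07e12, "uncore_dram": 1.74e8},
    256: {"sav": 500,  "events": 7_132_500,  "loads": 1.07e12, "uncore_dram": 1.76e8},
}

def real_dram_percent(loads, uncore_dram):
    """Share of all retired loads that reached DRAM, in percent."""
    return 100.0 * uncore_dram / loads

def sample_count(events, sav):
    """Number of raw samples, ASSUMING Events = samples * SAV."""
    return events / sav

for thr, run in RUNS.items():
    pct = real_dram_percent(run["loads"], run["uncore_dram"])
    n = sample_count(run["events"], run["sav"])
    print(f"threshold {thr}: {pct:.2f}% real DRAM accesses, {n:.0f} samples")
```

Both runs round to 0.02%, far below the 5% and 17% DRAM share seen among the latency samples.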

julia

Hi Julia,

Thanks for your answer. I'm also a bit late replying to you, because for some reason I didn't get notified of your post.

Now I understand the anomaly, thanks for the explanation. The tool whose output you showed in your post can report the number of accesses served by the LFB. PTU is not able to show this percentage yet, is it?

Best regards,

Zoltan

Hi Zoltan,

You are right: PTU doesn't show that percentage, and there are no plans to show it.
I calculated this percentage using MEM_RET_LOADS to find a kind of trend: the ratio of the data sources reported through the LATENCY event to all LOADS.

But PTU does show the number of samples/events by data source (including LFB) in the Data Profiling view, so you are able to see the raw LFB data, right?

best,
julia

Hi Julia,

Yes, you are right. I thought you had also collected the LFB-related information with the load_latency event (I overlooked that you had used the MEM_RET_LOADS event as well).

Thanks!

Best regards,

Zoltan
