TLB misses

Hello,

I'm trying to measure TLB misses with the following counters:

DTLB_LOAD_MISSES.ANY
MEM_LOAD_RETIRED.DTLB_MISS

The second one reports more misses than the first one, and the first one reports roughly 2 times as many misses as I expect. What could be causing this? Is the first counter counting each miss twice, once for the first-level miss and once for the second-level miss? The machine I'm using is a Xeon L5520. Any help is appreciated.

Cheers,

Hello cagribal,
I ran a test to check the counters.
The test is a 'read memory bandwidth' test.
I start one thread per CPU, and each thread reads a 40MB array using a 64-byte stride for 10 seconds.
I would expect 1 DTLB miss per page. Each page is 4096 bytes.
It takes 64 loads to cover a page (4096-byte page / 64-byte stride = 64 loads).

Here is what I counted during one of the 10 seconds (one column per core):

                                  core0        core1        core2        core3
1. DTLB_LOAD_MISSES.ANY           556,938      556,991      560,425      556,418
2. MEM_LOAD_RETIRED.DTLB_MISS     532,471      513,618      524,524      526,461
3. MEM_INST_RETIRED.LOADS         37,694,658   38,354,887   36,165,843   34,506,563
4. UNC_LLC_LINES_IN.ANY           133,367,850 (uncore, single value)
5. DTLB_MISSES.WALK_COMPLETED     558,674      558,890      565,344      559,734

a. loads/DTLB_miss (row3/row1)    67.68        68.86        64.53        62.02
b. loads/DTLB_miss (row3/row2)    70.79        74.68        68.95        65.54
c. loads/DTLB_miss (row3/row5)    67.47        68.63        63.97        61.65
d. LLC_misses/DTLB_miss (row4/sum(row2))   63.60
e. loads/LLC_miss (sum(row3)/row4)         1.10

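The raw data is in rows 1-5. Rows a-e are ratios computed from them: loads per DTLB miss in rows a-c (each using a different DTLB miss counter), LLC misses per DTLB miss in row d, and loads per LLC miss in row e.
The loads/DTLB_miss ratio is close to the expected 64. I ran the test on my work laptop, which has lots of other software running on it.
Row d shows LLC (last level cache) misses per DTLB miss. It is very close to 64, the number of 64-byte cache lines in a 4096-byte page, and is probably the best measure, since most of the LLC misses come from my read memory bandwidth test.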

So, in conclusion, I don't see overcounting, and certainly not 2x too many DTLB misses.
Can you tell us more about your expected count and methodology?

Pat

Hello Pat,

The test results confuse me; there is a lot I don't understand.

a. loads/DTLB_miss, row3/row1   67.68   68.86   64.53   62.02
b. loads/DTLB_miss, row3/row2   70.79   74.68   68.95   65.54
c. loads/DTLB_miss, row3/row5   67.47   68.63   63.97   61.65

The first column is the same, but the second column is different. I don't know why this is.

And what is the test program? Could I get the code? I would like to run it myself.

GHui

Hello GHui,
Yeah, the table is not so clear.
Here is what it should look like:

                                core0   core1   core2   core3
a. loads/DTLB_miss(row3/row1)   67.68   68.86   64.53   62.02
b. loads/DTLB_miss(row3/row2)   70.79   74.68   68.95   65.54
c. loads/DTLB_miss(row3/row5)   67.47   68.63   63.97   61.65

So all 3 rows are loads/DTLB_miss ratios, but each one divides by a different DTLB miss counter (row1, row2 or row5).

The test program is my 'id_cpu' utility. I don't have approval to release it.
But it should be relatively easy to reproduce the results with any read-memory test that uses a 64-byte stride (just touch each cache line) over a 40 MB array.
Pat
