Measure the number of memory accesses

Hello everybody,

I would like to measure the number of memory accesses executed by my software (how many times my software accesses the RAM) using the hardware counters and VTune. The problem is that there are a lot of counters for memory measurement and I don't know which one suits me best. I don't even clearly know the purpose of each one or the differences between them. Reading the help didn't help much. It would be nice if there were some tutorials or more detailed documents. I would appreciate any help or tips.

Thank you,
Ahmed

Thank you for your response. I meant the total number of data accesses to memory, including both load and store instructions (if it is possible to measure the number of memory accesses made by store instructions). It would also be nice if I could measure the total number of LLC cache misses.

I have the Nehalem architecture. I used MEM_UNCORE_RETIRED.LOCAL_DRAM to measure the number of memory accesses, but the results I got are not reasonable. I am trying to measure the number of memory accesses incurred by software which receives packets from one network interface (NIC) and sends them through another NIC without any additional processing. I was expecting at least 3 memory accesses per packet, but I got no more than 1.

Another question: what is the difference between MEM_UNCORE_RETIRED.LOCAL_DRAM and OFFCORE_RESPONSE_0.DATA_IN.LOCAL_DRAM?

Thank you,
Ahmed
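The per-packet comparison described above can be sketched as follows. This is only an illustration: the function name and all the numbers are hypothetical and are not taken from any real measurement in this thread.

```python
def accesses_per_packet(event_count: int, packets: int) -> float:
    """Average number of counted events per forwarded packet."""
    if packets <= 0:
        raise ValueError("packets must be positive")
    return event_count / packets


# Hypothetical values: a raw event count read from the profiler and the
# number of packets forwarded during the same measurement interval.
measured = accesses_per_packet(event_count=950_000, packets=1_000_000)
expected = 3.0  # the expectation stated in the post above

print(f"measured {measured:.2f} vs expected {expected:.2f} accesses per packet")
```

A measured ratio well below the expected one suggests that the chosen event does not cover every kind of access the expectation assumes.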

>>...measure the number of memory accesses executed by my software (how many times my
>>software access the RAM)...

In case of a Windows platform you could also look at:

- Windows Management Instrumentation (WMI) interfaces;
- the Platform SDK utility Pstat.exe;
- the PerfTool example, with source code located in the \Samples\WinBase\WinNT folder.

Best regards,
Sergey

Thank you Sergey, but I have a Linux platform, so unfortunately I cannot use any Windows tools there. It may be useful if I give more details about my system:

- Linux CentOS 5.4 with no GUI.
- Intel Xeon, Nehalem architecture, with only one CPU (four cores) in a single CPU socket, so I don't have remote memory accesses.
- I use Intel VTune Amplifier XE 2011.
- My software fetches packets from the RX ring of the network card (using pointers, no copying) and moves them to the TX ring. After a packet is sent out to the network, the memory buffer that was holding it is recycled back to the RX ring (a free-memory operation and pointer reassignment).

Maybe these details are not enough to explain what my software does, but I hope they give at least some impression of it. If you have any questions, please feel free to ask.

Thank you again.

Regards,
Ahmed

Quoting amego83:

>>Thank you for your response. I meant the total number of data accesses to memory, including both load and store instructions (if it is possible to measure the number of memory accesses made by store instructions). It would also be nice if I could measure the total number of LLC cache misses. I have the Nehalem architecture. I used MEM_UNCORE_RETIRED.LOCAL_DRAM to measure the number of memory accesses, but the results I got are not reasonable. I am trying to measure the number of memory accesses incurred by software which receives packets from one network interface (NIC) and sends them through another NIC without any additional processing. I was expecting at least 3 memory accesses per packet, but I got no more than 1. Another question: what is the difference between MEM_UNCORE_RETIRED.LOCAL_DRAM and OFFCORE_RESPONSE_0.DATA_IN.LOCAL_DRAM?

Memory access counting can mean two things:
1) Counting all memory accesses. This includes loading data from memory (local or remote) and loading data from cache (local or remote). The count matches the number of accesses your algorithm performs.
2) Counting only loads of data from memory. This count is lower than the number of accesses your algorithm performs.

You said "It would also be nice if I could measure the total number of LLC cache misses". That is situation 2 (loads that missed the LLC, from local and remote memory), so simply use the event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS.

If my assumption is wrong (you meant situation 1), use MEM_UOPS_RETIRED.ALL_LOADS_PS and MEM_UOPS_RETIRED.ALL_STORES_PS.
(Please understand that hardware events overlap in what they cover. These two events count accesses that hit in cache, accesses that miss in cache, and memory loads and stores alike.)

Regards, Peter

Thanks Peter. I am actually concerned with situation 1. I tried MEM_UOPS_RETIRED.ALL_LOADS_PS and MEM_UOPS_RETIRED.ALL_STORES_PS, but they don't work for me because I have the Nehalem architecture and those two work only on Sandy Bridge. Could you please tell me what their equivalents are for Nehalem? Could you also provide the Nehalem equivalent of MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS? I may need that later.

Thanks a lot,
Ahmed

For Nehalem processors, you can use:
MEM_INST_RETIRED.LOADS ; instructions retired which contain a load
MEM_INST_RETIRED.STORES ; instructions retired which contain a store

MEM_LOAD_RETIRED.LLC_MISS on Nehalem is equivalent to MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS on Sandy Bridge.
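As a rough sketch, the event equivalences given in this thread can be kept in a small lookup table, and the load and store counts summed for "situation 1". The event names are copied from the posts above and have not been checked against Intel's event lists; the helper names, the dictionary keys and the counts in the example are hypothetical. Note also that the Nehalem names refer to instructions while the Sandy Bridge names refer to uops, so the counts may not be directly comparable.

```python
# Event names exactly as given in this thread (not independently verified).
EQUIVALENT_EVENTS = {
    "all_loads": {
        "nehalem": "MEM_INST_RETIRED.LOADS",
        "sandy_bridge": "MEM_UOPS_RETIRED.ALL_LOADS_PS",
    },
    "all_stores": {
        "nehalem": "MEM_INST_RETIRED.STORES",
        "sandy_bridge": "MEM_UOPS_RETIRED.ALL_STORES_PS",
    },
    "llc_miss": {
        "nehalem": "MEM_LOAD_RETIRED.LLC_MISS",
        "sandy_bridge": "MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS_PS",
    },
}


def event_for(purpose: str, arch: str) -> str:
    """Return the event name suggested in the thread for a purpose and architecture."""
    return EQUIVALENT_EVENTS[purpose][arch]


def total_loads_and_stores(counts: dict, arch: str) -> int:
    """Sum the load and store counts (cache hits included) for one architecture."""
    return counts[event_for("all_loads", arch)] + counts[event_for("all_stores", arch)]


# Hypothetical counts, as they might be exported from a profiler run.
example_counts = {
    "MEM_INST_RETIRED.LOADS": 1_200_000,
    "MEM_INST_RETIRED.STORES": 800_000,
}
print(total_loads_and_stores(example_counts, "nehalem"))
```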

Regards, Peter

Thanks Peter. I am sorry, it seems I misread your previous post (the one before the last one), so let me explain my case again more clearly. I would like to measure the total number of accesses to memory, and by memory I mean only the DRAM. I am not concerned with accesses at the cache level. I need the total number of memory accesses, including both loads and stores, but only at the DRAM level, with no cache accesses. What I am looking for is provided in the oprofile tool by the BUS_TRAN_MEM event (number of completed memory transactions), in case someone is familiar with that.
I hope I have made myself clear enough this time. Thank you for your patience and your help.

For counting memory reads and write-backs (when the LLC misses or the write buffer is flushed),
I suppose that you have to use this for Nehalem: BUS_TRANS_MEM.ALL_AGENTS

Actually, BUS_TRANS_MEM.ALL_AGENTS is for the Core 2 Duo processor. I tried to find something similar for Nehalem but I couldn't. Do you know of any Nehalem event that is equivalent to BUS_TRANS_MEM.ALL_AGENTS on the Core 2 Duo?

Thanks,
Ahmed

No, there is no similar bus event. For memory accesses (not served from cache), my suggestion is to use MEM_LOAD_RETIRED.LLC_MISS and MEM_RETIRED.DTLB_MISS instead.

Regards, Peter

Hi,

I have a similar problem: I want to determine the arithmetic intensity of my program, so I need an estimate of FLOP/byte. I wanted to use MEM_UNCORE_RETIRED.LOCAL_DRAM to get the DRAM access count. Is this correct? And if so, would that mean I only need to multiply this count by the cache line size (which is what for the i7?) to get an estimate of the total bytes used by the program?

Cheers,
yannick

DRAM accesses include both local and remote DRAM, so use the events MEM_UNCORE_RETIRED.LOCAL_DRAM and MEM_UNCORE_RETIRED.REMOTE_DRAM. You can also simply use MEM_LOAD_RETIRED.LLC_MISS instead, regardless of whether the access went to local or remote memory.

The DRAM access events above do not count memory accesses where the data is already in the L1, L2 or L3 cache.

To get the FLOP count you have to use the event FP_COMP_OPS_EXE.x87. Dividing it by INST_RETIRED.ANY would give the average FLOPs per instruction, if that is what you want to know.
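The FLOP/byte estimate asked about above could be sketched like this, assuming a 64-byte cache line (the line size on the Core i7) and assuming each counted DRAM access transfers one full line. The function names and all counter values are hypothetical. Because the events discussed count retired loads, the byte figure is probably a lower bound: write-backs and prefetch traffic are not included.

```python
CACHE_LINE_BYTES = 64  # cache line size on the Core i7 (Nehalem)


def dram_bytes(local_dram: int, remote_dram: int = 0,
               line_bytes: int = CACHE_LINE_BYTES) -> int:
    """Estimate bytes moved from DRAM, assuming one cache line per counted access."""
    return (local_dram + remote_dram) * line_bytes


def arithmetic_intensity(flop_count: int, bytes_moved: int) -> float:
    """FLOP per byte of estimated DRAM traffic."""
    if bytes_moved <= 0:
        raise ValueError("bytes_moved must be positive")
    return flop_count / bytes_moved


# Hypothetical counter values, for illustration only.
flops = 2_000_000        # e.g. a floating-point event count
local = 500_000          # e.g. MEM_UNCORE_RETIRED.LOCAL_DRAM
traffic = dram_bytes(local)

print(f"{traffic} bytes, {arithmetic_intensity(flops, traffic)} FLOP/byte")
```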

Regards, Peter
