vtdpview read timing information

zmajo wrote:

Hi,

I would like to do a time-series analysis of the memory behavior of some programs. For this purpose I invoke Intel PTU 3.2 in the following way:

./vtsarun -dl -ec "MEM_LOAD_RETIRED.L2_MISS":sa=100 -- .

I convert the resulting data with vtdpview into the vtune.db file, which is in sqlite3 format. Since I would like to have the timing information, I thought of reading the contents of the EventSamples table in the database, because it has a field called walltime. However, this table doesn't contain any data (as if vtdpview had not converted _all_ the data into the sqlite format). Could you please tell me what I might be doing wrong?

Many thanks.

Zoltan Majo
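
A minimal sketch of how one might check which tables in vtune.db actually hold rows, using Python's standard sqlite3 module. Only the table name EventSamples and its walltime field come from the post above; the rest of the schema is not documented, so the sketch discovers tables and columns at run time instead of assuming them. The function name summarize_tables is made up for this illustration.

```python
import sqlite3
import sys


def summarize_tables(db_path):
    """Return {table_name: (column_names, row_count)} for every table in the file."""
    # Note: sqlite3.connect() creates an empty database if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        summary = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            columns = [col[1] for col in conn.execute(f"PRAGMA table_info({quoted})")]
            (count,) = conn.execute(f"SELECT COUNT(*) FROM {quoted}").fetchone()
            summary[name] = (columns, count)
        return summary
    finally:
        conn.close()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_tables.py vtune.db
    for table, (columns, count) in summarize_tables(sys.argv[1]).items():
        print(f"{table}: {count} rows, columns = {columns}")
```

A row count of 0 for EventSamples would confirm that the table exists but was never populated, as opposed to a problem with the query.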

julia-fedorova (Intel) wrote:

With the above command line you collected only cache misses and nothing else.
(Also, the sample-after value ("sa") is too small.)

Regardless, vtsaview and vtdpview report sample and event information, not time.

Run
vtsarun -start --
vtsaview
and you will get the 2 basic events. From CPU_CLK_UNHALTED.CORE you can estimate time.

Read the user guide and the other PDF documents that explain how to use the tool.
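
A rough sketch of the estimate mentioned above: each CPU_CLK_UNHALTED.CORE sample stands for roughly one sample-after value's worth of unhalted core cycles, so cycles divided by the clock frequency gives an approximate busy time. The function name and the example numbers are assumptions for illustration, and the result counts only unhalted cycles, so it is not wall-clock time.

```python
def estimate_unhalted_seconds(sample_count, sample_after_value, core_hz):
    """Approximate CPU busy time from CPU_CLK_UNHALTED.CORE samples.

    sample_count:        number of samples attributed to the code of interest
    sample_after_value:  events between two samples (the "sa" setting)
    core_hz:             core clock frequency in Hz (assumed constant)
    """
    if core_hz <= 0:
        raise ValueError("core_hz must be positive")
    cycles = sample_count * sample_after_value
    return cycles / core_hz


# Example with made-up numbers: 1500 samples, sa = 2,000,000, 3 GHz core.
print(estimate_unhalted_seconds(1500, 2_000_000, 3e9))  # 1.0
```

Frequency scaling makes the constant-frequency assumption inexact, so this is best treated as an order-of-magnitude figure.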

zmajo wrote:

Hi,

thanks for the quick response.

I think MEM_LOAD_RETIRED.L2_MISS is a precise event, so I also collected program counter and register values, among others, with the PTU tool. The graphical interface of the tool summarizes and interprets the samples that have been collected, and doesn't give me access to the collected data itself (which I would like to process and interpret myself). That is why I tried using the command-line utilities to get more data. The vtsarun tool itself stores the precise data of the samples in a file *pebs, but that is in an Intel proprietary format. I assumed that the visualization tools (vtsaview, vtdpview) transform this data into a more open format (sqlite), where the full data is available. But this doesn't seem to be true. Could I get access to the raw samples that the tools have gathered? I assumed that the 'EventSamples' table in the database would contain this data, but no matter how I try to convert the raw data, the table stays empty.

Thanks for your help.

Regards,

Zoltan

julia-fedorova (Intel) wrote:

Zoltan,
yes, MEM...L2_MISS is a precise event, and the tool doesn't collect the fixed-counter events (CLK and INST_RETIRED), although it could, when explicitly asked to collect one event.

You understood the logic of the tool correctly.
vtsaview and vtdpview aggregate samples and put them into vtune.db (the format of which we do not document).

Everything that is shown in the GUI can be retrieved from the command line. The GUI takes its data from vtune.db by running vtsaview and vtdpview (run them with "-help").

With the current version you cannot get raw samples. Sorry.

There might be an update.

zmajo wrote:

Hi Julia,

thanks for the explanation.

Having raw samples would indeed be nice. I'll check for updates of the tool.

What I would also find very interesting is the distribution of memory accesses in time. I understand this as follows: given a specific moment during the runtime of the application, which address(es) or blocks of memory were accessed by which threads. Are you considering incorporating something like this into a future version of PTU?

Best regards,

Zoltan
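
To make the idea of a "distribution of memory accesses in time" concrete, here is a small sketch that groups samples into time windows and memory blocks and records which threads touched each block. The input format, a list of (timestamp in seconds, thread id, address) tuples, is purely hypothetical; as discussed in this thread, PTU does not currently export raw samples in this or any other form.

```python
from collections import defaultdict


def bucket_accesses(samples, window_s, block_bytes=64):
    """Group samples by time window and memory block.

    samples:     iterable of (timestamp_s, thread_id, address) tuples (hypothetical format)
    window_s:    width of one time window in seconds
    block_bytes: size of one memory block, e.g. 64 for a cache line

    Returns {window_index: {block_index: sorted list of thread ids}}.
    """
    buckets = defaultdict(lambda: defaultdict(set))
    for timestamp_s, thread_id, address in samples:
        window = int(timestamp_s // window_s)
        block = address // block_bytes
        buckets[window][block].add(thread_id)
    return {w: {b: sorted(threads) for b, threads in blocks.items()}
            for w, blocks in buckets.items()}


# Made-up samples: two threads hit the same cache line early on,
# then thread 1 touches a different line later.
samples = [(0.01, 1, 0x1000), (0.02, 2, 0x1008), (0.15, 1, 0x2000)]
print(bucket_accesses(samples, window_s=0.1))
# {0: {64: [1, 2]}, 1: {128: [1]}}
```

Blocks that show more than one thread id within the same window are candidates for sharing between threads.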

julia-fedorova (Intel) wrote:

Hi Zoltan,

the distribution of memory accesses in time is interesting, but the question is what we want to do with it.
Also, given that we collect by sampling, the picture will be very sparse, and only glaring pathologies could be seen there (?)

I cannot comment on PTU.

If you are interested, look at the instrumentation tool Pin (also from Intel). It is free for research and _very_ cool (maybe you already use it). With it you can generate a memory trace.

zmajo wrote:

Hi Julia,

well, one could try different sample-after values to get a more complete picture of the memory behavior of the program. And I assume that hardware performance profiling could still be much less intrusive than an instrumentation-based scheme.

Thanks for your recommendation regarding Pin. I briefly looked at it, but I was not sure whether it also works for multithreaded programs without much trouble. I'll take another look then.

Best regards,

Zoltan
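
The trade-off behind trying different sample-after values can be put in numbers: with a sample-after value of N, roughly one in every N events produces a sample. The sketch below uses a made-up event count to show how quickly the picture thins out as N grows; the function name is hypothetical, and the cost side (each sample interrupts the program, which is why very small values such as 100 are discouraged earlier in this thread) is not modeled.

```python
def expected_samples(total_events, sample_after_value):
    """Approximate number of samples for a given event count and sample-after value."""
    if sample_after_value <= 0:
        raise ValueError("sample_after_value must be positive")
    return total_events // sample_after_value


# Made-up workload: 10 million L2 load misses over the whole run.
total_events = 10_000_000
for sav in (100, 10_000, 1_000_000):
    print(f"sa={sav}: about {expected_samples(total_events, sav)} samples")
```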