- Looking at the link below, what exactly is the difference between non-precise and precise monitoring? It might seem a silly question, but why would anyone ever use non-precise monitoring?
I have implemented some code using two approaches. Looking at the results (attached), I can tell that the "faster" version had fewer branch mispredictions, fewer L1 instruction cache misses, and fewer TLB misses, but I cannot calculate how many CPU cycles were consumed. The total difference between the two designs is several billion instructions.
Could somebody please glance at my results and help me determine where the "additional" CPU cycles were consumed?
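For what it's worth, one first-order way to attribute the extra cycles is to multiply each event-count difference between the two versions by an assumed per-event penalty. Below is a minimal Python sketch of that arithmetic. The event names, counts, and penalty values are all hypothetical placeholders (not my measured results and not official latencies for any CPU), and the function name `estimate_extra_cycles` is just for illustration.

```python
# All numbers below are hypothetical placeholders, not measured values
# and not documented latencies for any particular processor.
ASSUMED_PENALTY_CYCLES = {
    "branch_mispredict": 15,
    "l1i_miss": 10,
    "tlb_miss": 30,
}


def estimate_extra_cycles(slow_counts, fast_counts, penalties):
    """Return (per-event estimate, total) of extra cycles, slow minus fast."""
    per_event = {}
    for event, penalty in penalties.items():
        # Difference in event count between the two versions.
        delta = slow_counts.get(event, 0) - fast_counts.get(event, 0)
        per_event[event] = delta * penalty
    return per_event, sum(per_event.values())


# Placeholder event counts for the two implementations.
slow = {"branch_mispredict": 2_000_000, "l1i_miss": 5_000_000, "tlb_miss": 400_000}
fast = {"branch_mispredict": 1_500_000, "l1i_miss": 3_000_000, "tlb_miss": 300_000}

per_event, total = estimate_extra_cycles(slow, fast, ASSUMED_PENALTY_CYCLES)

# Print the largest estimated contributors first.
for event, cycles in sorted(per_event.items(), key=lambda kv: -kv[1]):
    print(f"{event:>18}: {cycles:>12,} cycles")
print(f"{'total':>18}: {total:>12,} cycles")
```

This kind of estimate ignores the fact that an out-of-order core can overlap several stalls, so it tends to overstate the cost; comparing the total against the measured difference in unhalted cycles would show how much is left unexplained.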
These are the memory access costs I have found:
I just installed the VTune Amplifier XE 2013 evaluation edition (I have ordered a license key but have not yet received it).
I am trying to find the L1, L2, and LLC instruction and data cache misses of the application. I am using an Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz. Can you please tell me how I can find instruction and data misses using Intel VTune?
Secondly, if the machine has hyper-threading active, would it be a good idea to turn it off to characterize the application?
I would like a function that provides the timestamp VTune uses in its timeline, so that any program trace or output can be tagged with this timestamp for a more comprehensive analysis (e.g., I would like to know how many tasks a worker thread still has at a particular timestamp in the VTune timeline view).
Dear VTune experts,
I am trying to gather hardware counter information for individual functions in a code. While this seems straightforward in the GUI, I haven't found a way to do it through the command-line interface.
On the command line, I tried two reports:
(1) amplxe-cl -R summary -r r012ge
This gives me hardware counter information (instruction count, cache hit rate) for the entire code, but not for specific functions, which is what I need.
(2) amplxe-cl -R hotspots -r r012ge
I'm collecting hotspots on SUSE 11.3 with the following command:
amplxe-cl -collect hotspots -duration 200 -run-pass-thru=--no-altstack -result-dir /cores/results/socket_sleep_0 -target-process nsfw-1-2-3 --search-dir sym:r=/cores/gglibc/