Which VTune event should I use to measure FSB bandwidth on Clovertown?



I am wondering whether you can provide some tips on which events I should use to collect memory bus bandwidth utilization. I know that on Pentium 4 or Xeon platforms the FSB data ready event can be used, but I am not sure which events correspond to it on a Core 2 Duo or Core 2 quad-core machine. Can you give me some suggestions? I read through some IDF materials about VTune and saw that BUS_Transaction_MEM_SELF.ANY is used for bandwidth. However, I also noticed other events, such as BUS_DRDY_CLOCKS.mask and BUS_TRANS_ANY.mask, and I am confused about how to use them.

Thanks for your help.


By David Levinthal:
"...directly counting the number of bytes associated with the cachelines transferred:
Cacheline_Bandwidth (bytes/sec) ~ 64*BUS_TRANS_BURST.ALL_AGENTS*core_freq/CPU_CLK_UNHALTED.CORE"
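The formula can be read as two steps: unhalted core cycles divided by the core frequency approximates the elapsed time, and the number of burst transactions times the 64-byte cacheline size gives the bytes moved in that time. The small Python sketch below just does that arithmetic on counts you have already collected. The function name `cacheline_bandwidth` and the sample numbers are hypothetical. It assumes both counts cover the same interval and that the core stayed unhalted for the whole interval; if the core halts, CPU_CLK_UNHALTED.CORE undercounts wall time and the result overestimates bandwidth. It does not address how counts from several cores or sockets should be combined.

```python
def cacheline_bandwidth(bus_trans_burst_all_agents: int,
                        cpu_clk_unhalted_core: int,
                        core_freq_hz: float,
                        line_size: int = 64) -> float:
    """Approximate bus bandwidth in bytes/sec (hypothetical helper).

    elapsed_s  ~ CPU_CLK_UNHALTED.CORE / core_freq
    bandwidth  ~ line_size * BUS_TRANS_BURST.ALL_AGENTS / elapsed_s
    Assumes the core was unhalted for the entire sampling interval.
    """
    if cpu_clk_unhalted_core <= 0:
        raise ValueError("CPU_CLK_UNHALTED.CORE must be positive")
    if core_freq_hz <= 0:
        raise ValueError("core frequency must be positive")
    # Convert unhalted cycles to seconds using the nominal core clock.
    elapsed_s = cpu_clk_unhalted_core / core_freq_hz
    # Each burst transaction is assumed to move one 64-byte cacheline.
    return line_size * bus_trans_burst_all_agents / elapsed_s


if __name__ == "__main__":
    # Hypothetical counts: 2.66e9 unhalted cycles at 2.66 GHz = 1 second.
    bw = cacheline_bandwidth(50_000_000, 2_660_000_000, 2.66e9)
    print(f"{bw / 1e9:.2f} GB/s")  # prints "3.20 GB/s"
```

Keeping the frequency as an explicit parameter makes it easy to plug in the actual clock of the machine being measured rather than a hard-coded value.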

It is very helpful to read his full article, "Analyzing and Resolving multi-core non scaling on Intel® Core2 processors", at http://www.devx.com/go-parallel/Door/33294

regards, Andrei


Many thanks for the answer and the document; I will read through it. I have a further question about the formula Cacheline_Bandwidth (bytes/sec) ~ 64*BUS_TRANS_BURST.ALL_AGENTS*core_freq/CPU_CLK_UNHALTED.CORE. If we run a multithreaded application on an Intel multi-core processor, should the result be multiplied by the number of cores I am using?

My second question is: what is the difference between L2_LINES_IN.BOTH_CORES.DEMAND and L2_LINES_IN.SELF.DEMAND? When running on a single core, I can use L2_LINES_IN.SELF.DEMAND to measure L2 cache misses, but when running on two or more cores (for example, on a 2-socket Core 2 quad-core machine), which event should I use?


