I have a memory-intensive application running on a dual Xeon E5-2643 workstation under 64-bit Windows 7 Pro. The workstation BIOS has NUMA and hyperthreading enabled (Windows sees two NUMA nodes). I'm using VTune Amplifier XE 2013 Update 4 (build 270817).
My application allocates and initializes a big chunk of RAM on the main thread, then spins up TBB and does a parallel_reduce over a blocked_range2d to compute some stuff.
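A minimal sketch of the access pattern described above, not the actual application code. It swaps TBB's parallel_reduce for plain std::thread so that it is self-contained, and the name sum_grid, the grid of doubles and the summation itself are hypothetical stand-ins for the real workload. The point is only that the buffer is allocated and first touched on the calling thread, while worker threads (which may run on either node) read it afterwards.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <thread>
#include <vector>

// Hypothetical stand-in for the real workload: sum a rows x cols grid.
// Assumption: the OS places pages near the thread that first touches them,
// so the whole buffer is likely to end up on the calling thread's node.
double sum_grid(std::size_t rows, std::size_t cols, unsigned num_threads) {
    assert(num_threads > 0);

    // Allocate and initialize on the calling ("main") thread.
    std::vector<double> data(rows * cols, 1.0);

    // One partial result per worker, combined at the end (a hand-rolled reduce).
    std::vector<double> partial(num_threads, 0.0);
    std::vector<std::thread> workers;

    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&, t] {
            // Each worker takes a contiguous band of rows.
            const std::size_t begin = rows * t / num_threads;
            const std::size_t end = rows * (t + 1) / num_threads;
            double local = 0.0;
            for (std::size_t r = begin; r < end; ++r) {
                for (std::size_t c = 0; c < cols; ++c) {
                    local += data[r * cols + c];
                }
            }
            partial[t] = local;
        });
    }

    for (auto& w : workers) {
        w.join();
    }
    return std::accumulate(partial.begin(), partial.end(), 0.0);
}
```

Workers scheduled on the node that does not hold the buffer would have to fetch their rows across QPI, which is the traffic I am trying to measure.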
I'm interested in using VTune to get some insight into bandwidth issues. In particular, I'd like to assess how much the threads running on cores of the "other" node (the one where the data is not allocated) are slowed down by having to go over the QPI bus to access the data.
The documentation (e.g. http://software.intel.com/sites/products/documentation/doclib/stdxe/2013... , although I've also found the same thing in the locally installed help) suggests there is a "QPI Bandwidth" viewpoint available. But I cannot figure out how to actually get this viewpoint to appear as an option!
I have run both the "General Exploration" analysis (with "Analyze Memory Bandwidth" ticked) and the "Bandwidth" analysis (from the "Sandy Bridge/..." section of "Choose analysis type"), and neither of them seems to give me a "QPI Bandwidth" viewpoint. Once run, the "General Exploration" analysis gives me the "Hardware Event Counts", "Hardware Event Sample Counts", "Hardware Issues", "Hotspots", "Bandwidth" and "General Exploration" viewpoints, and the "Bandwidth" analysis gives me the same list without "General Exploration".
The "Bandwidth" viewpoint itself is quite interesting, but it only shows "package_0" (of the 2) as consuming any bandwidth, and the peak it reports is about half of the bandwidth I compute for the whole program from the rate at which it processes data (and Task Manager clearly shows the program maxing out both NUMA nodes). So it appears that "Bandwidth" only shows a node's direct traffic to its own DRAM, and not QPI traffic to the other node, or DRAM traffic on a node that originates from the other node via QPI. (Googling this topic turns up various references to "uncore events" being important, and to VTune not necessarily handling them well on some architectures, but the documentation's mention of the "QPI Bandwidth" viewpoint gives me some hope that it can show me something useful about what "package_1" is doing.)
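The "about half" comparison above is just the arithmetic sketched below: bytes processed divided by wall-clock time gives the effective bandwidth, which is then compared with the peak that the viewpoint reports. The function names and the figures in the usage note are hypothetical placeholders, not my measurements.

```cpp
#include <cassert>

// Effective bandwidth in GB/s (1 GB = 1e9 bytes), derived from how much data
// the program processed and how long it took.
double effective_bandwidth_gb_per_s(double bytes_processed, double elapsed_seconds) {
    assert(elapsed_seconds > 0.0);
    return bytes_processed / elapsed_seconds / 1e9;
}

// Fraction of the program's effective bandwidth that a tool-reported peak
// accounts for. A value near 0.5 would match the "about half" observation.
double reported_fraction(double reported_peak_gb_per_s, double effective_gb_per_s) {
    assert(effective_gb_per_s > 0.0);
    return reported_peak_gb_per_s / effective_gb_per_s;
}
```

For example, with made-up numbers, processing 40e9 bytes in 2 seconds is an effective 20 GB/s, so a reported peak of 10 GB/s on one package would account for only half of it.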
What's the trick to getting a "QPI Bandwidth" viewpoint to appear?
Thanks for any help.