Average Bandwidth on Xeon Machine

I am running a bandwidth analysis on a Xeon machine using Intel VTune.

The summary of the results shows the following average bandwidth:

Average Bandwidth
Package    Bandwidth, GB/sec
package_0    6.718
package_1    7.657

Can you please explain what "package" refers to here? If the application is running with a single thread, which package value should I pick for the bandwidth?



You need to look at both packages, and at the individual cores as well.

Even if you run a single-threaded application, it may run on more than one core unless you use a processor-affinity function.
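As a minimal sketch of what a processor-affinity call looks like, assuming a Linux host and Python 3, the standard-library `os.sched_setaffinity` can restrict a single-threaded process to one logical CPU so that it cannot migrate to the other package. The helper name `pin_to_one_cpu` is hypothetical; a compiled application would use the underlying `sched_setaffinity` system call or be launched under `taskset`.

```python
import os

def pin_to_one_cpu(cpu=None):
    """Restrict the caller to a single logical CPU (Linux only).

    If cpu is None, the lowest-numbered CPU in the current affinity mask
    is chosen, so we never request a CPU the process is not allowed to use.
    Returns the CPU id that was selected.
    """
    allowed = os.sched_getaffinity(0)  # 0 refers to the caller
    if cpu is None:
        cpu = min(allowed)
    if cpu not in allowed:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(allowed)}")
    os.sched_setaffinity(0, {cpu})
    return cpu

if __name__ == "__main__":
    chosen = pin_to_one_cpu()
    print("pinned to CPU", chosen, "affinity now", os.sched_getaffinity(0))
    # ... run the memory-bound work here ...
```

From a shell, with `./app` standing in for your program, `taskset -c 0 ./app` does the same pinning, and `numactl --cpunodebind=0 --membind=0 ./app` additionally binds memory allocations to NUMA node 0.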

You can see how your application's threads migrate across packages and cores by switching the viewpoint to "Hotspots" and setting the timeline grouping to "Thread / H/W Context". If you have 2 packages with 4 cores each, VTune will list them as cpu_0 ... cpu_3 for the first package and cpu_4 ... cpu_7 for the second one.
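The numbering rule in the example above can be written as a tiny helper. This assumes the contiguous layout described there (2 packages with 4 cores each); `package_of` and `group_by_package` are illustrative names, not VTune functions.

```python
def package_of(cpu, cores_per_package=4):
    """Map a cpu_N index to its package, assuming contiguous numbering:
    cpu_0..cpu_3 on package 0, cpu_4..cpu_7 on package 1, and so on."""
    if cpu < 0 or cores_per_package <= 0:
        raise ValueError("cpu must be >= 0 and cores_per_package must be > 0")
    return cpu // cores_per_package

def group_by_package(n_cpus, cores_per_package=4):
    """Build {package: [cpu labels]} for n_cpus logical CPUs."""
    groups = {}
    for cpu in range(n_cpus):
        pkg = package_of(cpu, cores_per_package)
        groups.setdefault(pkg, []).append(f"cpu_{cpu}")
    return groups

print(group_by_package(8))
# {0: ['cpu_0', 'cpu_1', 'cpu_2', 'cpu_3'], 1: ['cpu_4', 'cpu_5', 'cpu_6', 'cpu_7']}
```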

So, if I have to report the average bandwidth of the application, I will take the average of the two package values above.

A package is a collection of physical cores; on a multi-socket Xeon server, each package is one physical processor in its own socket.
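On Linux, the real mapping of logical CPUs to packages is not guaranteed to be contiguous, and the kernel exposes it in sysfs as `/sys/devices/system/cpu/cpuN/topology/physical_package_id`. A minimal sketch that groups logical CPUs by package, assuming that sysfs layout; the function name `cpus_by_package` is hypothetical.

```python
import glob
import os
import re

def cpus_by_package(sysfs_root="/sys/devices/system/cpu"):
    """Group logical CPU ids by physical package (socket) on Linux.

    Reads cpuN/topology/physical_package_id for every logical CPU.
    Returns {} if the sysfs files are not present (e.g. non-Linux hosts).
    """
    packages = {}
    pattern = os.path.join(sysfs_root, "cpu[0-9]*", "topology", "physical_package_id")
    for path in glob.glob(pattern):
        cpu_dir = os.path.basename(os.path.dirname(os.path.dirname(path)))  # e.g. "cpu12"
        match = re.fullmatch(r"cpu(\d+)", cpu_dir)
        if match is None:
            continue  # skip anything that is not a plain cpuN directory
        with open(path) as f:
            pkg = int(f.read().strip())
        packages.setdefault(pkg, []).append(int(match.group(1)))
    for cpus in packages.values():
        cpus.sort()
    return packages

if __name__ == "__main__":
    print(cpus_by_package())
```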

Regarding your last post #4:

I think that makes sense only if your application's threads are scheduled to run on the second package as well.


Thank you so much for the explanation.

You are welcome.

The average bandwidth is the sum of the values for the two packages, not the average of the two per-package averages.
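With the numbers from the VTune summary at the top of the thread, the arithmetic works out as follows. This is a plain Python sketch; the "imbalance" figure is simply the difference between the two packages relative to the larger one, which is one way to check the "within 15%" remark in the list that follows.

```python
# Per-package average DRAM bandwidth from the VTune summary (GB/s).
package_bw = {"package_0": 6.718, "package_1": 7.657}

total_bw = sum(package_bw.values())            # bandwidth of the whole system
mean_of_packages = total_bw / len(package_bw)  # NOT the application's bandwidth

hi, lo = max(package_bw.values()), min(package_bw.values())
imbalance = (hi - lo) / hi                     # relative difference between packages

print(f"total bandwidth: {total_bw:.3f} GB/s")            # total bandwidth: 14.375 GB/s
print(f"mean of packages: {mean_of_packages:.4f} GB/s")   # mean of packages: 7.1875 GB/s
print(f"package imbalance: {imbalance:.1%}")              # package imbalance: 12.3%
```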

Even a single thread can generate memory traffic on both packages in several ways:

  1. The thread might move from one package to another and instantiate memory pages both while running on package 0 and while running on package 1.
  2. The NUMA memory setting for the job (or for the system) might request interleaved pages.  This would usually result in approximately equal bandwidth utilization on the two packages.  (The values above are within 15% of each other, so interleaving is plausible.)
  3. The process might request more memory than is available in the package that it starts on, so that additional memory will be allocated on the other package.  You can monitor this on Linux systems by running "numastat" before and after your job and looking for large increases in the "numa_miss" output.
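For item 3, the before/after `numastat` comparison can be scripted. This is a sketch under the assumption that running plain `numastat` (from the numactl package) prints its default table: a header row of node names followed by one row per counter (`numa_hit`, `numa_miss`, ...) with integer page counts. The helper names are hypothetical, and numastat options that print a different layout are not handled by this parser.

```python
import subprocess

def parse_numastat(text):
    """Parse numastat's default table into {counter: {node: pages}}."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    nodes = rows[0]  # header row, e.g. ['node0', 'node1']
    table = {}
    for row in rows[1:]:
        name, values = row[0], row[1:]
        table[name] = {node: int(value) for node, value in zip(nodes, values)}
    return table

def numastat_snapshot():
    """Run `numastat` with no arguments and parse its output."""
    result = subprocess.run(["numastat"], capture_output=True, text=True, check=True)
    return parse_numastat(result.stdout)

def numa_miss_increase(before, after):
    """Per-node growth of the numa_miss counter between two snapshots."""
    return {node: after["numa_miss"][node] - before["numa_miss"][node]
            for node in after["numa_miss"]}

# Typical use around a job (not executed here):
#   before = numastat_snapshot()
#   ... run the job ...
#   after = numastat_snapshot()
#   print(numa_miss_increase(before, after))
```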

It is also possible that some of the memory traffic is due to other processes or to operating system activity.  The values here seem too high to blame on OS activity, but it is theoretically possible.   It is not generally possible to assign DRAM traffic to particular processes, so you need to ensure that you are running on an otherwise idle system to get reliable measurements.
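A cheap sanity check before each measurement, assuming a Unix-like host, is to compare the 1-minute load average from `os.getloadavg()` against a small threshold. This only flags obviously busy machines (a low load average does not prove that the memory system is idle), and both the function name `system_looks_idle` and the 0.5 default are hypothetical choices.

```python
import os

def system_looks_idle(max_load=0.5):
    """Rough pre-check before a bandwidth measurement (Unix only).

    Returns True if the 1-minute load average is at or below max_load.
    The default threshold is an arbitrary assumption, not a standard value.
    """
    one_minute, _five, _fifteen = os.getloadavg()
    return one_minute <= max_load

if __name__ == "__main__":
    if not system_looks_idle():
        print("warning: system does not look idle; DRAM bandwidth numbers may include other traffic")
```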

John D. McCalpin, PhD
"Dr. Bandwidth"
