Memory Bandwidth
Metric Description
This metric represents a fraction of cycles during which an application could be stalled due to approaching bandwidth limits of the main memory (DRAM). This metric does not aggregate requests from other threads/cores/sockets (see Uncore counters for that). Consider improving data locality in NUMA multi-socket systems.
Possible Issues
A significant fraction of cycles was stalled due to approaching bandwidth limits of the main memory (DRAM).
Tips
Improve data access patterns to reduce cacheline transfers from/to memory. Possible techniques:
- Consume all bytes of each cacheline before it is evicted (for example, reorder structure elements and split non-hot ones).
- Merge compute-limited and bandwidth-limited loops.
- Use NUMA optimizations on a multi-socket system.
Software prefetches do not help a bandwidth-limited application.
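The first technique in the list above (splitting non-hot structure members) can be sketched as follows. This is only an illustration under stated assumptions: the ParticleAoS, ParticleHot, and ParticleCold types and the step_* functions are hypothetical names, and a 64-byte cacheline is assumed. In the original layout each element occupies a full cacheline although the hot loop uses only 16 bytes of it; after the split, four hot elements share one cacheline, so the same loop should transfer roughly a quarter of the data.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record: the hot loop touches only x and v. */
typedef struct {
    double x;          /* hot: position */
    double v;          /* hot: velocity */
    char   name[48];   /* cold: never read in the hot loop */
} ParticleAoS;         /* 64 bytes: one assumed cacheline per element */

/* Split layout: hot and cold members live in separate arrays. */
typedef struct { double x; double v; } ParticleHot;   /* 16 bytes */
typedef struct { char name[48]; }      ParticleCold;  /* 48 bytes */

static_assert(sizeof(ParticleHot) == 16, "four hot records per 64-byte line");

/* Original layout: each iteration pulls 64 bytes to use 16 of them. */
void step_aos(ParticleAoS *p, size_t n, double dt)
{
    for (size_t i = 0; i < n; i++)
        p[i].x += p[i].v * dt;
}

/* Split layout: every byte of each transferred cacheline is consumed. */
void step_split(ParticleHot *p, size_t n, double dt)
{
    for (size_t i = 0; i < n; i++)
        p[i].x += p[i].v * dt;
}
```

Both functions compute the same result; only the amount of memory traffic per element differs. The cold data stays available in a parallel ParticleCold array indexed the same way. Whether the split pays off depends on the actual access pattern, so re-measure this metric after the change.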