I'm new to computer architecture, and I have a question that may look silly.
I'm using lmbench's bw_mem tool to test memory bandwidth. My testbed is a 4-socket (E7-4870) system populated with 64 8GB DIMMs.
I use numactl to bind the benchmark to node 0 and have it read node 1's memory. The test result is 11GB/s. I also use PCM to monitor CPU activity, and it reports L3MISS as 88M and MC READ as 12.53GB.
IMHO every LLC miss should correspond to one memory access, so 64B (the cache line size) * L3MISS should equal the memory traffic. But the numbers give 64B * 88M = 5.632GB << 12.53GB. What am I missing here?
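
To make the arithmetic explicit, here is a minimal Python sketch of the comparison. It assumes that both PCM numbers cover the same sampling interval, that "M" and "GB" are decimal units (10^6 and 10^9), and that each L3 miss fetches exactly one 64-byte line. The variable and function names are mine, not anything from PCM or lmbench.

```python
# Compare the traffic implied by the L3 miss count against the
# memory-controller read volume reported by PCM.
# Assumption: both counters were sampled over the same interval.

CACHE_LINE_BYTES = 64  # cache line size on this platform


def bytes_from_misses(l3_misses, line_bytes=CACHE_LINE_BYTES):
    """Bytes that would be read if every L3 miss fetched one cache line."""
    return l3_misses * line_bytes


l3_misses = 88_000_000   # L3MISS = 88M (decimal millions assumed)
mc_read_bytes = 12.53e9  # MC READ = 12.53GB (decimal GB assumed)

expected = bytes_from_misses(l3_misses)
ratio = mc_read_bytes / expected

print(f"expected from L3 misses: {expected / 1e9:.3f} GB")
print(f"MC READ / expected:      {ratio:.2f}x")
```

Under those assumptions, the memory controller reports roughly 2.2 times the traffic that the miss count alone would explain.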