Recently, I tried to measure the read bandwidth of the L1, L2, and L3 caches on my Broadwell CPU.
My method is to control the buffer size: 16 KB for L1, 128 KB for L2, and 1 MB for L3. The results seem reasonable on one CPU core: 50 GB/s for L1, 44 GB/s for L2, and 25 GB/s for L3.
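A measurement of this kind can be sketched roughly as follows. This is only an illustration under stated assumptions, not the exact benchmark behind the numbers above: the helper name `read_bandwidth_gbs`, the 64-byte alignment, the repetition count, and the GCC/Clang-style compiler barrier are all assumptions, and a plain scalar loop like this may not reach peak cache bandwidth unless the compiler vectorizes it.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and posix_memalign */
#endif
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical helper: reads a `bytes`-sized buffer `reps` times and
 * returns the observed read bandwidth in GB/s (1 GB = 1e9 bytes),
 * or -1.0 on failure. */
double read_bandwidth_gbs(size_t bytes, size_t reps)
{
    size_t n = bytes / sizeof(uint64_t);
    void *mem = NULL;
    if (n == 0 || reps == 0 || posix_memalign(&mem, 64, bytes) != 0)
        return -1.0;
    uint64_t *buf = mem;

    /* Initialise the buffer: this maps the pages and warms the cache. */
    for (size_t i = 0; i < n; i++)
        buf[i] = i;

    uint64_t sum = 0;
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t r = 0; r < reps; r++) {
        for (size_t i = 0; i < n; i++)
            sum += buf[i];
        /* Compiler barrier (GCC/Clang syntax, an assumption): keeps `sum`
         * live and forces the buffer to be re-read on every pass. */
        __asm__ volatile("" : "+r"(sum) : : "memory");
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(buf);

    double secs = (double)(t1.tv_sec - t0.tv_sec)
                + (double)(t1.tv_nsec - t0.tv_nsec) * 1e-9;
    if (secs <= 0.0)
        return -1.0;
    return (double)n * (double)sizeof(uint64_t) * (double)reps / secs / 1e9;
}
```

Calling it with `16 * 1024`, `128 * 1024`, and `1024 * 1024` bytes and a large `reps` value, with optimization enabled and the thread pinned to one core, would correspond to the three buffer sizes above.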
At the same time, I used Intel PCM to measure the cache misses at all levels. With the 128 KB buffer, the number of L2 cache misses is much larger than the number of cold misses. Since the buffer should fit in L2, I would expect only cold misses under LRU, so I guess the replacement policy of the L2 cache is not LRU.
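For reference, the cold-miss baseline is just the number of distinct cache lines the buffer covers. A small sketch, assuming a line-aligned contiguous buffer and 64-byte cache lines (the helper name `cold_misses` is made up for illustration):

```c
#include <stddef.h>

/* Hypothetical helper: number of distinct cache lines covered by a
 * contiguous, line-aligned buffer, i.e. the compulsory (cold) miss
 * count for one full pass over it. Returns 0 for a zero line size. */
size_t cold_misses(size_t bytes, size_t line_bytes)
{
    if (line_bytes == 0)
        return 0;
    return (bytes + line_bytes - 1) / line_bytes;  /* round up */
}
```

With these assumptions, `cold_misses(128 * 1024, 64)` gives 2048, so a measured L2 miss count far above that, across many passes over the same buffer, is what stands out here.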
Could anybody comment on it?