I'm having trouble reaching the 64 bytes/cycle L2-to-L1D cache bandwidth claimed for the Haswell core; I can only get 32 bytes/cycle. Does anyone know whether 64 bytes/cycle is actually achievable? Several reviewers on the web seem to have the same problem. Was this a typo on Intel's Haswell slides? My current code uses integer loads. Does the claimed 64 bytes/cycle depend on SSE loads? I'm stumped.
For example, one Haswell review reports: "L1D, L2, and L3 cache bandwidth are all up on our Core i7-4770K compared to my preview piece. However, only the L1D yields the doubling of bandwidth we were expecting. Given 64 bytes/cycle (compared to 32 for Ivy Bridge), [the L2] number should still be much higher than it is.
We’re surprised that the L2 cache is just 7% faster on Haswell, but the massive L1 bandwidth is a huge jump."