We are getting about 23% better single-thread performance from Haswell than from Ivy Bridge at the same clock speed on our server workload. Running VTune General Exploration, I found that Haswell incurred only one quarter as many Icache misses as Ivy Bridge. The number of branch mispredictions was about the same on both (and fairly low for a server app with few small loops). Since both processors have the same-size L1 Icache, what explains the difference? In Intel's marketing literature I see that Haswell "Initiates TLB and cache misses speculatively" and "Handles cache misses in parallel to hide latency", but no further specifics on Icache changes.
Since I always look gift horses in the mouth, does anyone have an explanation? Are there any other VTune counters I can use to shed some light on Haswell's remarkable performance improvement?