I've been looking at load latency on my Ivy Bridge (IB) machine. The optimization guide discusses the effect that displacement size has on load latency, so I decided to test it. As best I can tell, you only get 4 clocks of load latency if the load has NO displacement. As soon as a displacement is added I observe 1 extra clock of latency, which is counter to what the optimization guide states. The test is quite simple: create a pointer chase in which each load fetches the address of the next element (the next 8 bytes after the current one, or whatever layout strikes your fancy), then offset each stored address by some number of bytes and correct for that offset with a displacement in the load that performs the chase.
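For anyone who wants to try this, here is a minimal C sketch of the kind of test described above. It is not my exact code. The names `DISP`, `build_chain`, `chase`, and `ns_per_load` are made up for illustration, and it assumes a POSIX system for `clock_gettime`. It also assumes the compiler turns the load in `chase` into a single `mov` with a constant displacement, so check the disassembly before trusting any numbers.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime under strict -std= modes */
#endif
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical displacement in bytes. Build once with -DDISP=0 (no
   displacement) and once with a nonzero value, then compare. It must be a
   compile-time constant so it can be encoded in the load's addressing mode. */
#ifndef DISP
#define DISP 8
#endif

/* Build a circular chain: slot i holds the address of slot i+1 minus DISP,
   so every load in the chase has to add DISP back to reach the next slot. */
static void build_chain(uintptr_t *slots, size_t n)
{
    for (size_t i = 0; i < n; i++)
        slots[i] = (uintptr_t)&slots[(i + 1) % n] - DISP;
}

/* Dependent-load loop: each load's address comes from the previous load's
   result, so the time per iteration is bounded by load-to-use latency.
   The volatile read keeps the compiler from removing or reordering it. */
static uintptr_t chase(uintptr_t p, size_t steps)
{
    for (size_t i = 0; i < steps; i++)
        p = *(const volatile uintptr_t *)(p + DISP);
    return p;
}

/* Average wall-clock nanoseconds per load. The final pointer is written to
   *out so the chase result is actually used. */
static double ns_per_load(uintptr_t start, size_t steps, uintptr_t *out)
{
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    *out = chase(start, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)steps;
}
```

To use it, call `build_chain` on a small array that fits in L1, start the chase at `(uintptr_t)&slots[0] - DISP`, and run a few hundred million steps. Multiplying the nanoseconds per load by the core frequency in GHz gives clocks per load, which is only meaningful if the frequency is held fixed (turbo and power saving disabled).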
To the best of my knowledge, all loads that have a displacement take 5 clocks. Can someone from Intel comment on this and, if it's correct, clarify/correct the optimization guide?