1) The load from the “pop” instruction will be reissued, incurring a ~10 cycle performance hit, as described in the blog “Avoid Short Functions on Atom”.
2) Calls and returns will not match, which fools the branch prediction algorithm and will likely cause a mispredict on the next return instruction.
call NextIP //Call the next instruction: the target immediately follows the call (displacement 0)
NextIP:
pop ebx //AHA! I now have the instruction pointer in ebx and can use it for the forces of evil…
//or just to produce position independent code. :)
//Notice there is no matching return here
To identify a “zero length call”, search for the byte sequence E8 00 00 00 00: the E8 opcode (a near relative call) followed by a 32-bit displacement of zero.
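As a rough illustration of that search, here is a minimal Python sketch that scans a buffer of machine code for the byte pattern. The function name, the `base_address` parameter, and the sample bytes are hypothetical and only there for the example. A raw byte scan like this can report false positives, since the same five bytes may appear inside data or in the middle of another instruction, so treat each hit as a candidate to confirm in a disassembler.

```python
# Byte pattern for a near relative call with a zero displacement.
ZERO_LENGTH_CALL = bytes.fromhex("E8 00 00 00 00")


def find_zero_length_calls(code, base_address=0):
    """Return the address of every E8 00 00 00 00 sequence found in `code`.

    `base_address` (an assumption for this sketch) is the address at which
    the first byte of `code` would be loaded.
    """
    hits = []
    pos = code.find(ZERO_LENGTH_CALL)
    while pos != -1:
        hits.append(base_address + pos)
        # Resume one byte later so no occurrence is skipped.
        pos = code.find(ZERO_LENGTH_CALL, pos + 1)
    return hits


# Hypothetical sample: push ebp; call +0; pop ebx; ret
sample = bytes.fromhex("55 E8 00 00 00 00 5B C3")
print([hex(a) for a in find_zero_length_calls(sample, base_address=0x1000)])
# prints ['0x1001']
```

In practice you would read the code section of the binary you are investigating into `code` and pass its load address as `base_address`.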
The diagram below shows what we call a “stream of instructions”, which presents instructions in the order they were most commonly retired on the core (x-axis), graphed against the total clocks tagged to each instruction (in red, 1st y-axis) and branch mispredicts (in blue, 2nd y-axis). The “zero length call” causing the reissued load shows up in “spike1” below, which in the asm below is just a call to the next instruction. The “zero length call” also causes the next return to mispredict, so a large count of branch mispredicts shows up in the stream. Our toolset estimates that branch mispredicts caused by the “zero length call” are costing 21% of this total “stream of instructions”, while the “short function call” is estimated at 16%.
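To see why an unmatched call makes the next return mispredict, here is a toy Python model. It assumes the predictor keeps a simple stack of return addresses that is pushed on every call and popped on every return; that is a simplification for illustration, not a description of the actual Atom hardware. The class name, function name, and addresses are all hypothetical.

```python
class ReturnStackModel:
    """Toy return predictor: push on call, pop on return (an assumed model)."""

    def __init__(self):
        self.stack = []
        self.mispredicts = 0

    def call(self, return_address):
        # The predictor remembers where this call is expected to return to.
        self.stack.append(return_address)

    def ret(self, actual_target):
        # The predictor guesses the most recently pushed return address.
        predicted = self.stack.pop() if self.stack else None
        if predicted != actual_target:
            self.mispredicts += 1


def simulate(use_zero_length_call):
    """Run one function call and return the number of mispredicted returns."""
    predictor = ReturnStackModel()
    predictor.call(0x105)  # caller invokes the function; it should return to 0x105
    if use_zero_length_call:
        # "call NextIP" pushes 0x205 onto the predictor's stack. The following
        # "pop ebx" removes it from the real stack, but the predictor never
        # sees a matching return.
        predictor.call(0x205)
    predictor.ret(0x105)  # the function's real return goes back to the caller
    return predictor.mispredicts


print(simulate(use_zero_length_call=False))  # prints 0
print(simulate(use_zero_length_call=True))   # prints 1
```

With matched calls and returns the guess is correct. With the “zero length call” the predictor guesses the address after the call (0x205 here) while the real return goes to the caller, which is the mismatch described above.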