Bad Speculation (Back-End Bound Pipeline Slots)
Metric Description
Superscalar processors can be conceptually divided
into the 'front-end', where instructions are fetched and decoded into the
operations that constitute them, and the 'back-end', where the required
computation is performed. Each cycle, the front-end generates up to four of
these operations, which are placed into pipeline slots that then move through
the back-end. Thus, for a given execution duration in clock cycles, it is easy to
determine the maximum number of pipeline slots containing useful work that can
be retired in that duration. The actual number of retired pipeline slots
containing useful work, though, rarely equals this maximum. This can be due to
several factors: some pipeline slots cannot be filled with useful work, either
because the front-end could not fetch or decode instructions in time
('Front-end bound' execution) or because the back-end was not prepared to
accept more operations of a certain kind ('Back-end bound' execution).
Moreover, even pipeline slots that do contain useful work may not retire due to
bad speculation. Front-end bound execution may be due to a large code working
set, poor code layout, or microcode assists. Back-end bound execution may be
due to long-latency operations or other contention for execution resources. Bad
speculation is most frequently due to branch misprediction.
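
The slot accounting described above can be sketched in a few lines of Python. This is only an illustration, assuming the four-slots-per-cycle width stated in the description. The function names and the sample counts are hypothetical, and a real profiler derives these quantities from hardware event counters rather than from ready-made slot totals.

```python
PIPELINE_WIDTH = 4  # slots the front-end can fill per cycle, per the description


def max_useful_slots(cycles, width=PIPELINE_WIDTH):
    """Upper bound on slots with useful work that could retire in `cycles`."""
    return cycles * width


def slot_breakdown(cycles, retired, bad_speculation, frontend_empty,
                   width=PIPELINE_WIDTH):
    """Split all available slots into four fractions (hypothetical inputs).

    Slots not accounted for by retirement, bad speculation, or front-end
    starvation are attributed to the back-end.
    """
    total = max_useful_slots(cycles, width)
    backend_empty = total - retired - bad_speculation - frontend_empty
    if backend_empty < 0:
        raise ValueError("slot counts exceed the available pipeline slots")
    return {
        "retiring": retired / total,
        "bad_speculation": bad_speculation / total,
        "front_end_bound": frontend_empty / total,
        "back_end_bound": backend_empty / total,
    }


# Made-up numbers: 1000 cycles give 4000 slots in total.
breakdown = slot_breakdown(cycles=1000, retired=1800,
                           bad_speculation=400, frontend_empty=600)
for category, fraction in breakdown.items():
    print(f"{category}: {fraction:.0%}")
```

With these sample counts, 1200 of the 4000 slots are left over and attributed to the back-end, so the four fractions sum to one.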
Possible Issues
A significant proportion of pipeline slots remain empty. When
operations take too long in the back-end, they introduce bubbles in the
pipeline that ultimately cause fewer pipeline slots containing useful work to
be retired per cycle than the machine is capable of supporting. This
opportunity cost results in slower execution. Long-latency operations like
divides and memory operations can cause this, as can too many operations being
directed to a single execution port (for example, more multiply operations
arriving in the back-end per cycle than the execution unit can support).
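
The effect of port contention and long-latency operations on slot utilization can be approximated with a simple bound model. The sketch below is a rough illustration, not a description of any specific processor: the port throughput, operation latency, and operation counts are all hypothetical, and the model takes the largest of three lower bounds on cycle count, ignoring the overlap and scheduling details of real hardware.

```python
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def estimate_cycles(total_ops, port_ops, width=4, port_throughput=1,
                    chain_length=0, op_latency=1):
    """Lower-bound cycle estimate for a stream of operations.

    total_ops       -- all operations in the stream
    port_ops        -- operations that must use one contended port
    width           -- pipeline slots per cycle (four, per the description)
    port_throughput -- operations the contended port accepts per cycle (assumed)
    chain_length    -- number of serially dependent long-latency operations
    op_latency      -- latency in cycles of each operation in that chain (assumed)
    """
    issue_bound = ceil_div(total_ops, width)           # limited by slot supply
    port_bound = ceil_div(port_ops, port_throughput)   # limited by one port
    latency_bound = chain_length * op_latency          # limited by dependencies
    return max(issue_bound, port_bound, latency_bound)


def empty_slot_fraction(total_ops, cycles, width=4):
    """Fraction of available slots that carried no useful work."""
    return 1 - total_ops / (cycles * width)


# Hypothetical stream: 4000 operations, half of them multiplies on one port.
cycles = estimate_cycles(total_ops=4000, port_ops=2000)
print(cycles, empty_slot_fraction(4000, cycles))

# Hypothetical stream: 100 dependent divides at an assumed 25 cycles each.
cycles = estimate_cycles(total_ops=4000, port_ops=0,
                         chain_length=100, op_latency=25)
print(cycles, empty_slot_fraction(4000, cycles))
```

In the first case the single port, not the pipeline width, sets the cycle count, so half of the slots go unused. In the second, the dependent chain dominates even though no port is oversubscribed.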