Store Forward Block

Michael Chynoweth (Intel)

TITLE: Store Forward Block

ISSUE_NAME: STORE_FORWARD_BLOCKED

DESCRIPTION:  A store forward block occurs when a recent store is unable to forward its data to a load.  If a load follows a store and reloads data that the store has written to memory, Intel microarchitectures can in many cases forward the data directly from the store to the load.  This "store forwarding" saves cycles because the load obtains the data directly instead of reading it from cache or memory.  The rules of store forwarding are complex and depend on the size, alignment, and type of the store and the load; the optimization guide describes the conditions under which forwarding succeeds.  PBA has two methodologies for finding cases where store forwarding is blocked:

1) We use static assembly analysis to find the store and the corresponding load.

- This finds ~50% of all cases in which store forwarding cannot occur and is useful for identifying stores that cannot be forwarded because of size or alignment.  This methodology has the advantage that it identifies both the load and the corresponding store.

2) We use the performance monitoring event that fires when the architecture indicates a store cannot be forwarded to a load.

- In these cases we can usually identify the load but not the store.
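The size check behind methodology 1 can be illustrated with a minimal C sketch.  This is not PBA's implementation; the `mem_access` record and the `load_contained_in_store` helper are hypothetical names introduced here, and the rule is deliberately simplified to "every byte the load reads must have been written by the store".  The real forwarding rules also depend on alignment and access type, as described in the optimization guide.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one memory access found in the disassembly. */
typedef struct {
    uintptr_t addr;  /* effective address of the access */
    size_t    size;  /* access width in bytes */
} mem_access;

/*
 * Simplified assumption: forwarding is only plausible when the load is
 * fully contained in the bytes written by the store.  Returning true does
 * NOT guarantee forwarding; returning false flags a likely block.
 */
bool load_contained_in_store(mem_access store, mem_access load)
{
    assert(store.size > 0 && load.size > 0);
    return load.size <= store.size &&
           load.addr >= store.addr &&
           (load.addr - store.addr) <= (store.size - load.size);
}
```

With this check, a 2-byte store followed by a 4-byte load at the same address is flagged, while a 2-byte load from inside a 4-byte store is not.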

RELEVANCE:  Store forward blocks affect all Intel architectures on all operating systems.  Intel Atom microarchitecture runs into more cases where store forwarding is not allowed than Intel Core microarchitecture does.  A blocked store forward typically incurs a performance penalty of 10-15 cycles.

EXAMPLE: 
The most typical case of a store forward block on Intel Core microarchitecture is a small store that cannot be forwarded to a larger load.  The same sequence is also blocked on Atom microarchitecture.

mov word ptr [eax], 1  //Small store to 16-bits (PBA will mark this as STORE_FORWARD_BLOCKED_START)

mov  ecx, dword ptr [eax]  //Large load of 32-bits at the same address stored by previous instruction (PBA will mark this as STORE_FORWARD_BLOCKED)
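For readers working from source code, the following C sketch shows one way the same access pattern can arise.  The function name `narrow_store_wide_load` is hypothetical, and whether a compiler actually emits a 16-bit store followed by a 32-bit load depends on the compiler and the optimization level, so treat this as an illustration of the pattern and not as a guaranteed reproducer.

```c
#include <stdint.h>
#include <string.h>

/*
 * Overwrites the first two bytes of a 32-bit field, then reads the whole
 * field back.  memcpy is used for the narrow write to stay within C's
 * aliasing rules.
 */
uint32_t narrow_store_wide_load(uint32_t *field, uint16_t low)
{
    memcpy(field, &low, sizeof low);  /* 2-byte store at the field's address */
    return *field;                    /* 4-byte load from the same address */
}
```

If the generated code keeps both memory accesses, the load needs two bytes from the pending store and two bytes that the store did not write, which is the situation the assembly above illustrates.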

SOLUTION:

A store forward block is usually fixed at the store.  First determine whether the store is necessary or whether the variable can be manipulated in registers instead.  Then look at the size, alignment, and type of the store and attempt to determine why it is not being forwarded.
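As a hedged sketch of the "manipulate the variable in registers" advice, the C function below updates the low 16 bits of a 32-bit field without a narrow store.  The name `merge_in_register` is hypothetical, and the sketch assumes the caller wants the updated 32-bit value afterwards.  It still performs one full-width load of the field, so it only helps when that load is not itself blocked by an earlier narrow store.

```c
#include <stdint.h>

/*
 * Replaces the low 16 bits of *field with `low`, combining the value in a
 * register and writing it back with a single full-width store.
 */
uint32_t merge_in_register(uint32_t *field, uint16_t low)
{
    uint32_t merged = (*field & 0xFFFF0000u) | low;  /* combine in a register */
    *field = merged;   /* one 4-byte store; a later 4-byte load matches its size */
    return merged;     /* value is already in a register, no reload needed */
}
```

The store now has the same size and address as any later 32-bit load of the field, and the returned value avoids reloading from memory at all.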

RELATED_SOURCES: 

The "Intel® 64 and IA-32 Architectures Optimization Reference Manual" has the forwarding rules for each architecture.
