AGEN_STALL

Erik Niemeyer (Intel)

TITLE: AGEN STALL
ISSUE_NAME: AGEN_STALL
DESCRIPTION: In the Intel® Atom™ microprocessor (Bonnell), address generation is performed at the AG stage (stage 10), while most integer operations are performed at the EX1 stage (stage 13). If a uop that uses the AG stage requires data produced in the EX1 stage, a penalty of typically 3 cycles is incurred, because the dependent instruction cannot enter the AG stage until the data producer leaves EX1. Four basic operations are processed during the AG stage, and they define the AGEN stall sub-types: 1) load and store address calculation dependencies, 2) implicit ESP update dependencies, 3) the LEA instruction, and 4) int->float transfers.
RELEVANCE: All first-generation Atom cores based on the Bonnell microarchitecture.
EXAMPLE: The two most common sub-types are types 1 and 3. Below are examples of each issue:
---
Example 1: Load and Store address calculation dependencies
-This is the most basic form of an AGEN stall.
-The following code sample illustrates this condition:
ADD ESI, 4
MOV EAX, [ESI]
-This results in a three-cycle bubble, since the MOV instruction cannot progress from the AG stage until the ADD instruction's EX1 stage has completed.
Frequency: Pervasive in all client code
Costs: Any load/store address calculation dependency will cost 3 cycles
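A minimal sketch of two ways the sequence above might be rearranged when hand-coding. The registers ECX, EDX and EBX and the work done on them are hypothetical, chosen only for illustration. How many independent instructions are needed to cover the bubble depends on how they pair in the pipeline, so any such change should be measured rather than assumed.

```asm
; Option A: interleave independent work (hypothetical registers) between
; the address producer and the load that consumes it.
ADD ESI, 4            ; produces ESI in EX1
ADD ECX, EDX          ; independent of ESI
SUB EBX, 1            ; independent of ESI
AND EDX, 0xFF         ; independent of ESI
MOV EAX, [ESI]        ; address consumer, now further from the ADD

; Option B: fold the increment into the displacement so the load no longer
; depends on the ADD. Final EAX, ESI and flags match the original pair.
MOV EAX, [ESI+4]      ; uses the old ESI plus a displacement
ADD ESI, 4            ; pointer update moved after the load
```

Option B only removes the dependency on this particular ADD; the load still depends on whichever earlier instruction last wrote ESI.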
---
Example 2: The LEA Instruction
-The LEA instruction executes during the AG stage so that LEA results can be forwarded to address generation with no delay. The trade-off is a stall when a source of the LEA is computed by an instruction (such as a MOV or ADD) that executes at EX1.
-The following code sample illustrates this condition:
MOV EAX, 1
LEA EAX, [EAX+EBP+0x8000]
Frequency: Common in most client code
Costs: An LEA instruction will cost 3 cycles when it immediately follows an operation that executes in EX1 and produces one of its sources
SOLUTION: By far the simplest solution is to use the appropriate compiler option when building for first-generation Atom (Bonnell microarchitecture) targets. For the Intel Compiler, use the -xSSE3_ATOM switch. For GCC, use -march=atom and -mtune=atom. If hand-coding assembly, try to avoid back-to-back load/store/LEA operations with address dependencies by interleaving other instructions in the 3-cycle AGEN stall bubble.
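A minimal sketch of how the LEA sequence above might be rewritten by hand. Option A relies on the source being a known constant, as it happens to be in this example; the registers ECX, EDX and EBX in Option B are hypothetical and stand in for whatever independent work is available nearby.

```asm
; Option A: EAX is the constant 1, so it can be folded into the
; displacement (1 + 0x8000 = 0x8001), removing the dependency on the MOV.
LEA EAX, [EBP+0x8001]

; Option B: when the source is not a constant, place independent work
; (hypothetical registers) between the EX1 producer and the LEA.
MOV EAX, 1            ; stands in for any EX1 producer of EAX
ADD ECX, EDX          ; independent of EAX
SUB EBX, 1            ; independent of EAX
LEA EAX, [EAX+EBP+0x8000]
```

In Option A the LEA still reads EBP, so a stall remains possible if EBP itself was just produced by an instruction that executes in EX1.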
RELATED_SOURCES: Intel® 64 and IA-32 Architectures Optimization Reference Manual
NOTES: This issue is a code-generation problem. Attempting to fix AGEN stalls by hand is probably not a good use of your time unless they are in extremely hot hand-coded assembly.
