Floating-point assists, also known as FP_ASSISTS, have two primary causes: denormals and underflow, both of which are a performance concern on most architectures.
Assists usually involve the microcode sequencer (MS), which helps handle the assist. Typically, when MS assistance is needed, many micro-ops (uops) are delivered from the MS, which can hurt performance because the machine has to execute all of these extra uops. All of these uops show up as retiring, so when the uop retiring rate is high it is essential to check that assists are not the cause. Counting the cycles in which uops are delivered by the microcode sequencer is often a good way to determine the cost of the assist.
Estimating the cost of assists using microcode sequencer cycles: (From IA32/64 optimization guide – Ahmad Yasin and Mike Chynoweth)
As shown in the optimization guide, the cost is the ratio of the IDQ.MS_CYCLES event to the CPU_CLK_UNHALTED.THREAD event, expressed as a percentage: 100 * IDQ.MS_CYCLES / CPU_CLK_UNHALTED.THREAD.
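The ratio above can be sketched as a small C helper. This is only an illustration: the function name ms_cycles_percent and its parameter names are hypothetical, and the two arguments are assumed to be raw counts of the named events collected over the same interval by whatever profiling tool is in use.

```c
#include <stdint.h>

/* Percentage of unhalted thread cycles in which the microcode sequencer
 * was delivering uops. The name and signature are illustrative only.
 * Returns 0.0 when no cycles were counted, to avoid dividing by zero. */
double ms_cycles_percent(uint64_t idq_ms_cycles,
                         uint64_t cpu_clk_unhalted_thread)
{
    if (cpu_clk_unhalted_thread == 0)
        return 0.0;
    return 100.0 * (double)idq_ms_cycles / (double)cpu_clk_unhalted_thread;
}
```

For example, 250 MS cycles out of 1000 unhalted cycles gives 25 percent.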
Floating point assists:
The Sandy Bridge microarchitecture has fewer cases of assists than previous architectures. However, there are still cases that require an FP assist. For example, denormal inputs to x87 instructions require an FP assist, potentially costing hundreds of cycles.
The cost of FP assists is calculated as the ratio of the FP_ASSISTS.ANY event to the INST_RETIRED.ANY event, expressed as a percentage.
To get the best performance out of applications affected by these issues, it is critical to identify and fix them.
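One way to check whether denormal inputs are present in the data is to scan it with the standard C classification macro fpclassify. The sketch below is a debugging aid under the assumption that the data of interest is held in a plain array of floats; the function name count_subnormals is hypothetical.

```c
#include <math.h>
#include <stddef.h>

/* Count the subnormal (denormal) values in a buffer of floats.
 * fpclassify() and FP_SUBNORMAL come from the C99 <math.h> header. */
size_t count_subnormals(const float *values, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (fpclassify(values[i]) == FP_SUBNORMAL)
            ++count;
    }
    return count;
}
```

A nonzero count on the inputs or outputs of a hot loop suggests that the loop is a candidate for the assists described here.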
How to fix:
Use the SSE and SSE2 instructions to set the Flush-to-Zero (FTZ) and Denormals-Are-Zero (DAZ) modes in the hardware to get the highest performance for floating-point applications.
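A minimal sketch of setting both modes from C, assuming an x86 compiler that provides the standard intrinsics headers. The macros _MM_SET_FLUSH_ZERO_MODE and _MM_SET_DENORMALS_ZERO_MODE write the corresponding bits of the MXCSR register; the wrapper name enable_ftz_daz is hypothetical.

```c
#include <xmmintrin.h>  /* _MM_SET_FLUSH_ZERO_MODE, _MM_FLUSH_ZERO_ON */
#include <pmmintrin.h>  /* _MM_SET_DENORMALS_ZERO_MODE, _MM_DENORMALS_ZERO_ON */

/* Turn on Flush-to-Zero and Denormals-Are-Zero for the calling thread.
 * FTZ makes denormal results become zero; DAZ makes denormal inputs
 * be treated as zero. */
void enable_ftz_daz(void)
{
    _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
    _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
}
```

MXCSR is per-thread state, so a helper like this would typically be called at the start of every thread that runs floating-point code. The modes apply to SSE and AVX arithmetic, not to x87 instructions, and they trade strict IEEE 754 behavior for speed, so they are appropriate only when the application can tolerate very small values being treated as zero.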
For more information see: