# Optimize Shader Execution

| Metric Name | Description |
| --- | --- |
| EU Array / Pipes: EU FPU0 Pipe Active | Percentage of time the Floating Point Unit (FPU) pipe is actively executing instructions. |
| EU Array / Pipes: EU FPU1 Pipe Active | Percentage of time the Extended Math (EM) pipe is actively executing instructions. |

## Ingredients

- Application: Microsoft D3D12Multithreading sample: https://github.com/microsoft/DirectX-Graphics-Samples/tree/master/Samples/Desktop/D3D12Multithreading
- Tool: Intel® GPA Graphics Frame Analyzer

- Operating System: Windows* 10
- GPU: Intel® Processor Graphics Gen9 and higher
- API: DirectX* 11/12

## Define Code Portions to Optimize

- Open the event with the discovered Shader Execution bottleneck in the Graphics Frame Analyzer Resource Viewer by selecting this event on the Main bar chart.
- Select Shader in the Resource List to open the shader source.
- Analyze the shader source to understand the algorithm and find potential places for optimization. The pixel shader invokes CalcLightingColor for each light (NUM_LIGHTS=3); the first light also computes shadow with the CalcUnshadowedAmountPCF2x2 function. The CalcLightingColor function is called three times, while the other functions are called only once per shader invocation. So CalcLightingColor is potentially the primary place for optimization.

  ```hlsl
  float4 CalcLightingColor(float3 vLightPos, float3 vLightDir, float4 vLightColor, float4 vFalloffs, float3 vPosWorld, float3 vPerPixelNormal)
  {
      float3 vLightToPixelUnNormalized = vPosWorld - vLightPos;

      // Dist falloff = 0 at vFalloffs.x, 1 at vFalloffs.x - vFalloffs.y
      float fDist = length(vLightToPixelUnNormalized);

      float fDistFalloff = saturate((vFalloffs.x - fDist) / vFalloffs.y);

      // Normalize from here on.
      float3 vLightToPixelNormalized = vLightToPixelUnNormalized / fDist;

      // Angle falloff = 0 at vFalloffs.z, 1 at vFalloffs.z - vFalloffs.w
      float fCosAngle = dot(vLightToPixelNormalized, vLightDir / length(vLightDir));
      float fAngleFalloff = saturate((fCosAngle - vFalloffs.z) / vFalloffs.w);

      // Diffuse contribution.
      float fNDotL = saturate(-dot(vLightToPixelNormalized, vPerPixelNormal));

      return vLightColor * fNDotL * fDistFalloff * fAngleFalloff;
  }

  float4 CalcUnshadowedAmountPCF2x2(int lightIndex, float4 vPosWorld)
  {
      // Compute pixel position in light space.
      float4 vLightSpacePos = vPosWorld;
      vLightSpacePos = mul(vLightSpacePos, lights[lightIndex].view);
      vLightSpacePos = mul(vLightSpacePos, lights[lightIndex].projection);

      vLightSpacePos.xyz /= vLightSpacePos.w;

      // Translate from homogeneous coords to texture coords.
      float2 vShadowTexCoord = 0.5f * vLightSpacePos.xy + 0.5f;
      vShadowTexCoord.y = 1.0f - vShadowTexCoord.y;

      // Depth bias to avoid pixel self-shadowing.
      float vLightSpaceDepth = vLightSpacePos.z - SHADOW_DEPTH_BIAS;

      // Find sub-pixel weights.
      float2 vShadowMapDims = float2(1280.0f, 720.0f); // need to keep in sync with .cpp file
      float4 vSubPixelCoords = float4(1.0f, 1.0f, 1.0f, 1.0f);
      vSubPixelCoords.xy = frac(vShadowMapDims * vShadowTexCoord);
      vSubPixelCoords.zw = 1.0f - vSubPixelCoords.xy;
      float4 vBilinearWeights = vSubPixelCoords.zxzx * vSubPixelCoords.wwyy;

      // 2x2 percentage closer filtering.
      float2 vTexelUnits = 1.0f / vShadowMapDims;
      float4 vShadowDepths;
      vShadowDepths.x = shadowMap.Sample(sampleClamp, vShadowTexCoord);
      vShadowDepths.y = shadowMap.Sample(sampleClamp, vShadowTexCoord + float2(vTexelUnits.x, 0.0f));
      vShadowDepths.z = shadowMap.Sample(sampleClamp, vShadowTexCoord + float2(0.0f, vTexelUnits.y));
      vShadowDepths.w = shadowMap.Sample(sampleClamp, vShadowTexCoord + vTexelUnits);

      // What weighted fraction of the 4 samples are nearer to the light than this pixel?
      float4 vShadowTests = (vShadowDepths >= vLightSpaceDepth) ? 1.0f : 0.0f;
      return dot(vBilinearWeights, vShadowTests);
  }

  float4 PSMain(PSInput input) : SV_TARGET
  {
      float4 diffuseColor = diffuseMap.Sample(sampleWrap, input.uv);
      float3 pixelNormal = CalcPerPixelNormal(input.uv, input.normal, input.tangent);
      float4 totalLight = ambientColor;

      for (int i = 0; i < NUM_LIGHTS; i++)
      {
          float4 lightPass = CalcLightingColor(lights[i].position, lights[i].direction, lights[i].color, lights[i].falloff, input.worldpos.xyz, pixelNormal);
          if (sampleShadowMap && i == 0)
          {
              lightPass *= CalcUnshadowedAmountPCF2x2(i, input.worldpos);
          }
          totalLight += lightPass;
      }

      return diffuseColor * saturate(totalLight);
  }
  ```
- Select the ISA type in the Shader Code drop-down list to analyze the GEN Assembly.

- math.sqt – 3 instructions
- math.rsqt – 9 instructions
- math.inv – 7 instructions

## Perform Optimization

- Eliminate the constant condition to remove the flow control.

  ```hlsl
  if (sampleShadowMap && i == 0)
  {
      lightPass *= CalcUnshadowedAmountPCF2x2(i, input.worldpos);
  }
  ```
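One possible way to remove this branch, shown only as a sketch: handle the first light outside the loop and turn the runtime `sampleShadowMap` constant into a compile-time switch. The `SAMPLE_SHADOW_MAP` macro is a hypothetical name that is not part of the sample; every other identifier comes from the sample's shader and is assumed to be declared as in the original file.

```hlsl
// Hypothetical compile-time switch that stands in for the runtime
// sampleShadowMap constant. It is not defined anywhere in the sample.
#ifndef SAMPLE_SHADOW_MAP
#define SAMPLE_SHADOW_MAP 1
#endif

float4 PSMain(PSInput input) : SV_TARGET
{
    float4 diffuseColor = diffuseMap.Sample(sampleWrap, input.uv);
    float3 pixelNormal = CalcPerPixelNormal(input.uv, input.normal, input.tangent);
    float4 totalLight = ambientColor;

    // Light 0 is the only shadowed light, so it is evaluated outside the loop.
    float4 firstLight = CalcLightingColor(lights[0].position, lights[0].direction, lights[0].color, lights[0].falloff, input.worldpos.xyz, pixelNormal);
#if SAMPLE_SHADOW_MAP
    firstLight *= CalcUnshadowedAmountPCF2x2(0, input.worldpos);
#endif
    totalLight += firstLight;

    // The remaining lights need no per-iteration condition.
    [unroll]
    for (int i = 1; i < NUM_LIGHTS; i++)
    {
        totalLight += CalcLightingColor(lights[i].position, lights[i].direction, lights[i].color, lights[i].falloff, input.worldpos.xyz, pixelNormal);
    }

    return diffuseColor * saturate(totalLight);
}
```

This variant assumes shadow sampling never has to be toggled at run time. If it does, the application would need to compile one shader permutation per value of the macro and select between them on the CPU.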

- Reduce the number of complex math and floating point instructions.

  ```hlsl
  // In CalcPerPixelNormal:
  vVertNormal = normalize(vVertNormal);
  vVertTangent = normalize(vVertTangent);
  float3 vVertBinormal = normalize(cross(vVertTangent, vVertNormal));

  // In CalcLightingColor:
  float fCosAngle = dot(vLightToPixelNormalized, vLightDir / length(vLightDir));
  ```

  ```hlsl
  // In CalcLightingColor:
  float fDistFalloff = saturate((vFalloffs.x - fDist) / vFalloffs.y);
  float fAngleFalloff = saturate((fCosAngle - vFalloffs.z) / vFalloffs.w);
  ```

  ```hlsl
  // In CalcUnshadowedAmountPCF2x2:
  vLightSpacePos = mul(vLightSpacePos, lights[lightIndex].view);
  vLightSpacePos = mul(vLightSpacePos, lights[lightIndex].projection);
  ```
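The lines called out above share a pattern: each one computes per pixel a value that depends only on per-light constants. A minimal sketch of one way to reduce that work follows, assuming the application can fill extra per-light fields on the CPU. The `LightStatePrecomputed` struct, its field names, and both function names are hypothetical and do not appear in the sample; the per-pixel `normalize` calls in CalcPerPixelNormal are not covered here.

```hlsl
// Hypothetical light layout. Fields marked "precomputed" are assumed to be
// written once by the CPU instead of being derived for every pixel.
struct LightStatePrecomputed
{
    float3 position;
    float3 directionNormalized; // precomputed: direction / length(direction)
    float4 color;
    float4 falloffRcp;          // precomputed: (falloff.x, 1 / falloff.y, falloff.z, 1 / falloff.w)
    float4x4 viewProjection;    // precomputed: mul(view, projection)
};

float4 CalcLightingColorPrecomputed(LightStatePrecomputed light, float3 vPosWorld, float3 vPerPixelNormal)
{
    float3 vLightToPixel = vPosWorld - light.position;

    // A single rsqrt gives 1 / distance; multiplying it by the squared
    // distance recovers the distance without a separate sqrt or divide.
    float fDistSq = dot(vLightToPixel, vLightToPixel);
    float fInvDist = rsqrt(fDistSq);
    float fDist = fDistSq * fInvDist;

    // Divisions by falloff.y and falloff.w become multiplications
    // by the precomputed reciprocals.
    float fDistFalloff = saturate((light.falloffRcp.x - fDist) * light.falloffRcp.y);

    float3 vLightToPixelNormalized = vLightToPixel * fInvDist;

    // The light direction arrives already normalized.
    float fCosAngle = dot(vLightToPixelNormalized, light.directionNormalized);
    float fAngleFalloff = saturate((fCosAngle - light.falloffRcp.z) * light.falloffRcp.w);

    // Diffuse contribution.
    float fNDotL = saturate(-dot(vLightToPixelNormalized, vPerPixelNormal));

    return light.color * fNDotL * fDistFalloff * fAngleFalloff;
}

float4 ToLightClipSpace(LightStatePrecomputed light, float4 vPosWorld)
{
    // mul(mul(p, view), projection) equals mul(p, mul(view, projection)),
    // so one matrix multiplication replaces two.
    return mul(vPosWorld, light.viewProjection);
}
```

The results can differ from the original in the last bits of floating-point precision, because a multiplication by a reciprocal is not bit-identical to a division. The CPU-side structure would also have to match this layout, including HLSL constant buffer packing rules.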

- Remove redundant function calls.