Cookbook

  • 03/31/2021

Optimize Sampler

Content expert: Anna Sakharova
Sampling is the process of fetching a value from a texture at a given position. You can configure multiple sampling parameters, such as filtering mode, to balance visual results and sampling performance.
Intel® GPA Graphics Frame Analyzer checks the difference between the percentage of time when Sampler Input is available and the percentage of time when Sampler Output is ready.
Metric Name | Description
GPU / Sampler : Slice <N> Subslice <M> Sampler Input Available | Percentage of time there is input to the sampler from the execution units (EUs) on slice <N>, subslice <M>.
GPU / Sampler : Slice <N> Subslice <M> Sampler Output Ready | Percentage of time there is output from the sampler to the EUs on slice <N>, subslice <M>.
When Input Available exceeds Output Ready by more than 10 percent for a given slice and subslice, the sampler is not returning data to the EUs as fast as the EUs request it, so the sampler is probably the hotspot. This comparison indicates a primary hotspot only when the samplers are relatively busy, that is, when both EU Occupancy and EU Stall are relatively high.
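The Input Available versus Output Ready check can be written as a small helper, for example when comparing metric values read from Graphics Frame Analyzer for several subslices. This is a minimal sketch, not part of Intel® GPA: the function names are hypothetical, it assumes the 10 percent gap is measured in percentage points, and it leaves the "relatively busy" judgment (high EU Occupancy and EU Stall) to the caller as a boolean, because no numeric cutoff is given here.

```python
def sampler_is_hotspot(input_available_pct, output_ready_pct,
                       samplers_busy, threshold_pct=10.0):
    """Apply the Input Available vs. Output Ready check to one subslice.

    Assumption: the 10 percent gap is measured in percentage points.
    samplers_busy: the caller's judgment that EU Occupancy and EU Stall
    are both relatively high (no numeric cutoff is assumed here).
    """
    gap = input_available_pct - output_ready_pct
    return samplers_busy and gap > threshold_pct


def hotspot_subslices(metrics, samplers_busy):
    """Return the (slice, subslice) keys whose sampler looks like the hotspot.

    metrics: dict mapping (slice, subslice) -> (input_available_pct, output_ready_pct)
    """
    return sorted(
        key for key, (inp, out) in metrics.items()
        if sampler_is_hotspot(inp, out, samplers_busy)
    )


# Hypothetical readings for two subslices of slice 0.
readings = {(0, 0): (62.0, 41.5), (0, 1): (35.0, 31.0)}
print(hotspot_subslices(readings, samplers_busy=True))  # [(0, 0)]
```

Subslice (0, 1) is excluded because its gap is only 4 points. When the samplers are not busy, the helper reports nothing even for large gaps, which mirrors the caveat that the comparison only points to a primary hotspot under load.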

Ingredients

To optimize a Sampler bottleneck, you need the following:
  • Application: Unreal Engine 4* Sun Template sample, DirectX SDK* CascadedShadowMaps11 sample
  • Tool: Intel® GPA Graphics Frame Analyzer. To download a free copy of the Intel® Graphics Performance Analyzers toolkit, visit the Intel® GPA product page.
  • Operating System: Windows* 10
  • GPU: Intel® Processor Graphics Gen9 and higher
  • API: DirectX* 11

Optimize Sampler Bottleneck with Graphics Frame Analyzer

There can be multiple reasons for the sampler to be a hotspot. To speed up the sampler, you can try the following:
  • Reduce the texture size.
  • Change the filtering mode.
  • Choose a texture format with less data per pixel, or an uncompressed texture format, if possible. In some cases, an uncompressed format may create a new bottleneck for larger textures.
  • Reduce the number of on-screen surfaces that the texture is applied to.
  • Adjust the sampling access pattern so that texture accesses are more linear.
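To compare how much data different formats carry, a rough footprint calculation helps. The sketch below covers only a few DXGI formats whose sizes are fixed by the format definition (for example, 4 bytes per texel for R8G8B8A8_UNORM, 8 bytes per 4x4 block for BC1, and 16 bytes per 4x4 block for BC3 and BC7). The function name is hypothetical, and memory footprint is only one input to sampler cost, so the numbers do not predict which format samples faster on a particular GPU.

```python
# Bytes per texel for a few uncompressed DXGI formats.
BYTES_PER_TEXEL = {
    "DXGI_FORMAT_R8_UNORM": 1,
    "DXGI_FORMAT_R8G8B8A8_UNORM": 4,
    "DXGI_FORMAT_R16G16B16A16_FLOAT": 8,
    "DXGI_FORMAT_R32G32B32A32_FLOAT": 16,
}

# Bytes per 4x4 texel block for block-compressed (BC) formats.
BYTES_PER_BC_BLOCK = {
    "DXGI_FORMAT_BC1_UNORM": 8,
    "DXGI_FORMAT_BC3_UNORM": 16,
    "DXGI_FORMAT_BC7_UNORM": 16,
}


def level_bytes(fmt, width, height):
    """Size in bytes of a single mip level with the given format and size."""
    if fmt in BYTES_PER_TEXEL:
        return width * height * BYTES_PER_TEXEL[fmt]
    if fmt in BYTES_PER_BC_BLOCK:
        # BC formats store whole 4x4 blocks, so partial blocks round up.
        blocks_x = (width + 3) // 4
        blocks_y = (height + 3) // 4
        return blocks_x * blocks_y * BYTES_PER_BC_BLOCK[fmt]
    raise KeyError(f"format not covered by this sketch: {fmt}")


for fmt in ("DXGI_FORMAT_R32G32B32A32_FLOAT", "DXGI_FORMAT_R8G8B8A8_UNORM",
            "DXGI_FORMAT_BC7_UNORM", "DXGI_FORMAT_BC1_UNORM"):
    print(fmt, level_bytes(fmt, 1024, 1024))
```

For a 1024x1024 level, this prints 16 MiB, 4 MiB, 1 MiB, and 512 KiB respectively (in bytes).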
With Intel® GPA Graphics Frame Analyzer, you can optimize the Sampler bottleneck through real-time experiments, such as reducing texture size or changing filter parameters in a pixel shader.

Reduce Texture Size

To reduce the texture size, do the following:
  1. Open the event with the discovered Sampler bottleneck in the Graphics Frame Analyzer Resource Viewer by selecting this event on the Main bar chart.
  2. Click the Show All Resources button, and then click the Textures tab to open the list of sampled textures.
  3. Reduce the size of one or more large textures. For example, the marble texture is 1024x1024 pixels. Select a smaller size, such as 256x256, and then click the button.
  4. Compare the original and the resulting textures (figures: Original, Result, Difference).
The textures look very similar before and after resizing, but the Sampler metric in the 3D Pipeline tab is now green. Execution time improves by 18% for the selected segment and by 4% overall.
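The data reduction behind this experiment is easy to quantify. The sketch below counts the texels in a full mip chain; the helper name is hypothetical, and it assumes both versions of the marble texture carry a complete mip chain down to 1x1, which is not stated in the experiment.

```python
def mip_chain_texels(width, height):
    """Total texel count of a full mip chain, from width x height down to 1x1."""
    total = 0
    while True:
        total += width * height
        if width == 1 and height == 1:
            return total
        # Each level halves both dimensions, clamped at 1.
        width = max(1, width // 2)
        height = max(1, height // 2)


original = mip_chain_texels(1024, 1024)
reduced = mip_chain_texels(256, 256)
print(original, reduced, round(original / reduced, 2))  # 1398101 87381 16.0
```

The resized texture holds roughly 16 times fewer texels. The measured gain is far smaller than that ratio, since texel count is only one factor in sampler time and the sampler is only part of the work in the frame.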

Change Filter Parameters in Pixel Shader

Percentage-Closer Filtering (PCF) often affects graphics application performance, so this experiment uses PCF as its example of changing filter parameters to optimize a Sampler bottleneck.
PCF can be used to render antialiased shadows and soft shadows. For more information on PCF, see https://docs.microsoft.com/en-us/windows/win32/dxtecharts/cascaded-shadow-maps.
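The idea behind PCF can be shown on the CPU: instead of a single depth comparison per pixel, the filter compares the receiver's depth against a square of neighboring shadow-map texels and returns the fraction that pass. This sketch is illustrative only; the function name, the edge clamping, and the optional bias parameter are assumptions. In a pixel shader, each comparison is a sampler request (typically through a comparison sampler such as HLSL's SampleCmpLevelZero), which is why the kernel size drives sampler load.

```python
def pcf_percent_lit(shadow_map, u, v, receiver_depth, kernel_size, bias=0.0):
    """Fraction of shadow-map taps around texel (u, v) that do not occlude the receiver.

    shadow_map: 2D list of stored depths (rows indexed by v, columns by u);
                smaller depth means closer to the light.
    kernel_size: odd filter width; taps cover offsets -r..r, r = kernel_size // 2.
    bias: hypothetical depth bias to reduce self-shadowing artifacts.
    Out-of-range taps are clamped to the edge (an assumption of this sketch).
    """
    height = len(shadow_map)
    width = len(shadow_map[0])
    r = kernel_size // 2
    lit = 0
    taps = 0
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            x = min(max(u + dx, 0), width - 1)
            y = min(max(v + dy, 0), height - 1)
            # Lit if nothing stored in the shadow map is closer to the light.
            if receiver_depth - bias <= shadow_map[y][x]:
                lit += 1
            taps += 1
    return lit / taps


# Hypothetical 3x3 shadow map with one occluder (depth 0.2) in the center.
shadow_map = [
    [1.0, 1.0, 1.0],
    [1.0, 0.2, 1.0],
    [1.0, 1.0, 1.0],
]
print(pcf_percent_lit(shadow_map, 1, 1, receiver_depth=0.5, kernel_size=1))  # 0.0
print(pcf_percent_lit(shadow_map, 1, 1, receiver_depth=0.5, kernel_size=3))  # 8 of 9 taps lit
```

With a single tap the pixel is either fully lit or fully shadowed; the 3x3 kernel produces a fractional value, which is what softens shadow edges at the cost of more samples.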
To change filter parameters, do the following:
  1. Open the event with the discovered Sampler bottleneck in the Graphics Frame Analyzer Resource Viewer by selecting this event on the Main bar chart.
    The pink segment contains the texture and shadow rendering. Shadow properties are set in the pixel shader.
  2. Select the Shader resource in the Resource List, and then choose the Pixel shader type. The pixel shader contains the CalculatePCFPercentLit function, which uses the values m1 and m2 as the iteration range of the filter loop:
    m1 = m_iPCFBlurSize / -2
    m2 = m_iPCFBlurSize / 2 + 1
    where m_iPCFBlurSize is the kernel size and / is integer division, which truncates toward zero. The initial kernel size is 9, so m1 = -4 and m2 = 5.
  3. Reduce the kernel size to 3 by setting m1 to -1 and m2 to 2.
    The metric values improve, but the Sampler is still a bottleneck.
  4. Test the extreme case: reduce the kernel size to 1 by setting m1 to 0 and m2 to 1.
The Sampler is now underlined in green. Execution time improves by 8% overall and by 89% for the selected segment.
Compare the original and the resulting textures (figures: Original, Result, Difference).
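The m1 and m2 formulas in step 2 rely on C-style integer division, which truncates toward zero, so 9 / -2 gives -4 rather than -5. The sketch below reproduces m1 and m2 for the three kernel sizes used in this experiment and counts the filter taps per pixel. The helpers are hypothetical; c_int_div exists because Python's // operator floors instead of truncating. The tap count assumes the filter loop runs from m1 up to, but not including, m2 and is nested over both the x and y axes.

```python
def c_int_div(a, b):
    """Integer division that truncates toward zero, like C++ and HLSL int division."""
    q = abs(a) // abs(b)
    return q if (a >= 0) == (b > 0) else -q


def pcf_loop_bounds(blur_size):
    """Return (m1, m2) for a given m_iPCFBlurSize."""
    m1 = c_int_div(blur_size, -2)
    m2 = c_int_div(blur_size, 2) + 1
    return m1, m2


def pcf_taps_per_pixel(blur_size):
    """Taps for a loop over [m1, m2) on both axes (assumed nested 2D loop)."""
    m1, m2 = pcf_loop_bounds(blur_size)
    per_axis = m2 - m1
    return per_axis * per_axis


for size in (9, 3, 1):
    print(size, pcf_loop_bounds(size), pcf_taps_per_pixel(size))
# 9 (-4, 5) 81
# 3 (-1, 2) 9
# 1 (0, 1) 1
```

Under this assumption, reducing the kernel size from 9 to 1 cuts the work from 81 shadow-map comparisons per pixel to 1, which is consistent with the large improvement measured for the selected segment.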

See Also

https://docs.microsoft.com/en-us/windows/win32/api/dxgiformat/ne-dxgiformat-dxgi_format
https://docs.microsoft.com/en-us/windows/win32/dxtecharts/cascaded-shadow-maps

Product and Performance Information

Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.