Published: 06/23/2017 Last Updated: 06/23/2017
This article describes internal driver optimizations for developers using Intel Atom® processors, Intel® Celeron® processors, and Intel® Pentium® processors with Intel® HD Graphics 500 or 505. The intent is to clarify existing documentation. The optimizations described are completely transparent to applications. The only change needed from a developer's perspective is to be aware that, for this special case, applications should be designed for the thread pool configuration rather than for the underlying hardware.
For Intel® Core™ and Intel® Xeon® processors with integrated graphics, the number of execution units (EUs) and the number of EUs per subslice are large enough that mapping thread pools directly to subslices is efficient. Tying the thread pool implementation to the hardware means that application behavior and hardware details can be described together in a way that is easy to visualize and remember. Many reference documents, such as The Compute Architecture of Intel Processor Graphics Gen9, take this approach.
However, for the relatively small GPUs in the embedded processors listed above, this approach could sometimes result in a suboptimal mapping. For these processors, EUs are now pooled across subslices, creating "virtual subslices" that do not match the hardware. In this case, it helps to understand where behavior is driven by thread pools instead of by the hardware layout.
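As a rough illustration of what "pooling across subslices" means, the sketch below regroups the EUs of three physical 6-EU subslices into two 9-EU virtual subslices, the figures this article gives for Intel HD Graphics 505. The function name `pool_eus`, the `(subslice, eu)` labels, and the contiguous split are assumptions made for illustration only. The article does not say which physical EUs the driver assigns to which pool.

```python
def pool_eus(physical_subslices, pool_count):
    """Regroup EUs from physical subslices into equally sized virtual subslices.

    Illustrative only: the real driver's EU-to-pool assignment is not documented here.
    """
    # Flatten all EUs, ignoring physical subslice boundaries.
    eus = [eu for subslice in physical_subslices for eu in subslice]
    if pool_count <= 0 or len(eus) % pool_count != 0:
        raise ValueError("EUs must divide evenly into the requested pools")
    size = len(eus) // pool_count
    # Cut the flat list into pool_count contiguous chunks.
    return [eus[i:i + size] for i in range(0, len(eus), size)]


# Hypothetical labels: (physical subslice index, EU index within that subslice).
physical_505 = [[(s, e) for e in range(6)] for s in range(3)]  # 3x6 hardware
virtual_505 = pool_eus(physical_505, 2)                        # 2x9 thread pools

for index, pool in enumerate(virtual_505):
    sources = sorted({subslice for subslice, _ in pool})
    print(f"virtual subslice {index}: {len(pool)} EUs from physical subslices {sources}")
```

In this sketch each virtual subslice draws EUs from more than one physical subslice, which is why the thread pool layout no longer matches the hardware layout.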
The thread pools, not the physical hardware, determine how you should write your application. For example, if you have Intel HD Graphics 505, write your application as if there were two subslices with nine EUs each, not three subslices with six EUs each.
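One way to keep this distinction visible in application code is a small lookup table that records both layouts and always returns the thread pool layout for tuning decisions. The sketch below is a minimal example of that idea. The names `SubsliceLayout`, `LAYOUTS`, and `design_layout` are hypothetical and are not part of any Intel API. The numbers are the ones quoted in this article.

```python
from typing import NamedTuple


class SubsliceLayout(NamedTuple):
    subslices: int
    eus_per_subslice: int

    @property
    def total_eus(self) -> int:
        return self.subslices * self.eus_per_subslice


# Hypothetical table; values are the figures quoted in this article.
LAYOUTS = {
    "Intel HD Graphics 500": {
        "physical": SubsliceLayout(2, 6),      # hardware: 2x6
        "thread_pool": SubsliceLayout(1, 12),  # what the application sees: 1x12
    },
    "Intel HD Graphics 505": {
        "physical": SubsliceLayout(3, 6),      # hardware: 3x6
        "thread_pool": SubsliceLayout(2, 9),   # what the application sees: 2x9
    },
}


def design_layout(gpu_name: str) -> SubsliceLayout:
    """Return the layout an application should be designed for."""
    return LAYOUTS[gpu_name]["thread_pool"]


for name, layouts in LAYOUTS.items():
    # Pooling regroups EUs; it does not change how many there are.
    assert layouts["physical"].total_eus == layouts["thread_pool"].total_eus
    target = design_layout(name)
    print(f"{name}: design for {target.subslices}x{target.eus_per_subslice}")
```

The total EU count is the same in both layouts (12 and 18 respectively). Only the grouping that the application should plan around changes.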
In extensive testing, the new configuration at worst matched the performance of the legacy configuration. The performance boost from switching to 2x9 (Intel HD Graphics 505) or 1x12 (Intel HD Graphics 500) often approaches 2x. Since no scenarios were found that benefit from the legacy configuration, there are no plans to add extensions to modify MEDIA_POOL_STATE.
There are 4 main areas to consider:
In the past, thread pools were always configured to match the physical hardware. Now there is a notable exception: an optimization that increases performance for Intel HD Graphics 500 and 505 GPUs. You won't need to make many changes to benefit from it. The most important takeaway is that Intel has done the work behind the scenes to make efficient use of EUs easy across the full range of GPU options. The details here are provided as conceptual background; the changes themselves happen under the hood and are completely transparent. To your application, Intel HD Graphics 500 has 1 subslice with 12 EUs and Intel HD Graphics 505 has 2 subslices with 9 EUs each, even though the underlying hardware is 2x6 and 3x6, respectively. Extensive internal testing has shown that this driver optimization provides significant improvements, and we have not yet seen a case of performance regression. However, we are always open to feedback. If you find a scenario where the legacy thread pool configuration may be a better fit, please let us know.
For more information, see: Broxton Graphics Programmer's Reference Manual.
Performance varies by use, configuration and other factors. Learn more at www.Intel.com/PerformanceIndex.