Disclosure of H/W prefetcher control on some Intel processors

This article discloses the MSR setting that can be used to control the various h/w prefetchers that are available on Intel processors based on the following microarchitectures: Nehalem, Westmere, Sandy Bridge, Ivy Bridge, Haswell, and Broadwell.

These processors support four types of h/w prefetchers for prefetching data. Two prefetchers are associated with the L1 data cache (also known as the DCU) and two with the L2 cache. Every core has a Model Specific Register (MSR) at address 0x1A4 that controls these four prefetchers. Bits 0-3 of this register enable or disable the individual prefetchers. All other bits of this MSR are reserved.

| Prefetcher | Bit# in MSR 0x1A4 | Description |
|---|---|---|
| L2 hardware prefetcher | 0 | Fetches additional lines of code or data into the L2 cache |
| L2 adjacent cache line prefetcher | 1 | Fetches the cache line that comprises a cache line pair (128 bytes) |
| DCU prefetcher | 2 | Fetches the next cache line into L1-D cache |
| DCU IP prefetcher | 3 | Uses sequential load history (based on Instruction Pointer of previous loads) to determine whether to prefetch additional lines |

If any of the above bits is set to 1 on a core, that particular prefetcher is disabled on that core. Clearing the bit (setting it to 0) enables the corresponding prefetcher. Please note that this MSR is present in every core, and changes made to the MSR of a core affect the prefetchers only in that core. If hyper-threading is enabled, both threads of a core share the same MSR.
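The bit semantics above (1 = disabled, 0 = enabled) are easy to get backwards, so here is a minimal sketch of decoding and composing values for MSR 0x1A4. The bit positions come from the table above; the names `PREFETCHER_BITS`, `decode_prefetchers`, `with_disabled`, and `with_enabled` are hypothetical helpers invented for this illustration, not part of any Intel tool.

```python
# Bit positions in MSR 0x1A4, taken from the table in this article.
# All helper names here are illustrative assumptions.
PREFETCHER_BITS = {
    "L2 hardware prefetcher": 0,
    "L2 adjacent cache line prefetcher": 1,
    "DCU prefetcher": 2,
    "DCU IP prefetcher": 3,
}


def decode_prefetchers(msr_value):
    """Return {prefetcher name: True if enabled}. A set bit means DISABLED."""
    return {
        name: ((msr_value >> bit) & 1) == 0
        for name, bit in PREFETCHER_BITS.items()
    }


def with_disabled(msr_value, names):
    """Return msr_value with the bits for the named prefetchers set (disabled)."""
    for name in names:
        msr_value |= 1 << PREFETCHER_BITS[name]
    return msr_value


def with_enabled(msr_value, names):
    """Return msr_value with the bits for the named prefetchers cleared (enabled)."""
    for name in names:
        msr_value &= ~(1 << PREFETCHER_BITS[name])
    return msr_value
```

Both `with_disabled` and `with_enabled` touch only the named bits, so whatever the reserved bits held in the value read from the MSR is carried through unchanged.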

Most BIOS implementations are likely to leave all the prefetchers enabled (i.e., MSR 0x1A4 set to 0), because prefetchers are either neutral or beneficial for a large number of applications. How they affect your application, however, depends heavily on its data access patterns.

These bits can be enabled or disabled at any time. Any change affects the prefetchers on the cores where it is applied, and hence the performance of every application running on those cores.
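One way to change the bits at run time on Linux is the /dev/cpu/<cpu#>/msr device that the author mentions in the comments. The sketch below assumes a Linux system with the `msr` kernel module loaded and root privileges; on that device, an 8-byte read or write at file offset 0x1A4 accesses MSR 0x1A4. The names `MSR_PREFETCH_CONTROL`, `msr_path`, `read_msr`, and `write_msr` are hypothetical and exist only for this example.

```python
import os
import struct

# MSR address from this article; the constant name is an assumption.
MSR_PREFETCH_CONTROL = 0x1A4


def msr_path(cpu):
    """Path of the Linux msr device for one logical CPU."""
    return f"/dev/cpu/{cpu}/msr"


def read_msr(path, address=MSR_PREFETCH_CONTROL):
    """Read one 64-bit MSR: an 8-byte read at offset == MSR address."""
    fd = os.open(path, os.O_RDONLY)
    try:
        raw = os.pread(fd, 8, address)
    finally:
        os.close(fd)
    if len(raw) != 8:
        raise OSError(f"short read from {path}")
    return struct.unpack("<Q", raw)[0]  # x86 is little-endian


def write_msr(path, value, address=MSR_PREFETCH_CONTROL):
    """Write one 64-bit MSR: an 8-byte write at offset == MSR address."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), address)
    finally:
        os.close(fd)


# Example (needs root and the msr module; not executed here):
#   current = read_msr(msr_path(0))
#   write_msr(msr_path(0), current | 0x1)   # disable the L2 hardware prefetcher
```

Because the MSR is per core, the write has to be repeated for each core whose prefetchers should change. With hyper-threading enabled, the two logical CPUs of a core share the register, so writing through either one is enough for that core.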

Tools that measure memory latency and bandwidth may want to set the prefetchers explicitly to a known state for more controlled measurements. They can change the prefetcher settings during a measurement but should restore the original state on completion. For example, the Intel Memory Latency Checker tool (http://www.intel.com/software/mlc) modifies the prefetchers through writes to MSR 0x1A4 to measure accurate latencies and restores them to the original state on exit.
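The save, modify, restore pattern described above can be sketched as a context manager. This is an illustration only and does not claim to show how Intel Memory Latency Checker is implemented. `prefetchers_set_to` is a hypothetical name, and the `read_msr(cpu)` and `write_msr(cpu, value)` callables are assumed to be supplied by the caller (for instance, thin wrappers around the operating system's MSR interface).

```python
from contextlib import contextmanager

PREFETCH_MASK = 0xF  # bits 0-3 control the prefetchers; the rest are reserved


@contextmanager
def prefetchers_set_to(bits, cpus, read_msr, write_msr):
    """Force bits 0-3 of MSR 0x1A4 to `bits` on each CPU, then restore.

    read_msr(cpu) -> int and write_msr(cpu, value) are caller-supplied
    (assumed) accessors for MSR 0x1A4 on one logical CPU.
    """
    saved = {}
    try:
        for cpu in cpus:
            original = read_msr(cpu)
            saved[cpu] = original
            # Change only the four control bits; keep reserved bits as read.
            write_msr(cpu, (original & ~PREFETCH_MASK) | (bits & PREFETCH_MASK))
        yield saved
    finally:
        # Runs on normal exit and on exceptions, so the state is put back.
        for cpu, original in saved.items():
            write_msr(cpu, original)
```

A measurement would then run inside `with prefetchers_set_to(0xF, cpus, read_msr, write_msr):` to disable all four prefetchers for its duration. The `finally` clause covers exceptions, but not a process that is killed outright, so a real tool may want additional safeguards.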

5 comments

kadir:

How can I programmatically disable hardware prefetcher on Xeon Phi? Or is this achievable via compiler flags?

Sho K.:

Vish, are there any controls for throttling the prefetches? Especially the L2 hardware prefetcher?

Also, do the L2 prefetchers do both instruction and data prefetching?

Vish Viswanathan (Intel):

These are MSRs, and you need kernel privileges to execute the RDMSR/WRMSR instructions. You can also use /dev/cpu/<cpu#>/msr to read and write any MSR.

Intel C.:

>>These bits can be enabled or disabled at any time.

Would you mind explaining how that can be done? via kernel? where exactly?
