What You Need to Know About Prefetching

You may have heard that most current processors, including the Intel® Core™ i7 and Xeon® 5500 series, support prefetching. This blog will briefly cover the basics of what that means and how it affects your performance analysis.

What is Prefetching?
Prefetching in general means bringing data or instructions from memory into the cache before they are needed. When your application needs data that was prefetched, instead of waiting for the data from memory, it can grab it from cache and keep right on executing. There are actually two main ways in which prefetching can occur: initiated by hardware or initiated by software.

Hardware prefetching is implemented by your processor and will be different depending on which processor you use. Most recent Intel processors have several different hardware prefetchers. The Core™ i7 processor and Xeon® 5500 series processors, for example, have some prefetchers that bring data into the L1 cache and some that bring data into the L2. There are also different algorithms: some monitor data access patterns for a particular cache and then try to predict what addresses will be needed in the future. Others use simpler algorithms, such as fetching two adjacent cache lines. The pattern matching and detection algorithms used by the set of hardware prefetchers on the Core i7 and Xeon 5500 are improved from our last generation, and we continue to optimize these algorithms with each new processor architecture.

Software prefetching is implemented by software developers. It involves identifying when your application will need a particular set of data, then using special prefetch instructions to tell the processor to get this data in advance. For more information on how to use these instructions, and how to get the most out of hardware prefetching, see the Intel® 64 and IA-32 Architectures Optimization Reference Manual.
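As a rough illustration of the idea, here is a minimal sketch in C of a loop that issues a software prefetch a fixed number of iterations ahead of the load that will use the data. It assumes a GCC- or Clang-compatible compiler, which provides the __builtin_prefetch built-in; the function name sum_indirect and the PREFETCH_DISTANCE value are hypothetical and chosen only for this example. An indirect access pattern is used because it is the kind of pattern a hardware prefetcher may have trouble predicting.

```c
#include <stddef.h>

/* Hypothetical tuning knob: how many iterations ahead to prefetch.
 * The right value depends on the processor and the workload. */
#define PREFETCH_DISTANCE 16

/* Sums table[idx[i]] for i in [0, n). */
long sum_indirect(const long *table, const size_t *idx, size_t n)
{
    long sum = 0;

    for (size_t i = 0; i < n; i++) {
        /* Ask for the element needed PREFETCH_DISTANCE iterations from
         * now. Arguments: address, 0 = prefetch for reading,
         * 3 = high temporal locality. The bounds check keeps the
         * read of idx[] inside the array. */
        if (i + PREFETCH_DISTANCE < n) {
            __builtin_prefetch(&table[idx[i + PREFETCH_DISTANCE]], 0, 3);
        }
        sum += table[idx[i]];
    }
    return sum;
}
```

The prefetch is only a hint, so the result is the same with or without it. The distance is the part that needs measuring: if it is too small, the data may not have arrived by the time the load executes, and if it is too large, the data may already have been evicted from the cache.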

What Does it Mean for Performance?
For performance tuning, there are several aspects of prefetching to consider. The first thing to understand is how prefetching is affecting the performance data you gather. On most Intel processors, for example, you can monitor cache misses using a product like Intel® VTune™ Performance Analyzer or Intel® PTU. When selecting events to monitor, read the documentation to understand how prefetching affects that event. For example, let’s look at the MEM_LOAD_RETIRED.LLC_MISS event on the Core i7 and Xeon 5500 Series processors. This event counts L3 cache misses triggered by loads only, meaning it will not include misses triggered by either software or hardware prefetches. Other events can be programmed to filter out requests by the L2 hardware prefetchers, but not the L1 prefetchers. There are also specific events, such as the LOAD_HIT_PRE event on the Core i7 and Xeon 5500 Series processors, that are useful for tuning software prefetches. LOAD_HIT_PRE tells you when an application tried to load data from an address for which a software prefetch was already in progress. In other words, it tells you when your software prefetch instruction was issued too close to the actual load to be effective.

After understanding which events include prefetching and what can be monitored, you can develop an understanding of how well your prefetches are working. In the case of Core i7 and Xeon 5500 Series processors, ineffective hardware prefetches (meaning prefetches that grab the wrong data) don’t help, but they likely won’t hurt performance either. On some older processors, under very bandwidth-sensitive workloads, ineffective hardware prefetches can hurt performance by using bandwidth that your application needs.

If you are a developer, you can try to reduce your application’s cache misses by adding or adjusting software prefetches, although this approach should be used with caution. Always remember that performance will change based on the platform and the other applications running, so in different situations your software prefetches could be more or less effective, and could possibly even hurt performance. As a developer you can in some cases adjust the stride of your data access so that it is more hardware prefetcher-friendly, thus helping the hardware prefetchers on your system be more effective. If you are a system or database administrator, you can experiment with turning the hardware prefetchers on or off. Your platform’s BIOS will often give you control over some of the hardware prefetchers (usually not all).
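To make the point about stride concrete, here is a small sketch in C that sums the same matrix in two traversal orders. The function names and the flat row-major layout are assumptions made for this example. Both functions return the same result; only the order of memory accesses differs.

```c
#include <stddef.h>

/* The matrix is stored row by row in one flat array:
 * element (r, c) lives at m[r * cols + c]. */

/* Inner loop walks along a row: consecutive accesses are adjacent in
 * memory, a simple sequential pattern for a hardware prefetcher. */
long sum_row_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    for (size_t r = 0; r < rows; r++) {
        for (size_t c = 0; c < cols; c++) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}

/* Inner loop walks down a column: consecutive accesses are
 * cols * sizeof(int) bytes apart, so for a wide matrix each access
 * can land on a different cache line. */
long sum_column_major(const int *m, size_t rows, size_t cols)
{
    long sum = 0;
    for (size_t c = 0; c < cols; c++) {
        for (size_t r = 0; r < rows; r++) {
            sum += m[r * cols + c];
        }
    }
    return sum;
}
```

Whether the second version is actually slower, and by how much, depends on the matrix size, the processor, and its prefetchers, so it is worth measuring both on your own platform before restructuring loops.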

The Bottom Line
Hardware prefetching can be a help to many workloads, and at least with the current generation of processors rarely hurts performance. Software prefetching is a more targeted form of prefetch that also generally helps if done correctly. Prefetching is one of the many performance features of Intel’s processors that we strive to make better and better with each new architecture.
