I am preparing a short presentation on Sandy Bridge's cache architecture. My primary reference is the "Intel(R) 64 and IA-32 Architectures Optimization Reference Manual, April 2012", where I found the following note on the L1 DCache prefetchers (section "Data Prefetching"):
Two hardware prefetchers load data to the L1 DCache:
• Data cache unit (DCU) prefetcher. This prefetcher, also known as the
streaming prefetcher, is triggered by an ascending access to very recently loaded
data. The processor assumes that this access is part of a streaming algorithm
and automatically fetches the next line.
• Instruction pointer (IP)-based stride prefetcher. This prefetcher keeps
track of individual load instructions. If a load instruction is detected to have a
regular stride, then a prefetch is sent to the next address which is the sum of the
current address and the stride. This prefetcher can prefetch forward or backward
and can detect strides of up to 2K bytes.
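To make the two trigger rules concrete, here is a toy model of both prefetchers. It is a sketch under my own assumptions, not Intel's design: a 64-byte line size, a DCU trigger of "an ascending access within the last touched line or the one right after it", and a stride prefetcher that fires only after seeing the same stride twice for one load IP. The class and method names are hypothetical. The model only predicts *which* address gets prefetched; it says nothing about which cache level the data is actually read from.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

LINE_SIZE = 64      # assumed cache-line size in bytes
MAX_STRIDE = 2048   # "strides of up to 2K bytes", per the quoted manual text


def line_of(addr: int) -> int:
    """Return the index of the cache line containing a byte address."""
    return addr // LINE_SIZE


@dataclass
class DCUPrefetcher:
    """Hypothetical next-line model: an ascending access that lands in the most
    recently touched line (or the line right after it) prefetches the next line."""
    last_addr: Optional[int] = None

    def access(self, addr: int) -> List[int]:
        prefetches: List[int] = []
        if self.last_addr is not None and addr > self.last_addr:
            if line_of(addr) - line_of(self.last_addr) in (0, 1):
                # Request the start address of the following line.
                prefetches.append((line_of(addr) + 1) * LINE_SIZE)
        self.last_addr = addr
        return prefetches


@dataclass
class IPStridePrefetcher:
    """Hypothetical per-IP stride model: once a load instruction shows the same
    non-zero stride twice in a row, prefetch current address + stride."""
    # ip -> (last address seen, last stride seen or None)
    table: Dict[int, Tuple[int, Optional[int]]] = field(default_factory=dict)

    def access(self, ip: int, addr: int) -> List[int]:
        prefetches: List[int] = []
        if ip in self.table:
            last_addr, last_stride = self.table[ip]
            stride = addr - last_addr
            # Works forward or backward, but only up to the 2K stride limit.
            if stride != 0 and stride == last_stride and abs(stride) <= MAX_STRIDE:
                prefetches.append(addr + stride)
            self.table[ip] = (addr, stride)
        else:
            self.table[ip] = (addr, None)
        return prefetches
```

For example, one load IP touching addresses 0, 256 and 512 issues its first prefetch (for 768) on the third access, while a 4096-byte stride never triggers because it exceeds the 2K limit.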
My question is where these hardware prefetchers read the data from: is it simply an access to L2, or can they bypass lower levels such as L2 or even the LLC?