Last Modified On: May 7, 2008 2:42 AM PDT
Determine the effectiveness of software-controlled versus hardware-controlled data prefetch for memory optimization. The Pentium® 4 processor has two mechanisms for data prefetch: software-controlled prefetch and an automatic hardware prefetch.
Software-controlled prefetch is enabled using the four prefetch instructions introduced with Streaming SIMD Extensions (SSE): PREFETCHT0, PREFETCHT1, PREFETCHT2, and PREFETCHNTA. These instructions are hints to bring a cache line of data into various levels of the cache hierarchy, in temporal or non-temporal mode. Because they are hints, the processor is free to ignore them.
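As a minimal sketch of how these hints are typically issued from C, the loop below uses the `_mm_prefetch` intrinsic from `<xmmintrin.h>`. The function name `sum_with_prefetch` and the value of `PREFETCH_DISTANCE` are assumptions made for illustration; a suitable distance depends on the loop body and the memory latency of the target system and would need to be measured.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch and the _MM_HINT_* constants (SSE) */

/* Hypothetical tuning value: how many elements ahead of the current one to hint. */
#define PREFETCH_DISTANCE 16

/* Sum an array while hinting that a later element will be needed soon. */
float sum_with_prefetch(const float *data, size_t n)
{
    float total = 0.0f;
    for (size_t i = 0; i < n; i++) {
        /* Stay inside the array so no out-of-range pointer is formed. */
        if (i + PREFETCH_DISTANCE < n) {
            /* _MM_HINT_T0 selects PREFETCHT0; _MM_HINT_T1, _MM_HINT_T2 and
               _MM_HINT_NTA select the other three prefetch instructions. */
            _mm_prefetch((const char *)&data[i + PREFETCH_DISTANCE], _MM_HINT_T0);
        }
        total += data[i];
    }
    return total;
}
```

The prefetch does not change the result of the loop; it only affects when data arrives in the cache. A sequential loop like this one is also a pattern the hardware prefetcher may already handle well, so the hint is shown here for its syntax rather than as a recommended optimization.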
The automatic hardware prefetch can bring lines into the unified second-level cache based on prior data misses. The automatic hardware prefetcher will attempt to prefetch two cache lines ahead of the prefetch stream. This feature was introduced with the Pentium 4 processor.
Generally, prefer software-controlled prefetch in situations where all the following are true: irregular access patterns are present, short arrays must be prefetched, and making changes to existing application code is acceptable. In practice, the individual advantages and disadvantages of hardware and software prefetching must be weighed against the needs of an individual situation.
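To illustrate the irregular-access case, the sketch below reads a table through an index array. The addresses touched are determined by the index values rather than by a fixed stride, so the software can compute a future address and hint it ahead of use. The names `sum_indirect` and `LOOKAHEAD` are hypothetical, and the lookahead value is an assumption that would need tuning by measurement.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* _mm_prefetch, _MM_HINT_T0 */

/* Hypothetical tuning value: how many iterations ahead to hint. */
#define LOOKAHEAD 8

/* Sum table entries selected by an index array (an irregular access pattern). */
long sum_indirect(const int *table, const size_t *idx, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        /* The index array itself is read sequentially; the table is not.
           Hint the table entry that iteration i + LOOKAHEAD will read. */
        if (i + LOOKAHEAD < n) {
            _mm_prefetch((const char *)&table[idx[i + LOOKAHEAD]], _MM_HINT_T0);
        }
        total += table[idx[i]];
    }
    return total;
}
```

Every value in `idx` is assumed to be a valid index into `table`. Whether the hint helps depends on the table size and how scattered the indices are, so it is worth comparing against the same loop without the prefetch.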
The software-controlled prefetch is not intended for prefetching code. Using it can incur significant penalties on a multiprocessor system when code is shared.
Software prefetching has the following characteristics:
There are different strengths and weaknesses to software and hardware prefetching on the Pentium 4 processor. The characteristics of the hardware prefetching are as follows (compare with the software prefetching features listed above):
Intel® 64 and IA-32 Architectures Optimization Reference Manual (PDF)
